The PISA 2009 dataset coded maternal education as paternal education, and vice versa. PISA provides composite educational attainment variables for both parents, as well as several other variables (microvariables) that specify whether each parent completed a given category of schooling. It turns out that a composite built from one parent's microvariables correlates at 1 with the PISA-created composite for the other parent, and the two microvariable composites correlate with each other at .604.
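A rough sketch of that cross-check, for anyone who wants to try it on another dataset. The toy numbers and the function names (`pearson`, `labels_look_swapped`) are made up for illustration and are not PISA variable names; the idea is just to rebuild each parent's composite from the microvariables and see whether it matches the provided composite for the same parent or for the other one.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def labels_look_swapped(derived_mother, derived_father, given_mother, given_father):
    """Return True if each derived composite matches the OTHER parent's
    provided composite better than its own."""
    same = min(pearson(derived_mother, given_mother),
               pearson(derived_father, given_father))
    cross = min(pearson(derived_mother, given_father),
                pearson(derived_father, given_mother))
    return cross > same

# Hypothetical toy data: composites rebuilt from the microvariables.
derived_mother = [1, 2, 3, 4, 5, 2]
derived_father = [2, 2, 4, 3, 5, 1]

# Simulate the reported problem: the provided composites carry swapped labels.
given_mother = list(derived_father)
given_father = list(derived_mother)

swapped = labels_look_swapped(derived_mother, derived_father,
                              given_mother, given_father)
print("labels look swapped:", swapped)
```

On real data the cross-parent match would not need to be exactly 1 to be suspicious; any case where the cross-parent correlation clearly beats the same-parent one deserves a look.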
This is a great find. Do you have any recommendations for other researchers looking to validate demographic variables in large-scale datasets? Is there a better way to double-check for issues like this?
I typically use priors. For example, if the correlation between parental SES and IQ is below .3 or above .6, then something probably went wrong. In this case, I noticed that the correlation between maternal education and fertility was higher than the one for paternal education, and that made it seem off.
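A minimal sketch of the prior check, assuming you write down the plausible range for each correlation before looking at the data. The .3 to .6 range is the one mentioned above; the dictionary keys, the function name `flag_implausible`, and the observed values are hypothetical placeholders, not results from any dataset.

```python
# Plausible ranges for correlations, written down in advance.
# Only the SES/IQ range comes from the thread; the key name is illustrative.
PRIORS = {
    "parental_ses_vs_iq": (0.3, 0.6),
}

def flag_implausible(observed, priors):
    """Return the names of correlations that fall outside their prior range."""
    flagged = []
    for name, r in observed.items():
        low, high = priors[name]
        if not (low <= r <= high):
            flagged.append(name)
    return flagged

# Hypothetical observed value, for illustration only.
observed = {"parental_ses_vs_iq": 0.12}
flagged = flag_implausible(observed, PRIORS)
print(flagged)
```

A flag here does not prove a coding error; it only marks a variable as worth inspecting by hand.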
this is powerful autism
will include it in my next academic scandals roundup
also posted this on EJMR
Thank you, that makes sense.