The PISA 2009 dataset coded maternal education as paternal education, and vice versa. PISA provides composite educational attainment variables for both parents, as well as several other variables (microvariables) that specify whether each parent completed a given category of education. It turns out that a composite built from one parent's microvariables correlates at 1 with the PISA-created composite variable for the other parent, while the two microvariable composites correlate with each other at .604.
In this case, the composite variable is likely the one that is wrong, because the correlation between maternal education and fertility is higher than the paternal one, which does not replicate prior research.
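# Recode the PISA composite ISCED variables (PQFISCED, PQMISCED) onto a 0-4 scale.
# Levels not matched below stay NA.
maud$medu <- NA
maud$pedu <- NA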
maud$medu[maud$PQFISCED=='None'] <- 0
maud$pedu[maud$PQMISCED=='None'] <- 0
maud$medu[maud$PQFISCED=='ISCED 3A'] <- 1
maud$pedu[maud$PQMISCED=='ISCED 3A'] <- 1
maud$medu[maud$PQFISCED=='ISCED 4'] <- 2
maud$pedu[maud$PQMISCED=='ISCED 4'] <- 2
maud$medu[maud$PQFISCED=='ISCED 5B'] <- 3
maud$pedu[maud$PQMISCED=='ISCED 5B'] <- 3
maud$medu[maud$PQFISCED=='ISCED 5A or 6'] <- 4
maud$pedu[maud$PQMISCED=='ISCED 5A or 6'] <- 4
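# Build composites from the specific degree items (PA09Q01-Q04, PA10Q01-Q04).
# Assignments run from the lowest to the highest level, so a later 'Yes' overwrites
# an earlier value and the highest qualification wins.
maud$medu2 <- NA
maud$pedu2 <- NA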
maud$medu2[maud$PA10Q04=='No'] <- 0
maud$pedu2[maud$PA09Q04=='No'] <- 0
maud$medu2[maud$PA10Q04=='Yes'] <- 1
maud$pedu2[maud$PA09Q04=='Yes'] <- 1
maud$medu2[maud$PA10Q03=='Yes'] <- 2
maud$pedu2[maud$PA09Q03=='Yes'] <- 2
maud$medu2[maud$PA10Q02=='Yes'] <- 3
maud$pedu2[maud$PA09Q02=='Yes'] <- 3
maud$medu2[maud$PA10Q01=='Yes'] <- 4
maud$pedu2[maud$PA09Q01=='Yes'] <- 4
# Correlate the ISCED-based and item-based versions, same label first, then crossed.
cor.test(maud$pedu, maud$pedu2)
cor.test(maud$medu, maud$medu2)
cor.test(maud$medu, maud$pedu2)
cor.test(maud$pedu, maud$medu2)
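
The same check can be sketched on simulated data. This is only a toy illustration under stated assumptions, not the PISA data: the data frame `toy`, the columns `medu_comp` and `pedu_comp`, and the helper `to_level` are hypothetical names, and the provided composites are deliberately swapped by construction. It computes the full cross-correlation matrix in one call and flags a swap when the crossed correlations exceed the same-label ones.

```r
# Toy illustration on simulated data (not PISA): detect swapped parent labels.
set.seed(1)
n <- 2000

# A shared household component makes the two parents' education levels correlate.
household <- rnorm(n)

# Hypothetical helper: bin a continuous score onto a 0-4 scale.
to_level <- function(x) {
  as.numeric(cut(x, breaks = c(-Inf, -1, 0, 1, 2, Inf))) - 1
}

mother_items <- to_level(household + rnorm(n))
father_items <- to_level(household + rnorm(n))

toy <- data.frame(
  medu2     = mother_items,  # item-based composites, assumed correctly labelled
  pedu2     = father_items,
  medu_comp = father_items,  # provided "mother" composite holds father values
  pedu_comp = mother_items   # provided "father" composite holds mother values
)

# Rows: provided composites; columns: item-based composites.
xcor <- cor(toy[, c("medu_comp", "pedu_comp")],
            toy[, c("medu2", "pedu2")],
            use = "pairwise.complete.obs")
print(round(xcor, 3))

# A swap shows up as crossed correlations exceeding the same-label ones.
crossed    <- c(xcor["medu_comp", "pedu2"], xcor["pedu_comp", "medu2"])
same_label <- c(xcor["medu_comp", "medu2"], xcor["pedu_comp", "pedu2"])
swapped <- all(crossed > same_label)
print(swapped)
```

On real data the crossed correlations would rarely be exactly 1 unless the composite was derived from the same items, so the comparison of crossed against same-label correlations is the more general signal.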
Appendix.
composite educational attainment variables:
specific degree variables:
this is powerful autism
will include it in my next academic scandals roundup
also posted this on EJMR
This is a great find. Do you have any recommendations for other researchers looking to validate demographic variables in large-scale datasets? Is there a better way to double-check for issues like this?