PISA 2009 Parental Education Variable is Miscoded

Nov 03, 2023

The PISA 2009 dataset codes maternal education as paternal education, and vice versa. PISA provides a composite educational attainment variable for each parent, along with several item-level variables ("microvariables") that record whether the parent completed a given category of education. It turns out that a composite built from one parent's microvariables correlates at 1 with the PISA-created composite variable for the other parent, while the two microvariable composites (maternal and paternal) correlate with each other at .604.

In this case, the composite variable is likely the one that is wrong, because the correlation between maternal education and fertility is higher than the paternal one, which does not replicate prior research.

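# Recode the PISA-provided composite ISCED variables onto a 0-4 scale.
# Levels not listed below stay NA.
maud$medu <- NA
maud$pedu <- NA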

maud$medu[maud$PQFISCED=='None'] <- 0
maud$pedu[maud$PQMISCED=='None'] <- 0
maud$medu[maud$PQFISCED=='ISCED 3A'] <- 1
maud$pedu[maud$PQMISCED=='ISCED 3A'] <- 1
maud$medu[maud$PQFISCED=='ISCED 4'] <- 2
maud$pedu[maud$PQMISCED=='ISCED 4'] <- 2
maud$medu[maud$PQFISCED=='ISCED 5B'] <- 3
maud$pedu[maud$PQMISCED=='ISCED 5B'] <- 3
maud$medu[maud$PQFISCED=='ISCED 5A or 6'] <- 4
maud$pedu[maud$PQMISCED=='ISCED 5A or 6'] <- 4
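The same recoding can also be written with a named lookup vector, which keeps the label-to-score mapping in one place. This is only a sketch: `isced_levels` and `recode_isced` are hypothetical names, and the labels are the ones used in the assignments above.

```r
# Hypothetical lookup table: ISCED label -> 0-4 score (same mapping as above)
isced_levels <- c("None" = 0, "ISCED 3A" = 1, "ISCED 4" = 2,
                  "ISCED 5B" = 3, "ISCED 5A or 6" = 4)

# Labels that are not in the table, and missing values, come back as NA
recode_isced <- function(x) unname(isced_levels[as.character(x)])

# Toy input, not PISA data
example_scores <- recode_isced(c("None", "ISCED 4", "ISCED 2", NA))
print(example_scores)
```

With the real data this would be applied as, for example, `recode_isced(maud$PQFISCED)`.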

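# Build the same 0-4 scale from the item-level (microvariable) responses.
# Assignments run from the lowest to the highest level, so a later 'Yes' overwrites an earlier value.
maud$medu2 <- NA
maud$pedu2 <- NA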

maud$medu2[maud$PA10Q04=='No'] <- 0
maud$pedu2[maud$PA09Q04=='No'] <- 0
maud$medu2[maud$PA10Q04=='Yes'] <- 1
maud$pedu2[maud$PA09Q04=='Yes'] <- 1
maud$medu2[maud$PA10Q03=='Yes'] <- 2
maud$pedu2[maud$PA09Q03=='Yes'] <- 2
maud$medu2[maud$PA10Q02=='Yes'] <- 3
maud$pedu2[maud$PA09Q02=='Yes'] <- 3
maud$medu2[maud$PA10Q01=='Yes'] <- 4
maud$pedu2[maud$PA09Q01=='Yes'] <- 4

# Compare the two codings: same-label pairs first, then cross-label pairs.
cor.test(maud$pedu, maud$pedu2)
cor.test(maud$medu, maud$medu2)
cor.test(maud$medu, maud$pedu2)
cor.test(maud$pedu, maud$medu2)
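To show what the four tests look like when labels are swapped, here is a minimal sketch on simulated data. Nothing here comes from PISA: the `toy` data frame, the `mother` and `father` vectors, and the 50% matching rate are all assumptions made for illustration. The point is only the pattern: the cross-label correlations equal 1, while the same-label correlations fall back to the correlation between the two parents.

```r
# Simulated data only; these are not PISA values
set.seed(1)
n <- 1000
mother <- sample(0:4, n, replace = TRUE)
# Father matches mother about half the time, otherwise is drawn at random
father <- ifelse(runif(n) < 0.5, mother, sample(0:4, n, replace = TRUE))

toy <- data.frame(
  medu2 = mother,  # score built from the mother's items
  pedu2 = father,  # score built from the father's items
  medu  = father,  # "maternal" composite that actually holds the father's values
  pedu  = mother   # "paternal" composite that actually holds the mother's values
)

# Rows: composites; columns: item-based scores
r <- cor(toy[, c("medu", "pedu")], toy[, c("medu2", "pedu2")],
         use = "pairwise.complete.obs")
print(round(r, 3))
```

Running the same `cor()` call on the real `maud` columns gives all four correlations in one matrix, which makes a swapped pair easy to spot.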

Appendix.

composite educational attainment variables:

specific degree variables:

Christopher Brunet

Nov 3, 2023 (edited)

this is powerful autism

will include it in my next academic scandals roundup

also posted this on EJMR

Humbert Rivière

Nov 3, 2023

This is a great find. Do you have any recommendations for other researchers looking to validate demographic variables in large-scale datasets? Is there a better way to double-check for issues like this?

sebjenseb
