Regression to the mean happens once

Sebastian Jensen

Jun 19, 2024

The correct answer is 112.

This is because regression to the mean happens once.

Consider the conventional formula for estimating the average IQ of the children of two parents who each have an IQ of 120:

EIQ = ((120+120)/2 - 100) x 0.6 + 100

EIQ = 112
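The same arithmetic can be written as a small R helper. This is only a sketch: the function name `expected_child_iq` and its argument names are mine, not something from the post, and the defaults (heritability of 0.6, population mean of 100) are simply the values used in the formula above.

```r
# Hypothetical helper (not from the original post): expected offspring IQ
# under the midparent regression formula quoted above.
expected_child_iq <- function(iq_mother, iq_father, h2 = 0.6, pop_mean = 100) {
  midparent <- (iq_mother + iq_father) / 2   # average of the two parents
  (midparent - pop_mean) * h2 + pop_mean     # shrink the deviation by h2
}

expected_child_iq(120, 120)
```

Calling it with two parents at 120 reproduces the 112 figure.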

Intuitively, when one of those children (IQ of 112, two parents with IQs of 120) has children with a partner who also has an IQ of 112 and two parents with IQs of 120, the average IQ of the offspring should be 107.2:

EIQ = ((112+112)/2 - 100) x 0.6 + 100

EIQ = 107.2

It doesn’t work that way, because grandchildren whose four grandparents all have IQs of 120 are unusually elevated in genetic predisposition to intelligence.
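One way to see this is to write the argument in deviation form. The following is a sketch under assumptions that are mine rather than the post's wording: IQ is the sum of an additive part a and an independent residual e, both normally distributed, with h² = 0.6, and a child's additive value is the plain average of its parents' additive values (no segregation variance, no assortative-mating effects).

```latex
\begin{align*}
\mathrm{IQ} &= 100 + a + e, \qquad \operatorname{Var}(a) = h^2 \cdot 15^2, \quad \operatorname{Var}(e) = (1 - h^2) \cdot 15^2 \\
\mathbb{E}[a \mid \mathrm{IQ} = 120] &= h^2 \,(120 - 100) = 0.6 \times 20 = 12 \\
a_{\text{child}} &= \tfrac{1}{2}(a_1 + a_2) \;\Rightarrow\; \mathbb{E}[a_{\text{child}}] = 12, \quad \mathbb{E}[\mathrm{IQ}_{\text{child}}] = 100 + 12 + \mathbb{E}[e] = 112 \\
a_{\text{grandchild}} &= \tfrac{1}{2}(a_{\text{child},1} + a_{\text{child},2}) \;\Rightarrow\; \mathbb{E}[\mathrm{IQ}_{\text{grandchild}}] = 100 + 12 + 0 = 112
\end{align*}
```

The shrinkage by h² enters only in the second line, where a phenotype is used to infer a genotype. The later generations are not selected on their own IQ, so nothing is shrunk a second time.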

Allow me to simulate the IQ distributions of four grandparents, where additive genes account for 60% of the variance in intelligence, and nonadditive genes and environment account for the rest of it.

#setting the seed
set.seed(123)
#additive h^2 of 60%, environment/nonadditive effect of 40%
h2 <- 0.6
ce2 <- 0.4
#calculating the standardized coefficients of each component, as variance components are squared by definition.
h2eff <- sqrt(h2)
ce2eff <- sqrt(ce2)

#additive genetic predispositions to intelligence. 100 is divided by the sum of the standardized coefficients to keep the average at 100.
a1 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
a2 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
a3 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
a4 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)

#nonadditive genetic + environmental predisposition to intelligence.
na1 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
na2 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
na3 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
na4 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)

#calculating the IQs of the grandparents.
iq1 <- h2eff*a1 + ce2eff*na1
iq2 <- h2eff*a2 + ce2eff*na2
iq3 <- h2eff*a3 + ce2eff*na3
iq4 <- h2eff*a4 + ce2eff*na4

df1 <- data.frame(IQ1 = iq1, A1 = a1, NA1 = na1)
df2 <- data.frame(IQ2 = iq2, A2 = a2, NA2 = na2)
df3 <- data.frame(IQ3 = iq3, A3 = a3, NA3 = na3)
df4 <- data.frame(IQ4 = iq4, A4 = a4, NA4 = na4)
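A quick sanity check on the scaling used above, as a sketch: centring both components on 100/(h2eff + ce2eff) and weighting them by the square roots of the variance shares should give IQs with a mean near 100, an SD near 15, and an additive share of variance near 0.6. The sample size `n` and the unnumbered names `a`, `na` and `iq` are my own choices for a smaller, self-contained run, not the post's objects.

```r
# Sanity check (smaller sample than the post's 1,000,000 draws; n is my choice).
set.seed(123)
h2eff  <- sqrt(0.6)
ce2eff <- sqrt(0.4)
n <- 100000

# Both components are centred on 100/(h2eff + ce2eff) with an SD of 15.
a  <- rnorm(n, 100 / (h2eff + ce2eff), 15)
na <- rnorm(n, 100 / (h2eff + ce2eff), 15)
iq <- h2eff * a + ce2eff * na

mean(iq)                   # should be close to 100
sd(iq)                     # should be close to 15
var(h2eff * a) / var(iq)   # should be close to 0.6
```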

The grandparents with an average IQ of 120 still need to be selected for:

#the %>% pipe used below comes from magrittr (library(dplyr) also provides it)
library(magrittr)
#selecting grandparents with IQs between 119 and 121.2
g120ish1 <- df1 %>% subset(IQ1 < 121.2 & IQ1 > 119)
g120ish2 <- df2 %>% subset(IQ2 < 121.2 & IQ2 > 119)
g120ish3 <- df3 %>% subset(IQ3 < 121.2 & IQ3 > 119)
g120ish4 <- df4 %>% subset(IQ4 < 121.2 & IQ4 > 119)

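#making the pairs of dataframes equal in size. This assumes g120ish2 and g120ish3 have at least as many rows as g120ish1 and g120ish4; otherwise the indexing pads with NA rows.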
g120ish2 <- g120ish2[1:nrow(g120ish1), ]
g120ish3 <- g120ish3[1:nrow(g120ish4), ]
> mean(g120ish1$IQ1)
[1] 120.0623

Now, the IQ distribution of the children of these four grandparents must be simulated by averaging the additive genotypes of each pair of grandparents and simulating a new set of environments:

#In practice, children inherit a somewhat randomized set of genes from their parents; here each child simply receives the average of the two parents' additive values.
iqch1 <- (((g120ish2$A2 + g120ish1$A1)/2)*h2eff + ce2eff*rnorm(nrow(g120ish1), 100/(h2eff + ce2eff), 15))
iqch2 <- (((g120ish3$A3 + g120ish4$A4)/2)*h2eff + ce2eff*rnorm(nrow(g120ish3), 100/(h2eff + ce2eff), 15))

ach1 <- (g120ish2$A2 + g120ish1$A1)/2
ach2 <- (g120ish3$A3 + g120ish4$A4)/2

ch1 <- data.frame(IQ1 = iqch1, A1 = ach1)
ch2 <- data.frame(IQ2 = iqch2, A2 = ach2)
ch2 <- ch2[1:nrow(ch1), ]
> mean(ch1$IQ1)
[1] 112.1443
> mean(ch2$IQ2)
[1] 112.0144

As expected, the average IQ of the simulated children of grandparents with IQs of 120 is about 112.

Now, to calculate the IQ distribution of the grandchildren that these two groups of children have together:

#grandchildren: average the two children's additive values and draw a fresh environment.
chch1 <- (((ch2$A2 + ch1$A1)/2)*h2eff + ce2eff*rnorm(nrow(ch1), 100/(h2eff + ce2eff), 15))

dfchch1 <- data.frame(IQ1 = chch1, A1 = (ch2$A2 + ch1$A1)/2)
> mean(dfchch1$IQ1)
[1] 112.0886

112, again. A bit confusing, but this arises from the fact that regression to the mean has already happened between the grandparents and their children, and each new generation simply draws a new set of environments. The genotypic predisposition to intelligence in all of the kin is, on average, roughly the same in each generation (in practice it will be slightly lower in the following generations due to mutations).
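To see where that shared genotypic level sits, here is a minimal sketch that selects grandparents in the same 119 to 121.2 window and compares their average IQ with the IQ they would be expected to have in an average environment. The sample size `n` and the names `mu`, `sel` and `genotypic_iq` are my own, and the run is smaller than the post's, so the exact figures will differ from those printed above.

```r
# Sketch: smaller sample than the post's; n and the object names are my choices.
set.seed(123)
h2eff  <- sqrt(0.6)
ce2eff <- sqrt(0.4)
mu     <- 100 / (h2eff + ce2eff)
n      <- 200000

a  <- rnorm(n, mu, 15)   # additive component
na <- rnorm(n, mu, 15)   # nonadditive/environmental component
iq <- h2eff * a + ce2eff * na

# Keep grandparents in the same 119 to 121.2 window the post uses.
sel <- iq > 119 & iq < 121.2

# Expected IQ of someone with this additive value and an average environment.
genotypic_iq <- h2eff * a[sel] + ce2eff * mu

mean(iq[sel])        # phenotype of the selected grandparents, near 120
mean(genotypic_iq)   # their genotypic level, near 112
```

Under these assumptions the selected grandparents already sit near 112 genotypically, which is the level their unselected descendants inherit on average.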

