The correct answer is 112.
This is because regression to the mean doesn’t happen twice.
Consider the conventional formula for estimating the average IQ of the children of two parents who each have an IQ of 120:
EIQ = ((120+120)/2 - 100) x 0.6 + 100
EIQ = 112
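The same arithmetic can be wrapped in a small R function so it is easy to repeat with other inputs. This is only a sketch: the function name `expected_child_iq` and its argument names are illustrative and do not come from any package, and the default of 0.6 is simply the additive heritability assumed throughout this post.

```r
# Hypothetical helper (name and arguments are illustrative).
# Expected child IQ = midparent deviation from the population mean,
# shrunk by the additive heritability, then added back to the mean.
expected_child_iq <- function(iq_parent1, iq_parent2, h2 = 0.6, pop_mean = 100) {
  midparent <- (iq_parent1 + iq_parent2) / 2
  (midparent - pop_mean) * h2 + pop_mean
}

expected_child_iq(120, 120)  # 112
```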
Intuitively, if one of those children (IQ 112) has a child with a partner who also has an IQ of 112 and also has two parents with IQs of 120, the average IQ of that grandchild should be this:
EIQ = ((112+112)/2 - 100) x 0.6 + 100
EIQ = 107.2
It doesn’t work that way, because a grandchild whose grandparents all have IQs of 120 is more elevated in genetic predisposition to intelligence than a typical child of two parents with IQs of 112.
Allow me to simulate the IQ distributions of the 4 grandparents, where additive genes account for 60% of the variance in intelligence, and nonadditive genes and the environment account for the rest.
#setting the seed
set.seed(123)
#additive h^2 of 60%, environment/nonadditive effect of 40%
h2 <- 0.6
ce2 <- 0.4
#calculating the standardized coefficients of each component, as variance components are squared by definition.
h2eff <- sqrt(h2)
ce2eff <- sqrt(ce2)
#additive genetic predispositions to intelligence. 100 is divided by the sum of the standardized coefficients to keep the average at 100.
a1 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
a2 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
a3 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
a4 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
#nonadditive genetic + environmental predisposition to intelligence.
na1 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
na2 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
na3 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
na4 <- rnorm(1000000, 100/(h2eff + ce2eff), 15)
#calculating the IQs of the grandparents.
iq1 <- h2eff*a1 + ce2eff*na1
iq2 <- h2eff*a2 + ce2eff*na2
iq3 <- h2eff*a3 + ce2eff*na3
iq4 <- h2eff*a4 + ce2eff*na4
df1 <- data.frame(IQ1 = iq1, A1 = a1, NA1 = na1)
df2 <- data.frame(IQ2 = iq2, A2 = a2, NA2 = na2)
df3 <- data.frame(IQ3 = iq3, A3 = a3, NA3 = na3)
df4 <- data.frame(IQ4 = iq4, A4 = a4, NA4 = na4)
> mean(df1$IQ1)
[1] 99.98538
> mean(df2$IQ2)
[1] 99.97243
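One quick sanity check on this setup, offered as a sketch rather than as part of the original analysis: if the additive component really accounts for 60% of the variance, the squared correlation between the additive score and IQ should be close to 0.6, and the IQ standard deviation should be close to 15. The seed, the sample size of 100,000, and the variable names below are assumptions chosen to keep the example fast.

```r
set.seed(1)
n <- 100000
h2eff <- sqrt(0.6)
ce2eff <- sqrt(0.4)
mu <- 100 / (h2eff + ce2eff)  # component mean that keeps the IQ mean at 100

a  <- rnorm(n, mu, 15)   # additive genetic score
na <- rnorm(n, mu, 15)   # nonadditive genetic + environmental score
iq <- h2eff * a + ce2eff * na

mean(iq)       # should be close to 100
sd(iq)         # should be close to 15
cor(a, iq)^2   # share of IQ variance tied to the additive score, close to 0.6
```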
The grandparents with IQs of about 120 still need to be selected:
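#the %>% pipe comes from magrittr (it is also available after loading dplyr).
library(magrittr)
#selecting grandparents with IQs between 119 and 121.2
g120ish1 <- df1 %>% subset(IQ1 < 121.2 & IQ1 > 119)
g120ish2 <- df2 %>% subset(IQ2 < 121.2 & IQ2 > 119)
g120ish3 <- df3 %>% subset(IQ3 < 121.2 & IQ3 > 119)
g120ish4 <- df4 %>% subset(IQ4 < 121.2 & IQ4 > 119)
#making the pairs of dataframes equal in size by truncating each pair to the shorter length, which avoids NA rows.
n12 <- min(nrow(g120ish1), nrow(g120ish2))
g120ish1 <- g120ish1[1:n12, ]
g120ish2 <- g120ish2[1:n12, ]
n34 <- min(nrow(g120ish3), nrow(g120ish4))
g120ish3 <- g120ish3[1:n34, ]
g120ish4 <- g120ish4[1:n34, ]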
> mean(g120ish1$IQ1)
[1] 120.0623
Now, the distribution of intelligence among the children of these four people must be simulated by averaging the additive genotypes of each pair of grandparents and drawing a new set of environments:
#In practice, children inherit a somewhat randomized set of genes from their parents.
iqch1 <- (((g120ish2$A2 + g120ish1$A1)/2)*h2eff + ce2eff*rnorm(nrow(g120ish1), 100/(h2eff + ce2eff), 15))
iqch2 <- (((g120ish3$A3 + g120ish4$A4)/2)*h2eff + ce2eff*rnorm(nrow(g120ish3), 100/(h2eff + ce2eff), 15))
ach1 <- (g120ish2$A2 + g120ish1$A1)/2
ach2 <- (g120ish3$A3 + g120ish4$A4)/2
ch1 <- data.frame(IQ1 = iqch1, A1 = ach1)
ch2 <- data.frame(IQ2 = iqch2, A2 = ach2)
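#truncating both dataframes to the shorter length so that no NA rows are introduced.
nch <- min(nrow(ch1), nrow(ch2))
ch1 <- ch1[1:nch, ]
ch2 <- ch2[1:nch, ]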
> mean(ch1$IQ1)
[1] 112.1443
> mean(ch2$IQ2)
[1] 112.0144
As expected, the average IQ of the simulated children of grandparents with IQs of 120 is about 112.
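The code comment above notes that real children inherit a somewhat randomized set of genes rather than the exact midparent value. Here is a minimal sketch of that refinement, assuming the standard infinitesimal-model result that the within-family (segregation) variance equals half the additive variance. The extra noise term has mean zero, so it should widen the children's distribution without moving its average. The helper `sim_parents` and the names `seg_sd`, `a_mid`, `iq_mid`, and `iq_seg` are illustrative, and the parents are re-simulated here at a smaller sample size.

```r
set.seed(2)
n <- 200000
h2eff <- sqrt(0.6)
ce2eff <- sqrt(0.4)
mu <- 100 / (h2eff + ce2eff)

# Simulate a pool of potential parents and return the additive scores
# of those whose IQ falls between 119 and 121.2.
sim_parents <- function(n) {
  a <- rnorm(n, mu, 15)
  iq <- h2eff * a + ce2eff * rnorm(n, mu, 15)
  a[iq > 119 & iq < 121.2]
}
a_p1 <- sim_parents(n)
a_p2 <- sim_parents(n)
k <- min(length(a_p1), length(a_p2))
a_mid <- (a_p1[1:k] + a_p2[1:k]) / 2   # midparent additive score

# Assumption: segregation variance = half the additive variance (15^2 / 2).
seg_sd <- sqrt(15^2 / 2)
a_seg <- a_mid + rnorm(k, 0, seg_sd)   # child's additive score with segregation noise

env <- rnorm(k, mu, 15)                # fresh nonadditive/environmental draw
iq_mid <- h2eff * a_mid + ce2eff * env # children as simulated in this post
iq_seg <- h2eff * a_seg + ce2eff * env # children with segregation noise added

c(mean(iq_mid), mean(iq_seg))  # both should be close to 112
c(sd(iq_mid), sd(iq_seg))      # the second should be larger
```

Under these assumptions the conclusion about the mean is unchanged; only the spread of the children's IQs grows.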
Now, to calculate the distribution of the grandchildren these two children have:
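#grandchild IQ: the average of the two children's additive scores plus a fresh nonadditive/environmental draw.
chch1 <- (((ch2$A2 + ch1$A1)/2)*h2eff + ce2eff*rnorm(nrow(ch1), 100/(h2eff + ce2eff), 15))
dfchch1 <- data.frame(IQ1 = chch1, A1 = (ch2$A2 + ch1$A1)/2)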
> mean(dfchch1$IQ1)
[1] 112.0886
112, again. This can seem confusing, but it arises because regression to the mean has already happened once: the children have already drawn a new set of environments whose average sits at the population mean. The genotypic predisposition to intelligence is, on average, roughly the same in every generation of the family (in practice it will be slightly lower in later generations due to mutations). What differs is that the grandparents, having been selected for IQs of 120, have on average a nonadditive genetic/environmental predisposition to intelligence that is above the population mean, while their descendants do not.
#average additive genetic predisposition to intelligence of the grandparents
> mean(g120ish1$A1)
[1] 86.67644
> mean(g120ish2$A2)
[1] 86.63988
> mean(g120ish3$A3)
[1] 86.64211
> mean(g120ish4$A4)
[1] 86.52578
#average additive genetic predisposition to intelligence of the children of the grandparents
> mean(ch1$A1)
[1] 86.65816
> mean(ch2$A2)
[1] 86.58184
#average additive genetic predisposition to intelligence of the grandchild
> mean(dfchch1$A1)
[1] 86.62
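#the genetic mean is only about 86.6 because the components are scaled so that their weighted sum averages 100 in the general population; on this scale the population mean of the additive component is 100/(h2eff + ce2eff), roughly 71.1.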
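The simulated values can be checked against a closed-form calculation. This is a sketch under the simulation's own assumptions (two independent normal components, each with an SD of 15), using the standard formula for the conditional mean of one jointly normal variable given another. The variable names are illustrative.

```r
h2eff <- sqrt(0.6)
ce2eff <- sqrt(0.4)
mu <- 100 / (h2eff + ce2eff)   # population mean of each component, about 71.07

# E[A | IQ = 120] = mu + cov(A, IQ) / var(IQ) * (120 - 100),
# with cov(A, IQ) = h2eff * 15^2 and var(IQ) = 15^2.
a_grandparent <- mu + h2eff * (120 - 100)     # about 86.56

# By the same formula, the selected grandparents' nonadditive/environmental
# average is also elevated above mu.
na_grandparent <- mu + ce2eff * (120 - 100)   # about 83.72

# Children and grandchildren keep the same average additive score,
# but draw a fresh nonadditive/environmental component with mean mu.
iq_child      <- h2eff * a_grandparent + ce2eff * mu   # 112
iq_grandchild <- h2eff * a_grandparent + ce2eff * mu   # 112 again

c(a_grandparent, na_grandparent, iq_child, iq_grandchild)
```

The additive average of about 86.56 matches the simulated means above, and the expected IQ stays at 112 in both later generations because only the nonadditive/environmental component falls back to the population mean.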
If you have 4 grandparents with IQ 120 (generation 1) and the children (generation 2) have IQ 112, it would seem likely that the children are deviations from the family mean, and we would expect upward regression toward the family mean in the grandchildren (generation 3), resulting in an IQ between 112 and 120.
If the 4 grandparents with IQ 120 are coming from a family average of 100 (like you assume in a1 through a4), then I think this analysis is right. But given the high additive heritability of IQ and assortative mating for IQ, isn't it more likely that the grandparents actually come from a high genotypic IQ family line?
Of course, the typical case would never be very clean cut, but I'm wondering if this is a little misleading due to your assumptions. Am I mistaken here?
So are you saying regression to the mean can't happen over many generations since "regression to the mean doesn’t happen twice"? Gregory Clark's work shows that regression to the mean can occur incrementally over many generations (even with assortative mating). How does one reconcile this apparent inconsistency?
Yes, Clark is referring to regression in social status, not in intelligence per se (but surely intelligence is a major contributor to social status).