How good will embryo selection be in the future?

GWAS of low h^2 traits cannot be relied on

Jan 06, 2025

IQ

Embryo selection for IQ is already being done, and it’s not very effective — selecting the best out of 10 embryos for predisposition to intelligence results in a gain of 6 IQ points for somebody of Northwestern European descent. It could lead to increases in the number of people in the cognitive elite within countries that keep it legal but is unlikely to have a significant population-wide effect.

The most important polygenic score for predicting intelligence is the one for educational attainment, so much so that polygenic scores for educational attainment predict IQ scores slightly better than realized education does. At the moment, polygenic scores for education correlate at about 0.3 with scores on the Add Health vocabulary test, and using multiple polygenic scores increases this correlation to 0.37.

Predicting the effect this technology will have on the average IQ of the world is extremely difficult: it’s not clear how many people will use the technology, how many eggs can be harvested from participants, the degree to which the parents will select for intelligence, and how accurate the polygenic scores will be.

Of all these contingencies, the accuracy of polygenic scores is the easiest to forecast. A GWAS that predicts educational attainment is released about every 3 years, and each release has increased the variance explained by about 3 percentage points.

The amount of variance the polygenic scores for educational attainment can explain is almost entirely a function of the logged sample size: a one percent increase in the sample size is associated with an increase of about 0.033 percentage points in explained variance.

# r2 is the variance explained (in percent) by each successive EA polygenic score; ss is the GWAS sample size.
> da <- data.frame(r2=c(2.64, 5.81, 6.91, 10.09, 13.28), ss=c(126559, 293723, 405072, 1131881, 3037499))
> lr <- lm(data=da, log(ss) ~ r2)
> predict.lm(lr, newdata=data.frame(r2=40), interval='prediction')
       fit     lwr     upr
1 22.96205 22.1573 23.7668
> exp(22.1573)  # back-transform the lower bound with exp() rather than 2.71^x: roughly 4.2 billion
> exp(23.7668)  # back-transform the upper bound: roughly 21 billion
> log(9500000000)  # for comparison: the fitted value is close to the log of 9.5 billion
[1] 22.97456
# lwr is the lower bound of the 95% prediction interval and upr is the upper bound. A prediction interval was used rather than a confidence interval because I am trying to predict where a new data point will fall, not where the line is.

To get to the maximum 40% of explained variance, between roughly 4.2 and 21 billion genotypes are necessary.
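
The 0.033 figure quoted earlier can be checked by flipping the regression around, so that explained variance is the outcome and logged sample size is the predictor. This is only a sketch using the same five data points as the fit above; the names `fit_r2`, `slope_per_log_unit` and `gain_per_one_percent` are made up for illustration.

```r
# Same five GWAS data points as in the fit above:
# r2 = variance explained (in percent), ss = sample size.
da <- data.frame(r2 = c(2.64, 5.81, 6.91, 10.09, 13.28),
                 ss = c(126559, 293723, 405072, 1131881, 3037499))

# Regress explained variance on logged sample size (the reverse of the fit above).
fit_r2 <- lm(r2 ~ log(ss), data = da)
slope_per_log_unit <- unname(coef(fit_r2)["log(ss)"])

# Percentage points of r2 gained from a 1% larger sample.
gain_per_one_percent <- slope_per_log_unit * log(1.01)

print(round(c(slope = slope_per_log_unit, per_1pct = gain_per_one_percent), 3))
```

With these five points the slope comes out at roughly 3.3 percentage points per log unit of sample size, so a 1% larger sample buys about 0.033 percentage points of explained variance.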

If the trend of tripling sample sizes roughly every 3 years continues, polygenic scores that explain 40% of the variance should exist by 2052.

These two predictions are contradictory, to say the least.
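
One way a date around 2052 can fall out of the numbers is to extrapolate the average gain per GWAS release in a straight line. The sketch below assumes the most recent data point dates to 2022 and that releases keep arriving every 3 years; both are assumptions made for illustration, and this is not necessarily the exact calculation behind the date above.

```r
r2 <- c(2.64, 5.81, 6.91, 10.09, 13.28)  # variance explained (%), one value per release
target_r2 <- 40             # the ceiling used in the text
last_release_year <- 2022   # assumed year of the latest data point
years_per_release <- 3      # assumed gap between releases

# Average gain in explained variance from one release to the next.
gain_per_release <- mean(diff(r2))

# Number of further releases needed if each adds the same average gain.
releases_needed <- (target_r2 - r2[length(r2)]) / gain_per_release

projected_year <- last_release_year + releases_needed * years_per_release
print(round(projected_year))
```

Under these assumptions the projection lands at about 2052, which is what makes the contrast with the multi-billion sample requirement so stark.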

This prediction is contingent on more biobanks being created and added to these datasets. The rate at which these appear and the number of people included in them are unknown, so I tried to estimate it by gathering every biobank with genetic data that was either used in the EA GWASes or that I knew of:

ABCD: 5,690 samples according to Lee et al (EA3)
FENLAND: 8,535 samples according to Lee et al
ELSA: 6,065 samples according to Lee et al
GEISINGER: 14,562 samples according to Lee et al
VIKING health study: 1,841 samples according to Lee et al
Manchester Study: 1,713 samples according to Okbay et al (EA2)
Susceptibility–Reykjavik Study: 3,212 samples according to Okbay et al
Austrian Stroke Study: 777 samples according to Okbay et al
Avon Study: 2,877 samples according to Okbay et al
Berlin Aging Study: 1,619 samples according to Okbay et al
CoLaus: 3,269 samples according to Okbay et al
Copenhagen Study of Asthma: 318 samples according to Okbay et al
Croatia Korcula: 842 samples according to Okbay et al
Dortmund Health: 953 samples according to Okbay et al
Wellcome Diabetes: 2,578 samples according to Okbay et al
Erasmus Family Study: 2,422 samples according to Okbay et al
Family Heart Study: 3,483 samples according to Okbay et al
FINRISK: 1,685 samples according to Okbay et al
Finnish Twin Cohort: 2,418 samples according to Okbay et al
Genetics of Overweight Adults: 1,459 samples according to Okbay et al
GRAPHIC: 727 samples according to Okbay et al
H2000: 1,616 samples according to Okbay et al
Helsinki Birth Cohort: 1,617 samples according to Okbay et al
Hunter Community Study: 1,946 samples according to Okbay et al
HNRS: 3,526 samples according to Okbay et al
Hypergenes: 815 samples according to Okbay et al
INGI: 1,890 samples according to Okbay et al
KORA: 5,376 samples according to Okbay et al
LBC: 1,518 samples according to Okbay et al
MCTFR: 3,819 samples according to Okbay et al
MGS: 2,313 samples according to Okbay et al
MoBa: 622 samples according to Okbay et al
NBS: 1,808 samples according to Okbay et al
NESDA: 1,820 samples according to Okbay et al
NFBC: 5,297 samples according to Okbay et al
NTR: 5,246 samples according to Okbay et al
OGP: 914 samples according to Okbay et al
ORCADES: 1,828 samples according to Okbay et al
PREVEND: 3,578 samples according to Okbay et al
QIMR: 8,006 samples according to Okbay et al
Rotterdam Study: 10,815 samples according to Okbay et al
Rush: 1,695 samples according to Okbay et al
SardiNIA: 5,616 samples according to Okbay et al
SHIP: 4,457 samples according to Okbay et al
STR: 14,385 samples according to Okbay et al
THISEAS: 829 samples according to Okbay et al
TwinsUK: 4,012 samples according to Okbay et al
British Birth Cohort: 2,804 samples according to Okbay et al
YFS: 2,029 samples according to Okbay et al
100k Genomes: 100,000 genomes
Add Health: 15,000 genotypes.
CARTaGENE biobank: 44,000 individuals
Generation Scotland: 36,000 individuals
GERA: 100,000 individuals
German National Cohort: 200,000 individuals
IARC Biobank: 562,000 individuals
LifeLines (Netherlands): 15,638 genomes
Taiwan biobank: 197,000 individuals
UK Biobank: 500,000 samples.
FinnGen: goal of 500,000 samples.
AllofUs: goal of 1,000,000 samples.
Estonia Biobank: currently has 200,000 samples.
Biobank Japan: currently has 200,000 samples.
China Kadoorie Biobank: 100,000 genotypes
Million Veterans Program: 1,000,000 veterans
Nurse’s Health Study: 33,000 genotypes.
HUNT: 88,000 genotypes.
Proyecto oriGen: goal of 100,000 genotypes.
Health and Retirement Study: 20,000 genotypes.
23andme: n = 2,272,216 according to the EA4 paper.
deCode: launched in ~1996, >160,000 Icelanders.

That is a total of 5,620,070 samples collected over ~20 years of genotyping and sequencing. Assuming a constant rate of genomic data collection, this implies that about 281,000 genetic samples are placed into semi-public biobanks every year.

At this rate, it would take about 30,000 years to get to the maximum level of explained variance. This could even be an underestimate, since not all of the individuals in these studies are genotyped, and not all of them will have data on educational attainment. I also generously counted recruitment goals as if they were data already collected.
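
The arithmetic behind that figure can be sketched as follows. It takes the fitted log sample size from the regression output above (22.96205) and the collection rate from the previous paragraph; subtracting the samples already collected is an extra assumption of this sketch.

```r
samples_collected <- 5620070    # total from the biobank list
years_of_collection <- 20       # approximate span assumed in the text
rate_per_year <- samples_collected / years_of_collection

# Point estimate of the sample size needed for 40% explained variance,
# back-transformed from the fitted log sample size in the regression output.
samples_needed <- exp(22.96205)

# Years required if collection continues at the same constant rate.
years_needed <- (samples_needed - samples_collected) / rate_per_year
print(round(c(rate = rate_per_year, needed = samples_needed, years = years_needed)))
```

At the point estimate this gives a bit over 33,000 years, the same order of magnitude as the figure quoted above.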

Other options?

Even if aggregating a bunch of GWAS summary statistics will not bring utopia, there are several other avenues that can be taken that are much more effective.

Assuming intelligence is as heritable as height, only about 1 million individuals with reliable cognitive testing data would be necessary to get a good predictor of cognitive ability; Hsu was able to get an r = .6 correlation between true and predicted height using only 500,000 individuals. The latest height GWAS predicts height at r = .64 in populations of European ancestry.

I was told in a private conversation that somebody tried to access the Million Veterans Program, which has cognitive testing scores that could be used to make an extremely large (n = ~1,000,000) GWAS for IQ, which would be sufficient to get an r = .6 predictor of intelligence. Unfortunately, their request was denied for an unknown reason. If somebody were able to access that data and run the statistics, that would be a massive boon to the embryo selection industry. The UK Biobank has a comparably large sample, but unfortunately it is limited by the fact that the IQ tests it used were unreliable, so little variance in cognitive ability could be predicted.

There is also the option of using multiple trait selection, where multiple polygenic scores are leveraged (e.g. ADHD, educational attainment, schizophrenia) to get a more accurate prediction of intelligence. The people behind the recently maligned embryo selection startup were able to get r = .3 to .37 using multiple trait selection, which is good, but more would need to be done.

Besides improving the predictors, the selection process could also be improved. Gwern has an idea on how this realistically could be achieved:

Massive Multiple Embryo Selection: A set of eggs is extracted from a woman, or alternately, some somatic cells like skin cells. If immature eggs in an ovary biopsy, they are matured in vitro to eggs; if somatic cells, they are regressed to stem cells, possibly replicated hundreds of times, and then turned into egg-generating-cells and finally eggs, yielding hundreds or thousands of eggs (all still identical to her own eggs). Either way, the resulting large number of eggs are then fertilized (up to a few hundred will likely be economically optimal), and then selection & implantation proceeds as in simple multiple embryo selection.

Assuming the best of 200 embryos is selected, that would lead to a gain of 10.8 IQ points on average with current technology, and a gain of 20.6 points using a predictor that explains 50% of variance in intelligence. According to Gwern, this process would take 5 years and 5-100k dollars, so perhaps some tech autists or mega rich people could take advantage of it, but I doubt that normies would.
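
The 6, 10.8 and 20.6 point figures can be roughly reproduced with a standard order-statistic approximation: the expected gain is the expected maximum of n standard normal draws, scaled by the predictor's standard deviation among siblings. The sketch assumes that only half of a polygenic score's variance is available between siblings, that IQ has an SD of 15, and that "current technology" means the r = .37 predictor mentioned earlier. The function names are hypothetical, and the formula is an assumed approximation, not necessarily the exact calculation behind the numbers above.

```r
# Expected maximum of n independent standard normal draws,
# computed by numerically integrating x times the density of the maximum.
expected_max <- function(n) {
  integrand <- function(x) x * n * dnorm(x) * pnorm(x)^(n - 1)
  integrate(integrand, lower = -10, upper = 10)$value
}

# Expected IQ gain from picking the top-scoring embryo out of n.
# r2 is the share of IQ variance the predictor explains in the population;
# dividing by 2 reflects the assumption that half of it varies between siblings.
selection_gain <- function(n, r2, sd_iq = 15) {
  expected_max(n) * sqrt(r2 / 2) * sd_iq
}

gains <- c(current_10  = selection_gain(10, 0.37^2),
           current_200 = selection_gain(200, 0.37^2),
           future_200  = selection_gain(200, 0.50))
print(round(gains, 1))
```

Under these assumptions the three values come out at roughly 6.0, 10.8 and 20.6 points. Because the expected maximum grows only slowly with n, going from 10 to 200 embryos adds less than doubling the gain, while a better predictor scales the gain directly.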

High IQ midwits

Genetic selection technology is great. IQ matters. Institutions that select intensely for cognitive ability accordingly have extremely smart populations, with average IQs of 120-130 or so (e.g. in elite universities or gifted programs); the catch is that these people are typically competent and get good jobs, but are rarely higher types or geniuses that are capable of outlier-tier achievement.

So what else could you select for?

The best candidate is age/sex/location adjusted income. Education could inadvertently select for a kind of oversocialized, slow-maturing, or conformist type, while income would select for agency and competence in addition to intelligence. The problem here is that selecting for income would eventually run into the same problem that selecting for education has — it is not that heritable as a trait, so it will be difficult to calculate good polygenic scores for it.

If reliable personality predictors existed, ideally selection would be made for average levels of autism/ADHD, ~80th percentile conscientiousness, ~40th percentile neuroticism, ~40th percentile agreeableness, ~80th percentile extraversion, and as high openness as possible. Too much conscientiousness would lead to a high conformity/rigid type, too little neuroticism leads to blunting, too little agreeableness leads to being an asshole, too much extraversion leads to being hyperactive and oversocial; even too much openness could lead to overly deviant or pathological behaviour. It’s not much of a surprise that a lot of the discourse about eugenics surrounds IQ: more is better. It’s that simple. Perhaps around the 150-170 level you start encountering maladaptive behaviour due to the fact that humans did not evolve to be that intelligent, but most highly intelligent people tend to be fairly stable.

Conclusion

Embryo selection for traits beyond height and pigmentation looks fairly bleak unless scientists adopt different methods of conducting GWAS, such as MTAG (multi-trait analysis of GWAS), or somebody manages to find a large dataset of individuals who have taken a reliable IQ test.

I assume a highly reliable predictor of eye/hair/skin colour could be made easily with existing literature, though I doubt that one could be made for faces, especially with the data privacy laws that currently exist.

Good news:

I’m not that familiar with the way IVF works in practice, but it appears that clinics have now started maturing eggs outside of the body, which makes the process cheaper and easier on the woman.

I also got orange ticked on substack because I podcasted with Walt Bismarck, the guy who made all of those funny alt-right disney parodies back in the day. Now substack shows that I have 4.7k subscribers across all platforms, which is a little deceptive. I have 2.3k between the anime publication and this one.

Blue Vir

Jan 6

It should be noted that the reason why only 10 embryos are the selection sample is because the women who do IVF tend to be older and have few eggs. Among women around age 20, egg extractions that retrieve 100 embryos are possible.


Compsci

I find the discussion interesting, but without knowing just what difference a gain of 6 points makes, I have a hard time coming to a conclusion about its value. I assume from the discussion that the 6 points hypothesized is a shift of the bell curve to the right by 6 points. So my question is: for a previously hypothesized potential intellect of, say, dead average (100), is a move to 106 the same benefit as, say, a move from 115 to 121? Or 130 to 136? The assumption of an interval scale for IQ, rather than rank order, comes into play. Or perhaps I’m overthinking this…


sebjenseb
