Discussion about this post

Meridian

Calling the “missing heritability problem” a red herring overstates the degree of convergence between pedigree‐based estimates and modern molecular estimates, and it downplays what the gap actually tells us about genetic architecture, measurement, and inference. At its core, missing heritability is not the claim that twin or adoption studies are “wrong,” but that the portion of trait variance captured by the kinds of variants and models we typically use in genome‑wide analyses does not yet match what family designs imply should be there. That remains scientifically meaningful because it bears directly on biological mechanisms and on the limits of prediction and intervention. It also helps to start from a precise definition: heritability is a variance component defined for a particular population and environment, and SNP‑based heritability (from GREML/LDSC and relatives) is the variance tagged by measured variants under explicit assumptions, not “the percent of a trait caused by genes” in any global sense. The leading methods, their assumptions, and their biases have been dissected in depth for more than a decade; they are neither unexamined nor too new to evaluate.

For height, arguably the best‑measured human trait, the field has indeed narrowed the gap, but even this poster child illustrates why “red herring” is too strong. With a GWAS of roughly five million people, Yengo and colleagues showed that identified common variants now account for nearly all of the common‑SNP component of height, yet they explicitly note that this saturation is largely restricted to European ancestries and does not mean the total pedigree heritability has been fully mechanistically resolved. Whole‑genome sequencing (WGS) analyses that model low‑frequency and rare variants further recover a large fraction of the gap for height and BMI, but those results do not automatically generalize to other traits. These are successes of method and sample size, not evidence that missing heritability was never a real signal.

Once we step beyond height and BMI, the gaps remain informative. For major psychiatric disorders, twin/family estimates are high, while SNP‑based estimates remain much lower even in enormous cohorts. In schizophrenia, population‑based twin data place heritability at roughly 0.75–0.80, whereas SNP heritability from state‑of‑the‑art analyses is about 0.24–0.25. Even allowing for ascertainment and modeling choices, this is not a rounding error; it points to genetic contributions that current common‑variant models and genotyping schemes still miss and to modeling differences between designs that matter for interpretation.
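To make the size of that gap concrete, here is a minimal arithmetic sketch using only the schizophrenia figures quoted above. The function name `gap_summary` is my own, and the "captured" ratio is purely descriptive: the twin and SNP estimates answer somewhat different questions, so this is not a formal decomposition.

```python
def gap_summary(pedigree_h2, snp_h2):
    """Return (absolute gap, share of the pedigree estimate matched by the SNP estimate)."""
    gap = pedigree_h2 - snp_h2
    captured = snp_h2 / pedigree_h2
    return gap, captured


# Endpoints of the ranges quoted in the text for schizophrenia.
for pedigree_h2 in (0.75, 0.80):
    for snp_h2 in (0.24, 0.25):
        gap, captured = gap_summary(pedigree_h2, snp_h2)
        print(
            f"pedigree={pedigree_h2:.2f} snp={snp_h2:.2f} "
            f"gap={gap:.2f} captured={captured:.0%}"
        )
```

Across those endpoints the SNP estimate comes to roughly 30 to 33 percent of the twin estimate, leaving an absolute gap of about 0.50 to 0.56.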

A central reason the gap cannot be dismissed is that “what’s missing” is increasingly being traced to specific sources. Rare and low‑frequency coding variants with effect sizes an order of magnitude larger than typical GWAS hits contribute to height and other traits; rare variants also account for a substantial share of cis‑heritability of gene expression. These observations are not pedantic quibbles about arcane methods; they tell us that much biologically relevant variance sits in alleles poorly tagged by SNP arrays and standard imputation, so array‑based GREML will undercount that variance by design. Emerging work on structural variants (copy‑number changes, complex rearrangements) and tandem repeat variation points the same way: they are common, can exert large functional effects, and are only now being measured at scale with long‑read sequencing and specialized genotyping, including in schizophrenia where repeat burden and high‑risk CNVs (for example, 22q11.2 deletions) materially affect liability. If missing heritability were a red herring, we would not repeatedly see variance reappear as soon as technologies and models expand to capture these classes of variation.

It is also inaccurate to suggest that molecular methods stand or fall on “tenuous” assumptions that have gone unexamined. The very concerns listed (LD mismatch between markers and causal variants, MAF‑dependent tagging, and assortative mating) are explicitly modeled in modern pipelines. GREML‑LDMS stratifies by LD and minor‑allele frequency to address incomplete tagging and shows how including rarer and lower‑LD variants raises h² estimates toward pedigree benchmarks when those variants are present; WGS‑based GREML generalizes this even further. Assortative mating, long‑range LD, and population structure are now recognized to bias marker‑based estimators and polygenic scores, and there is an active literature quantifying and correcting those biases, including through within‑family GWAS designs that reduce indirect genetic effects and stratification. That body of work shows attenuation of effect sizes within sibships, which is evidence that some of what looks “genetic” in population samples is a mix of direct and indirect processes; it does not show that the molecular estimators are untrustworthy or that the questions they raise should be ignored.
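The incomplete‑tagging point can be illustrated with a toy simulation. This is only a sketch under strong simplifying assumptions of my own: one causal variant, one marker, both treated as standardized normal scores rather than real genotypes, and a hypothetical function name `tagged_r2`. It is not GREML or GREML‑LDMS; it only shows that a marker correlated r with the causal variant recovers about r² of the variance that variant explains.

```python
import math
import random


def tagged_r2(ld_r, h2_causal, n=50_000, seed=0):
    """Squared correlation between a tag marker and the phenotype.

    ld_r      : correlation between marker and causal variant (assumed)
    h2_causal : share of phenotypic variance due to the causal variant (assumed)
    """
    rng = random.Random(seed)
    sx = sy = sxx = syy = sxy = 0.0
    for _ in range(n):
        causal = rng.gauss(0, 1)
        # Marker shares only part of its variation with the causal variant.
        marker = ld_r * causal + math.sqrt(1 - ld_r**2) * rng.gauss(0, 1)
        pheno = math.sqrt(h2_causal) * causal + math.sqrt(1 - h2_causal) * rng.gauss(0, 1)
        sx += marker
        sy += pheno
        sxx += marker * marker
        syy += pheno * pheno
        sxy += marker * pheno
    cov = sxy / n - (sx / n) * (sy / n)
    var_x = sxx / n - (sx / n) ** 2
    var_y = syy / n - (sy / n) ** 2
    return cov * cov / (var_x * var_y)


for ld_r in (1.0, 0.7, 0.3):
    estimate = tagged_r2(ld_r, h2_causal=0.4)
    print(f"LD r={ld_r:.1f}: tagged R^2={estimate:.3f}, expected about {ld_r**2 * 0.4:.3f}")
```

In this toy setup, weaker LD shrinks the variance the marker appears to explain even though the causal contribution is unchanged, which is the sense in which array‑based estimates can undercount poorly tagged variants.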

Likewise, indirect genetic effects (“genetic nurture”) are not merely artifacts of assortative mating. Trio‑based and family‑based designs show that non‑transmitted parental alleles measurably affect offspring outcomes via the environments parents provide, with within‑family GWAS strengthening the inference about direct effects by shrinking population‑level signals. More recent work indicates that some indirect effects operate beyond the nuclear family and cannot be dismissed as simple assortment alone. The magnitudes vary by trait and cohort, but the phenomenon is robust enough that ignoring it inflates naive between‑family estimates and blurs the meaning of “genetic effect” in social and behavioral traits.
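A small simulation can show why within‑family designs help here. This is a sketch under assumptions of my own, not a reconstruction of any published analysis: genetic scores are standardized normals, mating is random, each parent's score adds `indirect` to both children's outcomes through the rearing environment, and the function name `family_slopes` is hypothetical.

```python
import random


def family_slopes(direct, indirect, n_families=50_000, seed=1):
    """Return (population slope, sibling-difference slope) of outcome on own score."""
    rng = random.Random(seed)
    pop_xy = pop_xx = sib_xy = sib_xx = 0.0
    seg_sd = 0.5 ** 0.5  # segregation noise keeps the child score variance at 1
    for _ in range(n_families):
        mother = rng.gauss(0, 1)
        father = rng.gauss(0, 1)
        midparent = (mother + father) / 2
        # Environment the parents provide, shared by both siblings.
        nurture = indirect * (mother + father)
        g1 = midparent + rng.gauss(0, seg_sd)
        g2 = midparent + rng.gauss(0, seg_sd)
        y1 = direct * g1 + nurture + rng.gauss(0, 1)
        y2 = direct * g2 + nurture + rng.gauss(0, 1)
        # Population-style regression uses one child per family
        # (all variables have mean zero by construction).
        pop_xy += g1 * y1
        pop_xx += g1 * g1
        # Sibling differences cancel everything shared within the family.
        dg, dy = g1 - g2, y1 - y2
        sib_xy += dg * dy
        sib_xx += dg * dg
    return pop_xy / pop_xx, sib_xy / sib_xx


population, within = family_slopes(direct=0.3, indirect=0.2)
print(f"population slope ~ {population:.2f}, within-sibship slope ~ {within:.2f}")
```

Under this parameterization the population slope approaches direct + indirect while the sibling‑difference slope approaches the direct effect alone, which mirrors the attenuation seen in within‑family GWAS.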

Nor is it convincing to claim “portability” is a nonissue because “people don’t talk about it.” They do, and with urgency. Polygenic scores trained in European‑ancestry samples lose much of their predictive accuracy in non‑European populations, which has been widely documented and remains a practical manifestation of missing heritability for most of the world’s people. This is precisely why the height map “saturation” result emphasizes ancestry limitations. Beyond ancestry, gene–environment interplay also limits portability: meta‑analyses find that the expression of genetic differences in cognitive outcomes is moderated by socioeconomic context, with stronger genetic effects in environments where basic resources and schooling are less constraining (a pattern especially visible in U.S. cohorts and weaker in Western Europe). All of this underscores that heritability is contingent and that gaps at the molecular level often track contingent features of sampling, technology, and environment rather than vanishing upon closer inspection.

Finally, appeals to the Equal Environments Assumption (EEA) as globally “settled” overlook both the logical limitation of misclassification tests and the empirical nuance across traits. Even sympathetic reviews acknowledge that while much evidence is consistent with the EEA, measures of environmental similarity are often coarse and some violations would push twin‑based heritability upward for social and behavioral outcomes. The right conclusion is not that twin studies are “mere statistical tomfoolery,” but that their estimates answer a different question (broad‑sense familial resemblance under assumptions about environmental equality) than SNP‑based estimates of additive, tagged variation do. When these disagree, the discrepancy is not a nuisance to be waved away; it is a clue about mechanisms (rare and structural variation, indirect genetic effects, assortative mating, developmental and contextual modulation) that we should follow to build better models.

In short, the literature since 2017 has moved strongly in the direction of explanation, not dismissal. For some traits (especially height and BMI), WGS and LD/MAF‑aware modeling recover much of the gap; for others (notably psychiatric and many social phenotypes), large and biologically meaningful differences remain between pedigree and molecular estimates. Those differences have already catalyzed advances in measuring rare and structural variation, designing within‑family studies to isolate direct effects, and diversifying cohorts to improve portability. That is not what a “red herring” looks like; it is what a generative scientific puzzle looks like.

AGI_singularity

guys can we do a debate between hereditarians and environmentalists

sasha gusev and eric turkheimer vs ISIR
