Why do Asians Outscore Whites on the SAT?
50% cognitive ability, 20% noncognitive ability, 30% test bias
While Asians have traditionally outscored Whites on the SAT, this has not always been true, and it appears that the difference has been increasing in magnitude with time.
In 2017, the SAT changed a great deal in content, so the test-bias theory would be best tested by observing whether the Asian advantage changed in magnitude after controlling for the secular rise in Asian scores that has occurred since 2000. Dalliard has already studied this, and concluded that the redesign led to an increased Asian advantage of about 0.21 SD (equivalent to 3 IQ points).
While it’s likely that the SAT has changed in content behind the scenes between 2000 and 2017, it seems implausible a priori that the College Board has been consistently hacking the test in a way that leads to greater Asian advantages for all but two of the years in that period. Note that the Black and Hispanic differences, which are the ones test designers would be most likely to target, have remained stable during this time.
Still, the redesign effect of 0.21 SD cannot account for the 0.65 SD advantage that Asians have over Whites on the 2017 version of the SAT. Intuitively, the two most likely causes are that Asians are more intelligent than Whites, or that they practice harder for the test.
II.
To determine which theory is correct, it would be best to determine what causes variance in SAT scores at an individual level before making inferences about what causes these differences at a group level.
SAT scores and IQ scores as measured by the ASVAB correlate at about 0.81, which is a strong correlation, but is still subject to some error. The omega total reliability of the ASVAB administered in this wave of the NLSY is 0.95, and the reliability of the SAT is about 0.91, so the correlation corrected for unreliability is 0.87 - very high, but still imperfect.
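The correction for unreliability here is the standard Spearman disattenuation formula, in which the observed correlation is divided by the square root of the product of the two reliabilities. A minimal sketch using the figures quoted above; the function name is only illustrative:

```python
import math

def disattenuate(r_observed, reliability_x=1.0, reliability_y=1.0):
    """Spearman's correction for attenuation (illustrative helper name)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Figures quoted in the text: observed r = 0.81,
# ASVAB omega total = 0.95, SAT reliability = 0.91.
r_both = disattenuate(0.81, 0.95, 0.91)   # corrected for both tests
r_asvab_only = disattenuate(0.81, 0.95)   # corrected for the ASVAB only

print(round(r_both, 2))        # 0.87
print(round(r_asvab_only, 2))  # 0.83
```

Correcting for the ASVAB alone gives the 0.83 figure that is used later in this section.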
Evidence from coaching studies suggests that coaching on the SAT has marginal effects (d = 0.07 - 0.15 between coached and uncoached students) when controlling for PSAT scores. Other studies have replicated this and found an effect of similar magnitude (d = 0.11). Personally, I don’t find this evidence that convincing - preparing for the SAT can be done easily by using books or speed-reading exercises, so many of the uncoached students were probably practicing as well.
There is also the question of the dose-response effect - people will vary in their practice even if they are enrolled in the same program. Fortunately, there is now a study that uses data from the Khan Academy SAT preparation program, which examined the effect of test prep on SAT performance controlling for PSAT scores. They did find a dose-response effect (particularly for maths), but this study suffers from the same problem as the other coaching studies: the students who are not practicing online may well be practicing by other means.
There are also regression design studies that try to examine the effect of cognitive abilities and noncognitive abilities on scholastic achievement. They found that, within Sweden, cognitive skills had much stronger effects on SAT scores than noncognitive ones by a factor of about 3.
The regression coefficient for motivation on SAT score is somewhat low (0.065), but that may be a function of the low validity of self-reported personality. Given that self-reports of personality correlate with peer-reports at about 0.47, the true effect is probably closer to 0.092 - still low, but non-zero. Of course, this doesn’t account for the fact that conscientiousness/motivation will not always generalize to increased levels of practice, which could vary substantially for environmental/random reasons.
If you consider that the SAT correlates with cognitive ability at 0.83 when you correct for the unreliability of the ASVAB, and that 9% of the variance in scores is error variance, then 25% of the variance in the scores is unaccounted for. It is correct that the ASVAB disproportionately measures cognitive ability, though that would bias the relationship upwards, not downwards. If the ASVAB is subsetted to the tests that only measure crystallized ability, the reliability-adjusted correlation increases to 0.85, which decreases the unaccounted variance to 19%.
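The arithmetic behind the unaccounted variance can be sketched as follows. This assumes a simple additive decomposition (variance explained by ability is the squared correlation, plus 9% error variance, plus a remainder), which is a simplification, and the function name is only illustrative. With the rounded inputs it reproduces the 19% figure for r = 0.85; for r = 0.83 it gives roughly 22%, a little below the 25% quoted, which presumably reflects different inputs or rounding.

```python
def unexplained_share(r_ability, error_share=0.09):
    """Share of SAT variance left after ability (r squared) and measurement error.

    Assumes the three components simply add up to 1 (a simplification).
    """
    return 1.0 - r_ability ** 2 - error_share

general = unexplained_share(0.83)        # full ASVAB, corrected for ASVAB unreliability
crystallized = unexplained_share(0.85)   # crystallized-ability subtests only

print(f"{general:.4f}")       # 0.2211, i.e. about 22%
print(f"{crystallized:.4f}")  # 0.1875, i.e. about 19%
```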
Test anxiety/sleep/mood correlate with ACT scores independent of scores on original attempts, but the effects are surprisingly small, and are not large enough to account for 19% of the variance.
I’m going to make a somewhat bold claim: I think that most of this 19% is due to differences in practice between students. If, say, 15% of the variance is accounted for by differences in practice, then the standardized effect of practice on SAT scores is 0.39 - lower than cognitive ability, but non-negligible. Yes, I am aware that the statistical evidence suggests that the correlation between practice and SAT scores is low, but these studies are biased by the fact that many students in the control group are probably practicing as well.
Anecdotally, my 2nd attempt on each of the scholastic tests I have taken was about 1 standard deviation above the first. I did not prepare well for the first attempts, but I tryharded in practice the 2nd time for emotional reasons, and it paid off. Other people seem to corroborate this:
I went through the quote tweets and collected reports from other accounts who corroborated the same thing; unfortunately they fit poorly into the article, so I shoved them into the appendix. If you include me and FischerKing, a total of 9 people reported that their SAT scores increased substantially after practicing hard. I suppose you could argue that these are just outliers that are not representative of the general population, but you still have the problem of missing variance that cannot be explained by cognitive ability or measurement error.
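The step from a variance share to a standardized effect is just a square root, assuming practice is uncorrelated with the other predictors (an assumption of this sketch, not something the data establish). The function name is illustrative:

```python
import math

def standardized_effect(variance_share):
    """Standardized coefficient implied by a share of variance explained,
    assuming the predictor is uncorrelated with the other predictors."""
    return math.sqrt(variance_share)

practice_effect = standardized_effect(0.15)   # the 15% scenario in the text
upper_bound = standardized_effect(0.19)       # if all 19% were due to practice

print(round(practice_effect, 2))  # 0.39
print(round(upper_bound, 2))      # 0.44
```

Either value sits well below the 0.85 correlation with crystallized ability, but is far larger than the d of roughly 0.1 reported in the coaching studies.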
About the top end in particular: the Asian advantage is larger at high ends of ability.
If we take these estimates at face value, a random Asian would be about 7,000 times more likely to score above 1600 than a random black if such scores were possible. Compared to whites, the Asian advantage is 13-fold. This suggests that there were ≈ 9,500 Asian SAT-takers in 2020 who got a perfect score of 1600 in the SAT. The predicted numbers of white, Hispanic, and black perfect scorers in the same year were approximately 3,000, 90, and 2, respectively. Much of the Asian superiority in this comparison comes from their higher variance. If Asians had the same SD as whites but the means were unchanged, the Asian-white per capita ratios scoring above 1600 would be "only" about 4 to 1 rather than 13 to 1.
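The role of variance in tail ratios can be illustrated with a small sketch under a normal model. The 0.65 SD mean gap is the 2017 figure from earlier in the article, but the cutoff and the wider SD are hypothetical values chosen only for illustration. They are not the estimates behind the figures quoted above, so the ratios will not match the 4-to-1 and 13-to-1 numbers.

```python
import math

def share_above(cutoff, mean, sd):
    """Fraction of a normal distribution lying above `cutoff`."""
    z = (cutoff - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

def tail_ratio(cutoff, mean_a, sd_a, mean_b, sd_b):
    """Per capita ratio of group A to group B above `cutoff`."""
    return share_above(cutoff, mean_a, sd_a) / share_above(cutoff, mean_b, sd_b)

# Reference group B: mean 0, SD 1. Group A: mean 0.65 SD higher.
cutoff = 2.5                                        # hypothetical cutoff
equal_sd = tail_ratio(cutoff, 0.65, 1.0, 0.0, 1.0)  # same SD in both groups
wider_sd = tail_ratio(cutoff, 0.65, 1.2, 0.0, 1.0)  # hypothetical SD of 1.2 for A

print(equal_sd < wider_sd)  # True
```

Holding the mean gap fixed, a larger SD in the higher-scoring group raises its over-representation above a high cutoff, which is the mechanism the passage describes.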
III.
There is also the question of whether this practice effect is leading to inflated scores being given to Asians. According to the Khan Academy study I cited earlier, Asians are the ethnic group that practices the most for standardized tests, though the difference within the sample is not that large: Whites practiced an average of 1.7 hours, while Asians practiced 2.7 hours.
For the reasons I stated before, it’s impossible to measure the true race difference in practice for standardized tests because students will use more than one resource to practice with. A better method would be to compare the scores that Asians receive on the SAT to their scores on IQ tests. If the White mean is set to 100 and the White SD to 15, Asians have the following IQs.
ABCD sample (n = 294, weighted by population size): 105.6 (2017)
WAIS-IV (n = 71): 103.1 (2008)
WISC-V (n = 89): 105.2 (2014)
MIDUS (n = 23): 101.8 (2005)
DAS (n = 48): 104.8 (1986)
WPPSI-R (n = 23): 100.3 (1986)
Project Talent (n = 949): 100.6 (1960)
These samples, besides the Project Talent sample, converge to an IQ of roughly 103-105. A literature review conducted by Richard Lynn found that on IQ tests conducted before 1990 their average was the same as that of White Americans, but that after 1990 the average converges to 105.
This rise can even be seen in the Asian-White income gap, which has increased over time.
The IQ-metric score of Asians relative to US Whites on the SAT was 110.3 in 2023, assuming a standard deviation of 200 within Whites, and 109.8 in 2017. Within post-2010 samples of Asians, the average IQ is 105.5 (95% CI: [104, 107], n = 383), which would be consistent with intelligence explaining about half of the Asian advantage and noncognitive effects explaining the other half. There is, of course, some uncertainty regarding what the Asian IQ is right now, but the latest sample (ABCD), with its average of 105.6, is fairly consistent with previous samples, which leads me to believe that the estimate is close enough to the true mean.
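The pooled post-2010 estimate can be reproduced with a short sketch. It assumes that the post-2010 samples are ABCD and WISC-V (their sizes sum to the reported n of 383) and that the standard error is based on a within-group SD of 15; both are assumptions of this sketch.

```python
import math

# Post-2010 samples from the list above: (name, n, mean IQ with Whites at 100 / SD 15)
samples = [("ABCD", 294, 105.6), ("WISC-V", 89, 105.2)]

total_n = sum(n for _, n, _ in samples)
pooled_mean = sum(n * mean for _, n, mean in samples) / total_n

# Assumption: within-group SD of 15, so the standard error is 15 / sqrt(n)
standard_error = 15 / math.sqrt(total_n)
ci_low = pooled_mean - 1.96 * standard_error
ci_high = pooled_mean + 1.96 * standard_error

print(total_n)                           # 383
print(f"{pooled_mean:.1f}")              # 105.5
print(f"{ci_low:.1f} to {ci_high:.1f}")  # 104.0 to 107.0
```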
In addition, Dalliard’s own analysis of the SAT score data suggested that the 2017 redesign inflated Asian scores by 0.21 SD (the equivalent of 3 IQ points), so the redesign by itself can explain about 60% of the noncognitive fraction of the Asian advantage.
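Putting the pieces together gives the rough split in the subtitle. The sketch below uses the 2023 SAT figure and the rounded 3-point redesign effect; which year and rounding to use is a choice made here for illustration, and the variable names are hypothetical.

```python
sat_iq_metric = 110.3   # Asian SAT score in IQ metric, 2023 (from the text)
tested_iq = 105.5       # pooled post-2010 IQ estimate (from the text)
redesign_points = 3.0   # 0.21 SD redesign effect, rounded to IQ points

total_gap = sat_iq_metric - 100              # advantage on the SAT
cognitive_gap = tested_iq - 100              # advantage on IQ tests
noncognitive_gap = total_gap - cognitive_gap
other_gap = noncognitive_gap - redesign_points

print(f"{cognitive_gap / total_gap:.1%}")           # 53.4%
print(f"{redesign_points / total_gap:.1%}")         # 29.1%
print(f"{other_gap / total_gap:.1%}")               # 17.5%
print(f"{redesign_points / noncognitive_gap:.1%}")  # 62.5%
```

These shares line up with the roughly 50% cognitive, 30% test bias, and 20% noncognitive split, with the redesign covering about 60% of the non-IQ portion.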
Appendix
Objection to higher Asian IQ: it’s because they cheat on tests.
…On low stakes IQ tests and the SAT?
Objection to higher Asian IQ: they score higher due to sterile motivation.
Objection to higher Asian IQ: it’s because of environmental/cultural effects.
No.
Objection to higher Asian IQ: if Asians are so intelligent, then why did they lag behind Europeans in scientific/national development?
This critique is obviously wrong, but I’m too jaded to respond to it.
###############
The correlation between a crystallized ability composite (general science knowledge, arithmetic reasoning, mathematical knowledge, paragraph comprehension, and word knowledge) and SAT scores is slightly higher than the correlation between general ability and SAT scores. Subtest scores are corrected for age at testing, but not sex.
Composites have the same omega total, so this is not biased by differential validity:
It is true that the reliability of IQ falls when test-retest intervals are taken into account, but the NLSY sample took the ASVAB when they were 12-18 years old, so they would have taken the SAT at around the same time, and this should not be a large source of bias.
Anecdotes:
Further reading:
Dalliard: great poster, and the source of most of the data in this article.
Milky Eggs: a good quantitative introduction to standardized testing, though I think he is underestimating the effect of practice on standardized test scores.
Meng Hu: similarity between IQ scores and SAT scores across groups
My three kids did some prepping, with books and free online resources. My impression is that it does have an impact but with decreasing marginal returns. I am wondering how the stats account for the multiplicity of test taking including among high scorers. My older son was happy enough with 1590. An East-Asian classmate of his had gotten 1600 (after lots of practice) but did not have a perfect score at the optional Essay test so decided to take the SAT again (where she got a high score but not perfect). Other point: serious preppers will start before they take the PSAT, so the high correlation PSAT - SAT cannot demonstrate that coaching doesn’t work.
Does the data include tests taken abroad in East Asia? Because that draws a very self selected group of test takers and there have been many cheating scandals.