Why do Asians Outscore Whites on the SAT?

50% cognitive ability, 20% noncognitive ability, 30% test bias due to test becoming easier to prepare for

Feb 26, 2024

Figure from “Racial/Ethnic Differences in the SAT in 2023” (Human Varieties)

While Asians have traditionally outscored Whites on the SAT, this has not always been true, and the gap appears to have grown over time. Some of this is bound to be due to Asian immigration becoming more selective, but it could also be due to changes in the content of the test.

In 2017, the SAT was redesigned, so that theory is best tested by checking whether the Asian advantage changed in magnitude after controlling for the secular rise in Asian scores that has occurred since 2000. Dalliard has already studied this, and concluded that the redesign led to an increased Asian advantage of about 0.21 SD (equivalent to 3 IQ points).

While it’s likely that the SAT’s content changed behind the scenes between 2000 and 2017, it seems implausible a priori that the College Board has been consistently hacking the test in a way that increased the Asian advantage in all but two of those years. Note that the Black and Hispanic gaps remained stable during this period, and those are the gaps that test designers would be most likely to target.

II.

Asian overperformance on tests is likely a function of either IQ or a greater likelihood of practicing for tests. To determine which theory is correct, it is best to establish what causes variance in SAT scores at the individual level before making inferences about what causes differences at the group level.

SAT scores and IQ scores as measured from the ASVAB correlate at about 0.81, which is a strong correlation, but is still subject to some error. The omega total reliability of the ASVAB administered in this wave of the NLSY is 0.95 and the reliability of the SAT is about 0.91, so the correlation controlled for unreliability is 0.87 — very high, but still imperfect.
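The correction above can be reproduced with the standard correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. This is a minimal sketch using only the figures quoted in the paragraph; the function and variable names are my own.

```python
import math

def disattenuate(r_observed, *reliabilities):
    """Correct an observed correlation for unreliability in one or both measures."""
    product = 1.0
    for rel in reliabilities:
        product *= rel
    return r_observed / math.sqrt(product)

r_observed = 0.81   # SAT-ASVAB correlation quoted in the text
rel_asvab = 0.95    # omega total reliability of the ASVAB
rel_sat = 0.91      # reliability of the SAT

# Corrected for unreliability in both tests
r_full = disattenuate(r_observed, rel_asvab, rel_sat)

# Corrected for unreliability in the ASVAB only
r_asvab_only = disattenuate(r_observed, rel_asvab)

print(round(r_full, 2))        # 0.87
print(round(r_asvab_only, 2))  # 0.83
```

Correcting for both reliabilities estimates the correlation between the two underlying traits; correcting for the ASVAB alone estimates how well ability predicts the SAT score as it is actually observed.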

Evidence from coaching studies suggests that coaching for the SAT has marginal effects (d = 0.07 to 0.15 between coached and uncoached students) when controlling for PSAT scores. Other studies have replicated this and found an effect of similar magnitude (d = 0.11). Personally, I don’t find this evidence that convincing: preparing for the SAT can be done easily with books or speed-reading exercises, so many of the uncoached students were probably practicing as well.

There is also the question of the dose-response effect: people will vary in how much they practice even if they are enrolled in the same program. Fortunately, there is now a study that uses data from the Khan Academy SAT preparation program to examine the effect of test prep on SAT performance, controlling for PSAT scores. It did find a dose-response effect (particularly for maths), but it suffers from the same problem as the other coaching studies: the students who are not practicing online may well be practicing in other ways.

There are also regression-based studies that try to estimate the effects of cognitive and noncognitive abilities on scholastic achievement. They found that, within Sweden, cognitive skills had much stronger effects on SAT scores than noncognitive ones, by a factor of about 3.

The regression coefficient for motivation on SAT score is low (0.065), but that may be a function of the low validity of self-reported personality. Given that self-reports of personality correlate with peer-reports at about 0.47, the true effect is probably closer to 0.092: low, but non-zero. Of course, this doesn’t account for the fact that conscientiousness/motivation will not always translate into more practice, which could vary substantially for environmental or random reasons.
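One way to arrive at an adjustment of this size is to treat the square root of the self-peer correlation as the validity of the self-report and divide the coefficient by it. That mapping is my assumption, not something the post states, and it gives a value slightly above the 0.092 quoted, so treat this as a sketch of the general idea rather than the exact calculation used.

```python
import math

beta_observed = 0.065     # coefficient for motivation, from the text
self_peer_r = 0.47        # self-report vs peer-report correlation, from the text

# Assumption: validity of the self-report is sqrt(self-peer correlation)
assumed_validity = math.sqrt(self_peer_r)

beta_corrected = beta_observed / assumed_validity
print(round(beta_corrected, 3))
```

Either way the corrected effect stays below 0.1, so the qualitative conclusion (low, but non-zero) does not depend on the exact validity figure.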

If you consider that the SAT correlates with cognitive ability at 0.83 once you correct for the unreliability of the ASVAB, and that 9% of the variance in scores is error variance, that leaves about 25% of the variance in scores unaccounted for.

Test anxiety/sleep/mood correlate with ACT scores independent of scores on original attempts, but the effects are surprisingly small, and are nowhere near large enough to account for the 25% of variance left unexplained. Anecdotally, both of my high scores on scholastic tests came after poor nights of sleep.

Given the absence of another plausible factor, I must conclude that most of this remaining 25% is due to differences in practice between students. If, say, 15% of the variance is accounted for by differences in practice, then the standardized effect of practice on SAT scores is 0.39: lower than that of cognitive ability, but non-negligible.
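The variance budget behind this argument can be written out explicitly. In the sketch below, the residual is whatever is left after subtracting the squared ability correlation and the error share, and the standardized effect of practice is the square root of its assumed variance share. The names are mine, and the 15% practice share is the illustrative figure from the paragraph above, not an estimate.

```python
import math

def residual_variance(r_ability, error_share):
    """Share of score variance not explained by ability or measurement error."""
    return 1.0 - r_ability ** 2 - error_share

error_share = 0.09  # 1 minus the SAT reliability of 0.91

# The residual is sensitive to which correlation is squared:
# the raw 0.81, or the 0.83 corrected for ASVAB unreliability.
residual_raw = residual_variance(0.81, error_share)
residual_corrected = residual_variance(0.83, error_share)
print(round(residual_raw, 3))        # 0.254
print(round(residual_corrected, 3))  # 0.221

# Standardized effect implied by an assumed 15% variance share for practice
practice_share = 0.15
beta_practice = math.sqrt(practice_share)
print(round(beta_practice, 2))       # 0.39
```

On these inputs the unexplained share comes out between roughly 22% and 25%, so the 25% figure sits at the upper end of that range.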

Anecdotally, my second attempt at each of the scholastic tests I have taken was about 1 standard deviation above the first. The first attempts were poorly prepared for, and the second ones involved much more practice, and it paid off. Other people seem to corroborate this:

I went through the quote tweets and collected reports from other accounts that corroborated the same thing. Unfortunately they fit poorly into the article, so I shoved them into the appendix. If you include me and FischerKing, a total of 9 people reported that their SAT scores increased substantially after practicing hard. I suppose you could argue that these are just outliers that are not representative of the general population, but you still have the problem of missing variance that cannot be explained by cognitive ability or measurement error.

Notably, the Asian advantage is larger at the right tail of the bell curve, according to Dalliard’s analysis:

If we take these estimates at face value, a random Asian would be about 7,000 times more likely to score above 1600 than a random black if such scores were possible. Compared to whites, the Asian advantage is 13-fold. This suggests that there were ≈ 9,500 Asian SAT-takers in 2020 who got a perfect score of 1600 in the SAT. The predicted numbers of white, Hispanic, and black perfect scorers in the same year were approximately 3,000, 90, and 2, respectively. Much of the Asian superiority in this comparison comes from their higher variance. If Asians had the same SD as whites but the means were unchanged, the Asian-white per capita ratios scoring above 1600 would be "only" about 4 to 1 rather than 13 to 1.[Note 21]
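The point about variance in the quoted passage can be illustrated with a toy normal model. The parameters below are hypothetical values in standardized units that I chose for illustration; they are not Dalliard’s estimates and will not reproduce the 13-to-1 or 4-to-1 figures. The sketch only shows that a higher SD enlarges the ratio above a high cutoff.

```python
from statistics import NormalDist

def tail_share(mean, sd, cutoff):
    """Fraction of a normal distribution lying above the cutoff."""
    return 1.0 - NormalDist(mean, sd).cdf(cutoff)

def tail_ratio(mean_a, sd_a, mean_b, sd_b, cutoff):
    """Per capita ratio of group A to group B above the cutoff."""
    return tail_share(mean_a, sd_a, cutoff) / tail_share(mean_b, sd_b, cutoff)

# HYPOTHETICAL parameters, in units of the reference group's SD
ref_mean, ref_sd = 0.0, 1.0
high_mean, high_sd = 0.6, 1.15
cutoff = 2.5

ratio_unequal_sd = tail_ratio(high_mean, high_sd, ref_mean, ref_sd, cutoff)
ratio_equal_sd = tail_ratio(high_mean, ref_sd, ref_mean, ref_sd, cutoff)

print(ratio_unequal_sd, ratio_equal_sd)
```

With the means held fixed, the ratio with the larger SD exceeds the ratio with equal SDs, which is the pattern the quote describes.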

III.

There is also the question of whether this practice effect is inflating the scores of Asians. According to the Khan Academy study I cited earlier, Asians are the ethnic group that practices the most for standardized tests, though the difference within the sample is not that large: Whites practiced an average of 1.7 hours, while Asians practiced 2.7 hours.

For the reasons I stated before, it’s impossible to measure the true race difference in practice for standardized tests because students will use more than one resource to practice with.

Alternatively, one could compare the scores that Asians receive on the SAT to their scores on IQ tests. If the White mean is set to 100 and the White SD to 15, Asians have the following IQs.

ABCD sample (n = 294, weighted by population size): 105.6 (2017)

WAIS-IV (n = 71): 103.1 (2008)

WISC-V (n = 89): 105.2 (2014)

MIDUS (n = 23): 101.8 (2005)

DAS (n = 48): 104.8 (1986)

WPPSI-R (n = 23): 100.3 (1986)

Project Talent (n = 949): 100.6 (1960)

These samples, besides the Project Talent sample, converge to an IQ of roughly 103-105. A literature review by Richard Lynn found that on IQ tests conducted before 1990 their average was the same as that of White Americans, but that after 1990 the average rose to 105 points.

from Lynn’s *Race Differences in Intelligence*

This rise is reflected in changes in the Asian-White income gap:

The IQ-metric score of Asians relative to US Whites on the SAT was 110.3 in 2023, assuming a standard deviation of 200 within Whites, and 109.8 in 2017. Within post-2010 samples of Asians, their average IQ is 105.5 (95% CI: [104, 107], n = 383), which would be consistent with intelligence explaining about half of the Asian advantage and noncognitive effects explaining the other half.
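The two figures in this paragraph can be sketched as follows. I am assuming the post-2010 average pools the ABCD and WISC-V samples listed earlier, since their sizes sum to 383; that pairing is my inference. The conversion puts an SAT point gap on the IQ metric using the assumed White SD of 200, and the helper names are hypothetical.

```python
def pooled_mean(samples):
    """Sample-size-weighted mean of (n, mean) pairs."""
    total_n = sum(n for n, _ in samples)
    return sum(n * mean for n, mean in samples) / total_n

# (n, mean IQ) for the two post-2010 samples in the list above
post_2010 = [(294, 105.6), (89, 105.2)]
pooled_iq = pooled_mean(post_2010)
print(round(pooled_iq, 1))  # 105.5

def sat_gap_to_iq(gap_points, white_sd=200.0):
    """Express a gap in SAT points on the IQ metric (White mean 100, SD 15)."""
    return 100.0 + 15.0 * gap_points / white_sd

def iq_to_sat_gap(iq, white_sd=200.0):
    """Inverse of sat_gap_to_iq: IQ-metric score back to SAT points."""
    return (iq - 100.0) / 15.0 * white_sd

# SAT point gap implied by the 110.3 figure for 2023
implied_gap = iq_to_sat_gap(110.3)
print(round(implied_gap))  # 137
```

An IQ-metric score of 110.3 therefore corresponds to a gap of roughly 137 SAT points under the 200-point SD assumption.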

There is some uncertainty regarding what the Asian IQ is right now, but the latest sample (ABCD) is fairly consistent with the average of 105.6 that was reported in previous samples, which leads me to believe that the estimate is close enough to the true mean. In addition, Dalliard’s own analysis of the SAT score data suggested that the 2017 redesign inflated Asian scores by 0.21 SD (equivalent to 3 IQ points), so that redesign can explain 60% of the noncognitive fraction of the Asian advantage.
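Putting the numbers from this section together gives the rough split in the subtitle. This is a back-of-the-envelope sketch using the rounded figures from the text (110.3, 105.5, and 3 IQ points for the redesign); the variable names are mine.

```python
sat_iq_2023 = 110.3      # Asian SAT score on the IQ metric, from the text
measured_iq = 105.5      # average Asian IQ in post-2010 samples, from the text
redesign_points = 3.0    # 0.21 SD redesign effect, rounded to IQ points as in the text

total_gap = sat_iq_2023 - 100.0          # full SAT advantage in IQ points
cognitive_gap = measured_iq - 100.0      # part matched by measured IQ
noncognitive_gap = total_gap - cognitive_gap

cognitive_share = cognitive_gap / total_gap
redesign_share_of_noncog = redesign_points / noncognitive_gap
redesign_share_of_total = redesign_points / total_gap
other_noncog_share = (noncognitive_gap - redesign_points) / total_gap

print(cognitive_share)           # a little over one half
print(redesign_share_of_noncog)  # roughly 0.6
print(redesign_share_of_total)   # just under 0.3
print(other_noncog_share)        # a little under 0.2
```

Rounded, that is about 50% cognitive ability, 30% redesign-related inflation, and 20% other noncognitive factors.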

Appendix

Anecdotes:

sebjenseb
