Data
I have collected 3941 IQ scores and attempts on scholastic tests (e.g. PISA, PIRLS) from across the globe into the HLO + IQ dataset. Performance on each test was graded relative to the countries that participated in it, so if an organization coincidentally sampled a wave of highly intelligent countries, scores on that test were adjusted upward accordingly. There have been several other attempts to do this (e.g. harmonized learning outcomes, basic skills dataset), though I think my data is better, as the inclusion of the IQ scores allows for more precise linkage between sources. For example, the basic skills dataset links performance on several international tests based on only a few countries, and in two cases, just one.
Because national IQs cover so many countries, in so many regions, doing these scale transformations is much easier and more precise. If there were a regional North African cognitive assessment normed to a mean of 500 and a standard deviation of 100, performance on that test would have to be scaled down, as North Africans perform worse than Europeans on tests. If Morocco were used as the only anchor, the penalty could range from 15 to 40 IQ points depending on which single sample was considered.
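To make the anchoring idea concrete, here is a minimal sketch of how a regionally normed test could be shifted onto the IQ metric using countries whose national IQ is already known. The function names and the anchor values are hypothetical and purely illustrative; the sketch also assumes that only the mean needs shifting (i.e. the spread within the region matches the reference spread), which is a simplification and not necessarily how the dataset was built.

```python
from statistics import mean

def to_iq_metric(score, test_mean=500.0, test_sd=100.0):
    """Re-express a test score as mean 100 / SD 15 within the test's own norm group."""
    return 100.0 + 15.0 * (score - test_mean) / test_sd

def anchor_offset(anchors):
    """Average gap between known national IQ and the naive conversion.

    anchors: list of (regional_test_score, known_national_iq) pairs.
    """
    return mean(iq - to_iq_metric(score) for score, iq in anchors)

def link_score(score, anchors):
    """Convert a regional test score to the IQ metric, corrected by the anchors."""
    return to_iq_metric(score) + anchor_offset(anchors)

# Hypothetical anchors, NOT real data: (regional test score, known national IQ)
anchors = [(500.0, 80.0), (540.0, 82.0)]
linked = link_score(500.0, anchors)
```

With a single anchor sample the offset inherits all of that sample's noise; averaging over many anchor countries is what makes the linkage more stable.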
The scores on the IQ tests are adjusted for the Flynn Effect and normed to the UK mean and standard deviation, as Flynn Effects generally don’t pass measurement invariance and are not on g. The performance on the scholastic tests is not adjusted for changes across time. I tested for a false Flynn Effect using a regression model that predicted scores on the scholastic assessment tests from the birth cohort and the country that took the test, and I found no time trend. Because of that, I think it’s fair to conclude that most of these changes reflect real changes in cognitive performance.
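A regression of score on birth cohort plus country can be sketched without any libraries: the cohort slope from a model with one intercept per country equals the slope obtained after demeaning cohort and score within each country. The code below is only an illustration of that idea under assumed inputs; the function name and the row layout are hypothetical, and the exact model specification used for the dataset is not stated in this post.

```python
from collections import defaultdict
from statistics import mean

def cohort_slope_with_country_effects(rows):
    """Pooled within-country slope of score on birth cohort.

    rows: iterable of (country, birth_cohort, score).
    Equivalent to the cohort coefficient in OLS with a dummy per country.
    """
    by_country = defaultdict(list)
    for country, cohort, score in rows:
        by_country[country].append((float(cohort), float(score)))

    num = 0.0
    den = 0.0
    for obs in by_country.values():
        x_bar = mean(x for x, _ in obs)
        y_bar = mean(y for _, y in obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    if den == 0.0:
        raise ValueError("no within-country variation in birth cohort")
    return num / den

# Hypothetical rows, NOT real data
rows = [
    ("A", 1980, 90.0), ("A", 1990, 91.0), ("A", 2000, 92.0),
    ("B", 1980, 100.0), ("B", 2000, 102.0),
]
slope = cohort_slope_with_country_effects(rows)  # points per cohort year
```

A slope near zero on the scholastic tests is what "no time trend" would look like in this setup.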
Trends
I ignored countries if they had fewer than 7 unique cohort years or fewer than 10 different observations, as the power to detect changes across time would be too low. Non-linear trends were plotted using LOESS, and p-values were computed for the non-linear trends by comparing a restricted cubic spline model to a linear model.
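The inclusion rule can be written as a small filter. This is a minimal sketch; the function name and the (cohort_year, score) layout are hypothetical.

```python
def has_enough_data(observations, min_cohort_years=7, min_observations=10):
    """Return True if a country passes both inclusion thresholds.

    observations: list of (cohort_year, score) tuples for one country.
    """
    unique_years = {year for year, _ in observations}
    return (len(unique_years) >= min_cohort_years
            and len(observations) >= min_observations)

# Hypothetical example: 10 observations spread over 7 cohort years
example = [(1990 + (i % 7), 95.0) for i in range(10)]
keep = has_enough_data(example)
```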
It’s worth mentioning that the p-value does not give the probability that the trend is true. It gives the probability that the trend (or one more extreme than it) would be observed under the null hypothesis, so in practice interpreting p-values can be tricky. Based on the p-value distribution of the linear trends, most of the values under .03 should be true hits.
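One rough way to reason about how many small p-values are true hits: under the null, p-values are uniformly distributed, so among m tests about threshold × m would fall below the threshold by chance alone. Comparing that count to the number actually observed gives a conservative estimate of the share of real effects. The sketch below illustrates this reasoning with made-up inputs; it is an assumption about the kind of check involved, not the exact procedure used here.

```python
def estimated_true_hit_share(p_values, threshold=0.03):
    """Conservative share of sub-threshold p-values that are not chance hits.

    Assumes every test could be null, so the expected number of false
    positives is at most threshold * number_of_tests.
    """
    observed = sum(p < threshold for p in p_values)
    if observed == 0:
        return 0.0
    expected_false = threshold * len(p_values)
    return max(0.0, 1.0 - expected_false / observed)

# Hypothetical p-values, NOT the ones from this analysis
p_values = [0.001] * 30 + [0.5] * 70
share = estimated_true_hit_share(p_values)
```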
These are the national averages, calculated by averaging the results of three estimation methods:
Weighting the averages of the TIMSS, PISA, PIRLS, IQ, EGRA, LLECE, PIAAC, PASEC, SACMEQ, and PLM tests by the square root of the number of samples each country had.
Taking the arithmetic mean with no regard for sample sizes.
Using the median.
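The three estimation methods can be sketched as follows. This assumes each method is applied to per-test averages for a single country, with the weight being the square root of the number of samples behind each test average; the function name, the dict layout, and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, median

def national_average(test_means):
    """Average of three estimates: sqrt(n)-weighted mean, plain mean, median.

    test_means: dict mapping test name -> (average_score, number_of_samples).
    """
    scores = [score for score, _ in test_means.values()]
    weights = [sqrt(n) for _, n in test_means.values()]

    weighted = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    unweighted = mean(scores)
    middle = median(scores)
    return mean([weighted, unweighted, middle])

# Hypothetical per-test averages for one country, NOT real data
example = {"PISA": (90.0, 4), "TIMSS": (96.0, 16), "IQ": (93.0, 1)}
estimate = national_average(example)
```

Averaging the three guards against any single method being thrown off by one heavily sampled or one outlying test.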
Albania
The differences in means over time appear to be due to the TIMSS/PIRLS results being anomalous. Otherwise, Albania is consistently scoring at an IQ of ~85.
United Arab Emirates
Ignoring the one low IQ result from the 80s, stagnant.
Argentina
Flat trend, with the exception of the fluked PISA scores in 2000, which were sampled from Buenos Aires.
Australia
Linear p-value is .02, non-linear p-value is .008. The decline within assessments (particularly the PISA ones) is visually notable.
Austria
Flat.
Azerbaijan
Hard to tell because the results are inconsistent, but looks flat.
Belgium
p = .003 for the linear decline.
Bulgaria
Does well on the TIMSS/PIRLS, but not on the PISA. Fairly constant performance.
Bahrain
Linear p-value is .02. I think the trend is real because the effect size is large, despite the fact the p-value is not good.
Bermuda
Flat.
Brazil
Scores well in Becker’s IQ samples, but not on scholastic samples. Flat performance.
Botswana
Flat performance.
Canada
Notable decline past 1990. p = .0004 for the non-linear trend.
Switzerland
Both p-values are substantially above .05.
Chile
Both p-values are substantially above .05.
China
Flat performance (note: these figures were corrected for the fact that non-representative samples were collected in both the IQ samples and the PISA samples).
Democratic Republic of the Congo
Linear p-value is p = .02. I don’t trust this trend because it looks driven by the fact that it recently overperformed on the PASEC.
Colombia
Fairly convincing linear p-value (p = .0004).
Costa Rica
Flat trend.
Cyprus
Flat performance. Ignore the non-linear trend, the p-value is .27.
Czech Republic
Fairly convincing linear p-value (p = .0006).
Germany
Rise from the 80s to 90s, drop afterwards. Convincing non-linear p-value (p = .001).
Denmark
Nothing of note here, besides a possible decline in recent years. Removing the anomalous pre-1970 results does not change the p-values.
Dominican Republic
Flattish trend.
Ecuador
Flat.
Egypt
Possible decline, but doesn’t pass significance testing.
Spain
Clear positive trend when the IQ results are included, but they are not distributed equally across cohorts. Controlling for the fact Spain scores lower on IQ tests in comparison to scholastic ones results in a rather tenuous p-value (p = .031).
Estonia
Small, but statistically robust increase (p = .0016 for the linear increase).
Finland
Recent decline that is due to immigration. p = .0004 for the non-linear trend.
France
Awkward non-linear trend, but the linear decline looks fairly robust (p = .0007).
Great Britain
No change over time. The non-linear trend is a fluke.
Georgia
Flat.
Ghana
Probably a fluke, but hard to tell with so little data.
Greece
The p-value for the non-linear trend is iffy (p = .01), but the trend is visually compelling (e.g. compare the PISA reading results across time).
Hong Kong
Looks flat.
Croatia
Flattish.
Hungary
The linear p-value passes significance testing (p = .0097).
Indonesia
Flat.
India
Awkward chart, but has flat performance.
Ireland
Possible increase (linear p-value = .0043).
I tried using the year instead (which has less missing data) and got these results:
After removing the outliers (IQ > 106 or IQ < 90):
The time trend was fairly robust, even after controlling for whether the cognitive test administered was an IQ test.
Iran
Flat. Doubt removing the IQ results would change much.
Iceland
Performance may be declining in recent cohorts (p = .0082 for the non-linear trend).
Israel
Linear increase is fairly convincing (p = .00071).
Italy
Weak evidence of a non-linear trend (p = .014), but a p-value this large after gathering so much evidence favours the null.
Jamaica
Evidence for the increase looks weak due to the disparities in sample averages.
Jordan
Linear decrease appears convincing (p = .0011).
Japan
Flat.
Kazakhstan
No observable increase, but lots of variance in sampling averages.
Kenya
No observable increase.
Cambodia
Nothing to note besides the massive disparities between samples.
South Korea
Performance appears fairly flat.
Kuwait
Evidence for a change is weak (p = .021), and there is no reason a priori (e.g. accelerated dysgenics, immigration, economic collapse) to think that Kuwait’s IQ should have decreased.
Laos (technically this should not have gotten through)
Extremely large inconsistencies between samples make inferences difficult.
Lebanon
Nothing of note, both p-values are above .10.
Lithuania
Convincing linear increase (p = .00006). Nonlinear model is a somewhat better fit (p = .0012).
Luxembourg
Scored anomalously low on the first PISA wave, otherwise performance has been fairly flat.
Latvia
Similar trend to Lithuania. Both the linear and nonlinear p-values are sub .01.
Macau
Appears to be increasing, but the PIRLS outliers are dragging the trendline down.
Removing them reveals a strong upwards trend (p = .0017).
Morocco
Flat. Results are inconsistent within time cohorts.
Moldova
The trend is clearly due to a shift from TIMSS/PIRLS testing to PISA testing over time.
Mexico
Fairly flat performance.
North Macedonia
True ability is probably stagnant. Economist tier p-values.
Malta
Flat.
Montenegro
p = .002 for the linear increase. Looks legit.
Malaysia
This looks awful.
Nigeria
Flat.
Netherlands
p-values for both the linear (p = .00000008) and non-linear (p = .00013) trends are clearly robust.
Norway
Flat.
New Zealand
Temporal trend looks unconvincing, p-values are economist tier.
Oman
Flat.
Pakistan
Linear p-value is .021. I don’t trust it as there is too much variance within birth cohorts.
Peru
Fairly robust increase. Linear p-value is p = .00000016.
Philippines
Looks like the true trend is flat, but the results are inconsistent so the model picks up on weird variance.
Poland
Weak increase. Linear trend is p = .008.
Puerto Rico
Results are too inconsistent within years to judge, but I suspect a flat trend.
Portugal
Congratulations to Portugal for having the most consistent cognitive testing results.
Palestine
Awkward chart, but trend looks flat.
Qatar
p-value for the linear trend is p = .00000004.
Romania
No temporal trend. Based on the results of my dysgenic fertility study, Romanian IQ should be decreasing by about .65 points per decade, but there is no trend in scores at all. I suspect that the results for Romania from my study are wrong, or there is an environmental trend going in the opposite direction that is keeping the Romanian IQ high.
Russia
Statistically, the rise is robust (p = .005 for the linear trend, .0031 for the non-linear trend).
Saudi Arabia
I’m on the fence about this one. Theoretically, intelligence should have risen due to economic development, but the linear p-value is unconvincing (p = .016).
Scotland
Flat.
Sudan
Non-linear p-value is unconvincing, just .033.
Singapore
Flat.
Serbia
Linear p-value is convincing (p = .00000098).
Slovak Republic
Clear decline (linear p-value is .00006).
Slovenia
Flat.
Sweden
A small decline that barely passes statistical significance (p = .04). Apparently they systematically removed immigrants from the most recent PISA testing wave, so I’m inclined to believe that there is a true decline.
Thailand
…Let me fix that.
Decline is more robust after removing the IQ test score results.
Tunisia
Economist tier linear p-value (p = .07). Wouldn’t put stock into it.
Turkey
Linear p-value is p = .000002.
Taiwan
Flat.
Tanzania
Flat.
Uruguay
Flat.
United States
Flat.
South Africa
Normally I would throw out a weird non-linear result like this, but it appears in both the IQ samples and the TIMSS samples, which makes me think that it might be legit. The p-value is pretty good (p = .000086 for the nonlinear model in comparison to the linear model).
Ukraine
No time trend (at the request of Ubersoy).
Sub-Saharan Africa
In all tests:
Because the PASEC and SACMEQ tests are normalized within Africa, they are uninformative for observing differences across time. Because of that, they should be removed. Doing that produces this result:
The average increases by 2 points, and the decrease is no longer observed.
Certain datasets are not biased against Sub-Saharan Africa, as the average IQ is roughly 70 regardless of which source is consulted.
h/t:
Becker for the IQ scores
Justin Malloy for his reviews of the national IQs of several countries (including Vietnam, Laos, the Cayman Islands)
Warne for his meta-analysis on the Irish IQ. Note: I only included samples that were tested within Ireland.
Why is the Moroccan IQ lower than that of other North Africans? That doesn't make much sense, and I would love your input.
If you want more data in the PIAAC data explorer step 2 you can select age groups in 5-year or 10-year bands:
https://piaacdataexplorer.oecd.org/ide/idepiaac/