Online IQ tests are considered a meme, since the tests that appear on the first page of Google are pretty bad. Fortunately, the internet has progressed, and there are now some legitimate options. Options for non-native English speakers are still limited, unfortunately; the best they can take are matrix reasoning tests.
My recommendation is that if you are a fluent English speaker, take the CAIT and the AGCT with a testing interval of one day. Averaging the scores of the two FSIQ tests should give you a rough estimate of general intelligence that is deflated/inflated by about 1-3 points depending on whether the score is above/below 100. This is because the average of several indicators of a latent variable will, on average, fall closer to the mean than the latent ability itself.
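To make the 1-3 point figure concrete, here is a minimal sketch of the arithmetic, assuming both tests are on the usual IQ metric (mean 100, SD 15) and assuming a g-loading of roughly .90 for each; the .90 figure is an assumption for illustration, not a published statistic for the CAIT or AGCT.

```python
# Rough arithmetic behind the deflation/inflation claim (an illustrative
# sketch, not the exact method used for the estimate above). Assumes each
# test is on the IQ metric (mean 100, SD 15) with a g-loading of ~.90;
# the loadings are assumptions, not published figures.

def expected_average(true_iq, loadings=(0.90, 0.90)):
    """Expected average of several tests for a person with the given true g.

    Each test's expected score regresses toward 100 by a factor equal to its
    g-loading, so the average sits slightly closer to 100 than g itself.
    """
    deviation = true_iq - 100
    return 100 + deviation * sum(loadings) / len(loadings)

def crude_g_estimate(avg_score, mean_loading=0.90):
    """Invert the shrinkage: nudge an observed average back away from 100."""
    return 100 + (avg_score - 100) / mean_loading

if __name__ == "__main__":
    print(expected_average(130))         # ~127: deflated by ~3 points
    print(expected_average(90))          # ~91: inflated by ~1 point
    print(round(crude_g_estimate(127)))  # ~130
```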
If you are a non-native speaker, the best you can use are the matrix reasoning tests. They aren't great tests of intelligence, but they are better than nothing.
Here is my assessment of the options available online:
Key:
Computerized: c
Suitable for non-natives: n
Reputation for being inflated: i
Reputation for being deflated: d
Tier 1: actual IQ tests
AGCT [c, d, i]: precursor to the ASVAB, 150 items. I’ve seen people complain about it being either deflated or inflated, but the test was normed on real people, which makes this complaint kind of silly. For whatever reason, there is no Flynn effect on this test, which is pretty remarkable.
PMA [i]: composite IQ test of ~170 questions normed on Portuguese people. Given that Portugal’s national IQ is 94-95, the score will be inflated by a few points.
ASVAB: 4th edition, practice test. Yes, I know it’s a practice test, but it can’t be that different from the real thing.
Tier 2: good proxies for IQ (g-loading ~0.8-0.9, high ceiling)
Old SAT [c]: there’s a rumor that the old SAT is much better than the new one, which is probably not true, as the correlation between IQ and SAT scores was the same in the NLSY79 (taken in 1975-1982) and the NLSY97 (taken in 1996-2002). Nevertheless, the ceiling is much higher, so it will be more reliable at the higher ranges. Scores can be converted to IQ using statistics from here.
New SAT [c]: there are practice tests on Khan Academy that can be taken which are close enough to the real thing. Scores can be converted to IQ using this article. If you want the score uncorrected for regression to the mean, then scroll down to the third image and use the column ‘IQ (no RTM)’.
ACT: kind of like the SAT, but tests reading comprehension more. Scores can be converted to IQ using this article. If you want the score uncorrected for regression to the mean, then scroll down to the third image and use the column ‘IQ (no RTM)’.
A comment about the SAT and ACT: the norms are based on a selected sample of teenagers, many of whom study for long hours to get high scores. While the selection can be adjusted for by looking at nationally representative samples, these are tests that are best studied for. (A generic sketch of the score-to-IQ conversion is given after this tier’s entries.)
CAIT [c]: amateur-made clone of the WAIS-IV, a full-scale test with 9 subtests. I took it, and it seems legitimate enough, but the subtests don’t correlate that highly with each other.
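As promised above, here is a generic sketch of how a scholastic score can be mapped to the IQ metric via its z-score. The mean, SD, and test-IQ correlation below are placeholder assumptions for illustration only, not the norms used in the linked articles.

```python
# Generic score-to-IQ conversion sketch. The linked articles use their own
# norms; the mean/SD/correlation values below are placeholder assumptions.

def score_to_iq(score, test_mean, test_sd, r_with_iq=1.0):
    """Map a test score to the IQ metric via its z-score.

    r_with_iq < 1 applies a regression-to-the-mean correction; leave it at
    1.0 for the uncorrected ('no RTM') figure.
    """
    z = (score - test_mean) / test_sd
    return 100 + 15 * r_with_iq * z

if __name__ == "__main__":
    # Hypothetical: a 1450 on a test with mean 1050 and SD 200, assuming
    # a .80 correlation with IQ.
    print(round(score_to_iq(1450, 1050, 200)))                   # no-RTM figure
    print(round(score_to_iq(1450, 1050, 200, r_with_iq=0.80)))   # RTM-corrected
```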
Tier 3: mediocre tests (g-loading ~0.65-0.8, high ceiling)
D-48/D-70 [c, n]: numerical reasoning/fluid reasoning test normed on high school seniors. A discussion and an offline version are available here.
ICAR-60 [c]: long version of the ICAR-16 (16 items), which already has a surprisingly good correlation with IQ (0.81). The paper on the test is not promising: self-reported SAT scores correlate with ICAR-60 scores at only .54, and the authors didn’t report the correlation between ICAR-60 and educational attainment, even though they clearly could have.
BRGHT [c]: 60 questions that must be answered in 40 minutes. Not sure how legit it is, but there are enough questions there to make me believe that the g-loading is at least .7.
Public Domain Intelligence Test [c]: 60 questions of verbal/nonverbal items that must be answered in 30 minutes.
TRI-52 [c, n]: very unique matrix reasoning test created by Xavier Jouve.
JCTI [c, n]: another unique matrix reasoning test by Xavier Jouve. Took it and got almost exactly the same score on it as I did on the TRI-52, which I took like a year ago.
Ravens 2 [n, i]: matrix reasoning test. Reputation for being inflated.
Mensa norway [c, n, d]: matrix reasoning test. Normed on an internet population, and has a reputation for being deflated.
Mensa dk [c, n, d]: basically the same as Mensa norway, just under a different name.
RAPM [n]: matrix reasoning test.
Multifactor general knowledge test [c]: I analyzed this test myself, and I found that the full scale score was highly reliable (rxx = .93) and had a high ceiling (z = 3.15). Unfortunately, the website only gives you subtest scores, so it’s difficult to generate a precise measurement of general knowledge. If you really want to test your general knowledge, take the test and keep track of your answers (there are 320 potential answers, by the way!). Score the test by adding 1 for each correct answer and each distractor you avoid. Then, score yourself based on the norms I provided in the paper, and maybe add 5 points for good measure due to self-selection in online samples. Remember to ignore the internet abbreviations question - it isn’t taken into account in the norms because of its low g-loading.
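For anyone hand-scoring it this way, here is a small sketch of the procedure just described; the norm mean and SD are placeholders to be replaced with the values from the paper, and the example numbers are hypothetical.

```python
# Hand-scoring sketch for the Multifactor General Knowledge test, following
# the procedure described above. The norm mean/SD below are placeholders -
# substitute the values from the paper. The internet abbreviations question
# should already be excluded from the tally.

def raw_score(correct_answers, distractors_avoided):
    """One point per correct answer plus one per distractor avoided
    (out of the 320 potential answers on the test)."""
    return correct_answers + distractors_avoided

def knowledge_iq(raw, norm_mean, norm_sd, self_selection_bump=5):
    """Convert the raw tally to an IQ-style score using the paper's norms,
    optionally adding a few points for self-selection in online samples."""
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z + self_selection_bump

if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    raw = raw_score(correct_answers=140, distractors_avoided=110)
    print(round(knowledge_iq(raw, norm_mean=200, norm_sd=40)))
```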
Tier 4: bad tests (g-loading 0.5-0.65, maybe has a low ceiling)
ICAR-16 [c]: 16-item test that correlates at .8 with IQ, but I think that high correlation is probably a statistical fluke (n = 97). The reliability at the upper end of ability is extremely low, making it unsuitable for those who want to test high ability.
Static General Intelligence Test [c]: 60 questions that must be answered in 12 minutes. Internal reliability of .86, which is disappointing.
openpsychometrics FSIQ [c, d]: it sucks. It correlates at .54 with WAIS FSIQ and it underestimates scores by 5 points.
KBIT matrix reasoning [n]: part of a “legitimate” test that was leaked. According to the OP, the ceiling isn’t very high, so there isn’t much point in using it.
iqtest.com [c]: 40 questions that measure verbal/mathematical/logical reasoning. No public data or stats support its validity, but it can’t be that terrible if it’s a 40-item test of highly g-loaded abilities. Took it, and the score seemed believable.
Tier 5: bottom of the barrel
arealme [c, n]: not much information about this test. Took it; it was around 20-30 matrix reasoning items, and I scored 20 points below what I tend to get on these types of tests. The website is business/marketing oriented and posts no data/stats, which makes me skeptical of it. For what it’s worth, the r/cognitivetesting mods seem to feel the same way.
openpsychometrics VIQ [c, d]: sucks. Underestimates scores by 11 (!) points and correlates with WASI-II VCI at only .48.
123test [c]: looks like another marketing/business scam website.
Endnotes:
I classified tests into tiers based either on the number of questions they had or on actual evidence of how g-loaded the test is. All FSIQ tests were placed in T1. Scholastic tests (g-loading .8-.85) were placed in T2. Matrix reasoning tests (g-loading ~.7) were placed in T3. Bad tests that were still somewhat fine were placed in T4, and really bad tests in T5.
If there wasn’t good data on a test’s g-loading, I guessed it based on the number of items in the test.
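One way to formalize that item-count guess (my own illustration, not necessarily the exact rule I used) is the Spearman-Brown prophecy formula, which predicts how reliability grows with test length; reliability is only an upper bound on g-loading, but it tracks the same intuition. The assumed average inter-item correlation of .10 is a placeholder.

```python
# Spearman-Brown prophecy formula: a way to formalize "more items, more
# reliable". The average inter-item correlation of .10 is an assumption,
# not an estimate from any of these tests.

def spearman_brown(n_items, avg_inter_item_r=0.10):
    """Predicted reliability of a test with n_items given the average
    correlation between individual items."""
    return (n_items * avg_inter_item_r) / (1 + (n_items - 1) * avg_inter_item_r)

if __name__ == "__main__":
    for k in (16, 40, 60, 150):
        print(k, round(spearman_brown(k), 2))
    # 16 -> 0.64, 40 -> 0.82, 60 -> 0.87, 150 -> 0.94
```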
I collected 27 self-reports of scores on several of these tests a year or so ago. Based on that data, the openpsychometrics tests were deflated (which is now known from real data) and the Ravens 2 was inflated (which most people on the r/cognitivetesting sub already suspected). All of the matrix reasoning tests intercorrelated highly (r = ~.85), which surprised me, as the test-retest reliability of the Ravens is lower in other samples (r = .45-.85).
I think there is a typo:
"Old SAT: there’s a rumor that the old SAT is much better than the new one, which is probably not true, as the correlation between IQ and SAT scores was the same in the NLSY79 (taken in 1975-1982) and the NLSY79 (taken in 1996-2002). "
I think the second "NLSY79" should be "NLSY97".
Also, the received wisdom (as espoused by respected HBD commentators like Steve Sailer) is that it's the 2017 redesign that *really* screwed up the g-loading of the SAT. Unfortunately, I don't believe there have been any studies of the g-loading/psychometric properties of the post-2017 SAT, so we don't know for sure whether the new test is worse at measuring intelligence. But the circumstantial evidence does point in that direction. It's since 2017 that we've seen the massive rise in Asian test scores, indicating that the current test is more amenable to test prep (which implies a lower g-loading).
Where would you put the GRE on this list? Do you know any conversion methods for GRE score to IQ?