12 Comments

A bigger problem, perhaps, is how easy these tests are to practice on, or to teach to others. Since subjects differ in their exposure to such tests, this decreases validity, especially in the modern internet age of mass exposure. Online samples probably have even more problems.

You should write an article about the 'practice effect', or 'training effect' as they call it. The cognitive testing subreddit is completely devoid of empirical discussion on the matter. It would be really helpful to get an expert opinion some time.

It is well known that g-loadings, especially on the Wechsler, are biased toward crystallized ability. This means all fluid tests would have lower loadings because of the unbalanced battery. Jensen (1998) mentioned this quite often in his book, but it is a point often ignored (or forgotten) when people criticize his MCV. The gc bias and the g-loading of fluid tests is a subject I studied extensively 10 years ago, and one on which Chuck and I argued against Kan JK over the question of cultural loadings being (supposedly) correlated with g-loadings. Perhaps I should update this research, this time hopefully not using (or not just using) MCV.

Also, if the reliability of matrix reasoning is bad, we might as well throw all the other fluid tests in the bin, as they have similar reliabilities. In saying this, though, I obviously disregard the 65-90 age sample. Toward the late 60s, verbal abilities decrease among the elderly, and this sample is of little interest when one analyzes group differences or predictive validities. If one subtest really has issues with reliability, and there is only one, it's the digit span subtest. Its retest reliability is low, and so is its internal reliability: it looks high only if one uses alpha, a less recommended method than omega hierarchical, which yields a value of 0.74. Gignac (2017) published a paper on the Digit Span subscale. And that is unfortunate, because Jensen showed a long time ago how well this simple test reveals much about the relationship between g-loadings and group differences. This was not always replicated though (the latest failure was from Beaujean). Perhaps reliability is one possible explanation.

author

>Also, if the reliability of matrix reasoning is bad, we might as well throw all the other fluid tests in the bin, as they have similar reliabilities

I would not advocate using any of the other fluid tests as a test of intelligence.

The low g-loading is not specific to batteries that use the WAIS. In fact, two of the highest figures for the g-loading of Raven's/matrix reasoning come from factor analyses that use the WAIS subtests. You can check the spreadsheet for evidence, or Emil's post compiling the estimates.

https://www.emilkirkegaard.com/p/which-test-has-the-highest-g-loading

Aug 5, 2023 · Liked by Sebastian Jensen

It's best to use both crystallized and fluid tests if you want to measure g. Either of them alone will lead to g estimates that are substantially contaminated by other abilities. The more diverse and numerous the tests, the better the g estimate.
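To make the "more tests, better g estimate" point concrete, here is a minimal sketch in Python. It is not from any of the cited papers. It assumes a deliberately simple one-factor model in which every test is standardized, has the same g-loading, and shares nothing with the other tests except g. The function name `composite_g_loading` and the example loading of 0.7 are hypothetical, chosen only for illustration.

```python
import math


def composite_g_loading(loading: float, k: int) -> float:
    """Correlation between g and the unit-weighted sum of k standardized tests.

    Assumes (for illustration only) that every test has the same g-loading
    and that the tests' non-g parts are mutually uncorrelated.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= loading <= 1.0:
        raise ValueError("loading must be between 0 and 1")
    # Cov(sum, g) = k * loading
    # Var(sum)    = k + k * (k - 1) * loading**2  (each pair correlates loading**2)
    return k * loading / math.sqrt(k + k * (k - 1) * loading ** 2)


if __name__ == "__main__":
    # Hypothetical loading of 0.7 per test; print how the composite improves.
    for k in (1, 2, 5, 10):
        print(k, round(composite_g_loading(0.7, k), 3))
```

Under these assumptions the composite's correlation with g rises steadily as tests are added. Real batteries violate the assumptions (loadings differ, and tests share group factors such as Gc or Gf), which is why diversity of content matters and not just the number of tests.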

Emil provided evidence from the major papers, those with the largest batteries. In reality, though, there are many more studies than those. I remember that Chuck and I looked over a great many studies reporting g-loadings on many different kinds of tests. The evidence wasn't crystal clear to me. Among the studies which come to mind right now are:

2006 Ashton & Lee. Minimally biased g-loadings of crystallized and non-crystallized abilities

2008 Arendasy. Investigating the ‘g’-saturation of various stratum-two factors using automatic item generation

Those are counter-examples. Even so, it's not just the Wechsler that is biased in favor of crystallized tests. Many other batteries are also biased in their content. I often complained about this in the past, because many researchers never mentioned the problem. At the same time, I won't disagree with the idea that constructing a good verbal test is easier than constructing a good fluid/reasoning test.

With regard to Raven APM/SPM loading the highest on the Wechsler, there is no controversy. If we assume for the sake of argument that the Wechsler is the most crystallized-biased battery, Jensen would say that is what he would expect. He explained in his book The g Factor, and earlier in several papers, that the distinction between Gf and Gc is often misleading, if not wrong. Many fluid tests in reality require verbal/crystallized abilities. Jensen also mentioned the Raven being more "verbal-loaded" than verbal analogies. As a mental tool, Gf-Gc is still useful, but its usefulness ends as soon as the discussion shifts toward pure entities.

One idea is to make verbal tests with a fake language, so that one cannot rely on prior information. Some people have tried this approach, but it never became popular. These kinds of exercises are used in linguistics to teach you how to decipher foreign languages. They give you some samples of a language, with some partial translations, and the reader has to figure out the rest. Maybe not a bad idea to try again.

"will look into this"

Aug 5, 2023 · edited Aug 5, 2023

MR is only a subtest, and no subtest used by itself will make a good IQ test. MR does what it's supposed to do: it has a good g-loading (I think I've also seen that fluid reasoning is the only factor that retains its g-loading as you go higher and higher) and good enough reliability. That's all you need to put it in the battery, since you can only adequately measure individual IQ with a large number of questions and several subtests. Using MR alone gives only a rough estimate, and not a very good one, but the same will be true for any other subtest.

Matrix tests seem to address the most fundamental substrate of general intelligence.

Even the most advanced AI model, GPT-4, will not be able to solve even one RPM question, but it will max out every WAIS-IV subtest except matrix reasoning.

When “ObjectivelyCorrect.” completely ignores the article and is in turn transmogrified into “ObjectivelyIncorrect”.
