In the 1950s, Duke Power's Dan River Steam Station in North Carolina had a policy restricting black employees to its "Labor" department, where the highest-paying position paid less than the lowest-paying position in the four other departments. In 1955, the company added the requirement of a high school diploma for employment in any department other than Labor, and offered to pay two-thirds of the high-school training tuition for employees without a diploma.[3]
On July 2, 1965, the day Title VII of the Civil Rights Act of 1964 took effect, Duke Power added two employment tests; passing them would allow employees without a high-school diploma to transfer to the higher-paying departments. The Bennett Mechanical Comprehension Test was a test of mechanical aptitude, and the Wonderlic Cognitive Ability Test was an IQ test measuring general intelligence.
This eventually resulted in a lawsuit, Griggs v. Duke Power Co. (1971), which reached the Supreme Court. The Court ruled that broad aptitude tests used in hiring that disparately impact ethnic minorities must be reasonably related to the job.
While this did not make IQ testing for employment illegal, it strongly discouraged the practice, and to this day many people mistakenly believe that using IQ tests to select employees is not allowed. Conversely, in other cases involving race and standardized testing, courts have ruled that discarding test results from employment decisions merely because they show racial differences is itself impermissible, unless the employer has a strong basis in evidence for believing the test would expose it to disparate-impact liability.
That was the holding of Ricci v. DeStefano (2009). Twenty city firefighters at the New Haven Fire Department,[1] nineteen white and one Hispanic, passed the test for promotion to a management position, yet the city declined to promote them because none of the black firefighters who took the same test scored high enough to be considered for promotion. New Haven officials invalidated the test results because they feared a lawsuit over the test's disproportionate exclusion of black candidates from promotion under a disparate impact cause of action.[2][3] The twenty non-black firefighters claimed discrimination under Title VII of the Civil Rights Act of 1964.
The Supreme Court held 5–4 that New Haven's decision to ignore the test results violated Title VII because the city did not have a "strong basis in evidence" that it would have subjected itself to disparate impact liability if it had promoted the white and Hispanic firefighters instead of the black firefighters. Because the plaintiffs won under their Title VII claim, the Court did not consider the plaintiffs' argument that New Haven violated the constitutional right to equal protection.
Rulings like these also motivated research into the validity of tests of general cognitive ability. The most notable meta-analysis on this topic was published in 1984 and found a mean validity of 0.53. This meta-analysis and others have been heavily scrutinized and debated in the scientific literature: one prominent rebuttal comes from Richardson and Norgate (R&N), who argue that the correlation from the Hunter & Schmidt meta-analysis is inflated, and a defense comes from Zimmer and Kirkegaard (Z&K), who argue that the true correlation is between 0.5 and 0.6. From what I have read, these are the main points of contention:
Publication bias: neither R&N nor Z&K discuss this, but the prior that it is an issue is extremely high.
Correction for reliability of IQ: R&N argue that correcting for the unreliability of the IQ measure is not a good idea - since the IQ tests actually used in hiring are not perfectly reliable, correcting for this inflates their apparent utility for predicting job performance in the real world. I think this criticism is appropriate.
Correction for reliability of job performance: R&N argue that intra-rater reliability (consistency within a single rater) should be used to correct for unreliability instead of inter-rater reliability (agreement between raters), because different raters may genuinely value different traits in employees rather than merely disagreeing through error. I find this unconvincing.
The magnitude of the inter-rater reliability: Z&K note that the original Hunter & Schmidt 1984 meta-analysis assumed a criterion reliability of 0.60; a later meta-analysis by the same authors suggested it is somewhat lower, about 0.52, so the corrected correlation would have to be adjusted upwards (a numeric sketch of this follows the list).
Decreasing correlation over time: R&N argue that the correlation between IQ and job performance has decreased over time. While this is possible, I personally do not trust these authors on this point. I have done meta-analyses before, and time-trend statistics are extremely easy to p-hack using moderators, which is part of the reason why I avoided reporting them in the paper I linked.
Restriction of range: R&N argue that correcting for restriction of range after averaging the correlations is improper, and that the correction should instead be applied within each study before averaging. This is not correct: applying the correction within studies introduces variance between effect sizes, because the degree of range restriction is measured imprecisely in each study (a sketch below illustrates this sensitivity).
Construct validity of IQ tests: R&N write that IQ tests have “poor construct validity”; Z&K counter that they do have good construct validity, pointing to high predictive validity (0.6 - 0.93 for general gaming ability) and to the fact that even tests designed to measure independent abilities still show a substantial g-factor. This is a weak rebuttal from Z&K: construct validity refers to a test's ability to measure what it intends to measure (intelligence), not its ability to predict other tests (predictive validity) or whether different parts of the test intercorrelate (internal consistency/reliability). Regardless, this is a pointless thing to argue about; what matters is the magnitude of the correlation between IQ and job performance, not the mechanism by which it arises.
Noncausality: both Z&K and R&N go on tangents arguing about whether the correlation between job performance and IQ is causal. This is worth investigating for academic purposes, but what really matters is whether selecting on IQ works.
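To make the reliability dispute concrete, here is a minimal sketch (my own illustration, not code from any of the papers cited) of the criterion-unreliability correction. The observed validity of 0.25 is an assumed value chosen purely for illustration; the two reliabilities are the figures discussed above.

```python
import math

# Disattenuation for criterion unreliability: r_corrected = r_observed / sqrt(r_yy).
# r_obs is a made-up illustrative value; 0.60 is the reliability assumed in the
# 1984 meta-analysis, 0.52 the lower figure from the later one.
r_obs = 0.25
for r_yy in (0.60, 0.52):
    r_corrected = r_obs / math.sqrt(r_yy)
    print(f"criterion reliability = {r_yy:.2f} -> corrected validity = {r_corrected:.3f}")
# Prints 0.323 with r_yy = 0.60 and 0.347 with r_yy = 0.52: a lower assumed
# reliability means a larger correction, which is why Z&K argue the 1984
# estimate should be revised upwards.
```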
Rather than relying on the meta-analytic correlation, it is better to estimate the true correlation from the best individual studies in the literature: large samples, a wide range of jobs, and reliability/range-restriction (RR) corrections computed within the sample. While correcting for reliability and RR within a sample is usually a bad idea, larger samples yield more precise estimates of these statistics, so making the adjustment within the sample becomes more defensible.
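To illustrate the range-restriction point, here is a minimal error-propagation sketch (my own illustration with assumed numbers, not an analysis from any of the papers discussed): it shows how much the corrected correlation moves when the study-level restriction ratio u is mis-measured. The assumed restricted validity of 0.33 and true u of 0.60 are roughly what top-half selection on the predictor would produce if the unrestricted validity were 0.50.

```python
import math

# How errors in the study-level estimate of the range-restriction ratio u
# propagate into the corrected correlation. All values are assumptions for
# illustration only.

def case2(r, u):
    """Thorndike Case II correction for direct range restriction on the predictor."""
    return (r / u) / math.sqrt(1 + r**2 * (1 / u**2 - 1))

r_restricted, u_true = 0.33, 0.60          # with the correct u this recovers roughly 0.50
for u_hat in (0.48, 0.54, 0.60, 0.66, 0.72):   # u mis-measured by up to +/- 20%
    print(f"u estimate = {u_hat:.2f} -> corrected r = {case2(r_restricted, u_hat):.3f}")
```

A 20% error in u moves the corrected validity by roughly 0.07-0.09 in this setup. Since a study-level u is typically estimated from a small incumbent sample, this is the kind of noise a within-study correction can import into a meta-analysis; in a large sample the estimate of u is precise and the concern mostly disappears, which is the point above.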
One of these best studies was recently published:
Decades of research in industrial–organizational psychology have established that measures of general cognitive ability (g) consistently and positively predict job-specific performance to a statistically and practically significant degree across jobs. But is the validity of g stable across different levels of job experience? The present study addresses this question using historical large-scale data across 31 diverse military occupations from the Joint-Service Job Performance Measurement/Enlistment Standards Project (N = 10,088). Across all jobs, results of our meta-analysis find near-zero interactions between Armed Forces Qualification Test score (a composite of math and verbal scores) and time in service when predicting job-specific performance. This finding supports the validity of g for predicting job-specific performance even with increasing job experience and provides no evidence for diminishing validity of g. We discuss the theoretical and practical implications of these findings, along with directions for personnel selection research and practice.
While the abstract does not report the correlation, the validity adjusted for restriction of range and for the reliability of the work samples/supervisory ratings is .39.
While this is a military-only sample, the range of jobs is extremely diverse, so generalization should not be a major concern.
Besides this study, there is a similar study that uses a large (n = 4,039) military sample.
A predictor battery of cognitive ability, perceptual-psychomotor ability, temperament/personality, interest, and job outcome preference measures was administered to enlisted soldiers in nine Army jobs. These measures were summarized in terms of 24 composite scores. The relationships between the predictor composite scores and five components of job performance were analyzed. Scores from the cognitive and perceptual-psychomotor ability tests provided the best prediction of job-specific and general task proficiency, while the temperament/personality composites were the best predictors of giving extra effort, supporting peers, and exhibiting personal discipline. Composite scores derived from the interest inventory were correlated more highly with task proficiency than with demonstrating effort and peer support. In particular, vocational interests were among the best predictors of task proficiency in combat jobs. The results suggest that the Army can improve the prediction of job performance by adding non-cognitive predictors to its present battery of predictor tests.
They found a correlation of .63 between cognitive ability and technical proficiency across jobs.
I have no idea why this is so far away from the correlation of .39 that the prior study found. For what it’s worth, the correlation of .51 that Hunter and Schmidt found was probably accurate.
To estimate how well IQ tests predict job performance in real-world hiring, it is best to undo the correction for the reliability of IQ. Assuming a reliability of .90, this changes the figures from the two studies to .37 and .60 respectively - not much of a change, really.
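As a quick check of that arithmetic (under the assumption above that the published figures were corrected for predictor reliability), undoing the correction just means multiplying the corrected coefficient by the square root of the assumed reliability.

```python
import math

# Undo the correction for predictor (IQ) unreliability:
# r_operational = r_corrected * sqrt(r_xx), with r_xx assumed to be 0.90.
r_xx = 0.90
for r_corrected in (0.39, 0.63):
    print(f"{r_corrected:.2f} -> {r_corrected * math.sqrt(r_xx):.2f}")
# Prints 0.39 -> 0.37 and 0.63 -> 0.60, the figures given above.
```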
Beyond the reliability of IQ, there is also the reliability of the job performance measure: because performance is measured imperfectly, the calculated correlation understates the real relationship between intelligence and actual job performance. Another point is that high intelligence seems to have very large externalities (according to Garett Jones, the income externalities are much larger than the benefits captured by the intelligent individual himself). It seems reasonable to assume that hiring smarter people, especially into management positions, benefits an organization well beyond the measurable contribution of the employees concerned. The corollary is that affirmative action, especially for important managerial positions, carries a very significant cost.