Equality between the sexes in intelligence is an impossibility. Adult men have brains that are 10-15% larger than women's, and brain size has a causal effect on intelligence; assuming a sex difference in brain size of 1.2 to 1.5 SD and a correlation between brain size and IQ of .25 to .35, there should be a male advantage in intelligence of 0.3 to 0.5 SD in IQ scores.
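The arithmetic behind that range is a single multiplication, sketched here in Python. It assumes a simple linear model in which the brain size/IQ correlation is treated as a standardized causal slope; the helper name and the 15-point IQ standard deviation are illustrative assumptions, not taken from any particular study.

```python
# Expected IQ gap implied by a brain-size gap, under a simple linear model.
# Assumption: the brain-size/IQ correlation r acts as a standardized causal
# slope, so expected gap (SD) = brain-size gap (SD) * r.

IQ_SD = 15  # conventional IQ standard deviation (assumed scale)


def expected_gap_sd(brain_gap_sd: float, r: float) -> float:
    """Expected IQ gap in SD units (hypothetical helper)."""
    return brain_gap_sd * r


low = expected_gap_sd(1.2, 0.25)   # smallest gap and correlation in the text
high = expected_gap_sd(1.5, 0.35)  # largest gap and correlation in the text

print(f"{low:.3f} to {high:.3f} SD")
print(f"{low * IQ_SD:.1f} to {high * IQ_SD:.1f} IQ points")
```

With these inputs the range runs from 0.30 to about 0.53 SD, which is the 0.3 to 0.5 SD quoted above after rounding.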
That doesn’t necessarily mean men are more intelligent than women. It does mean that for the sex difference in intelligence to be null, the factors besides brain size that cause sex differences in intelligence would have to cause a difference in the opposite direction of exactly the same magnitude, which is extraordinarily unlikely, considering the distribution of effect sizes in psychology:

The case for the male advantage
Historically, women have been assumed to be less intelligent than men. With the advent of intelligence testing, early batteries found minimal sex differences in intelligence, which prompted many psychologists to conclude that men and women do not differ in intelligence. That view was held widely until Lynn promoted the developmental theory of sex differences in intelligence, which posits that sex differences in intelligence are a function of age, where the male advantage in IQ of 3-4 points emerges at the end of childhood. Hanania took a look at some of Lynn’s data on sex differences in intelligence, and thought that the theory held up.
Besides that, there is also a meta-analysis of the relationship between personality and intelligence which found a male advantage in general intelligence:
I have also recently conducted a meta-analysis¹ on sex differences in intelligence, which finds evidence for a small male advantage in adulthood and some evidence for the developmental theory of sex differences (that the sex difference in intelligence is close to null among children, but that men outpace women as they age):
There is no consensus within the field of psychology on whether there are sex differences in intelligence. To test whether there are, 2,092 effect sizes were gathered that measured differences in mental ability between men and women, representing 15,981,672 individuals. Men scored 2.58 IQ points (95% CI [1.91, 3.25], I^2 = 99.2%, k = 47) above women on general ability tests within adults. Whether this difference is due to general intelligence (g) is not clear, though it is likely. Two of the three methods used to test the developmental theory of sex differences suggested that the male advantage in ability increases with age.
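To put the abstract's point estimate on a standardized scale, the following sketch converts IQ points to SD units and then to the probability that a randomly chosen man outscores a randomly chosen woman. It assumes an IQ standard deviation of 15 and normal, equal-variance score distributions in both sexes; those assumptions and the function names are mine, not the paper's.

```python
from statistics import NormalDist

IQ_SD = 15  # assumed IQ standard deviation


def iq_points_to_d(points: float) -> float:
    """Convert a gap in IQ points to a standardized difference (d)."""
    return points / IQ_SD


def prob_superiority(d: float) -> float:
    """P(random man > random woman), assuming normal, equal-variance scores."""
    return NormalDist().cdf(d / 2 ** 0.5)


# Point estimate and 95% CI bounds quoted in the abstract, in IQ points.
estimate, ci_low, ci_high = 2.58, 1.91, 3.25

d = iq_points_to_d(estimate)
d_ci = (iq_points_to_d(ci_low), iq_points_to_d(ci_high))
p = prob_superiority(d)

print(f"d = {d:.3f}, 95% CI [{d_ci[0]:.3f}, {d_ci[1]:.3f}]")
print(f"probability of superiority = {p:.3f}")
```

The equal-variance assumption is a simplification; if the sexes differ in variance, the probability of superiority would need the pooled form of the calculation.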
I made the mistake of not tracking the studies I excluded from the initial spreadsheet: any study that did not have information that allowed a standardized difference to be calculated and did not report sample sizes by sex was not added. A rather egregious mistake; in my defense, it was the first meta-analysis I tried, and I was 22 years old when I started compiling the effect sizes.
This mistake is unlikely to affect the final results, as I was restrictive with regard to how many effect sizes were allowed into the final meta-analysis: they had to be roughly representative samples of the general population, between 40 and 60% female, test at least 3 major abilities and have 4 subtests, and be only composed of adults. Only 47 of ~2400 effect sizes were able to pass those qualifications.
There was no evidence of publication bias, though there was quite a bit of heterogeneity in effect sizes. Some samples had male advantages in intelligence as large as 6 to 8 points, while a few (particularly those that used the Woodcock Johnson) had slight female advantages in performance.
In terms of group factors of intelligence, men were favoured in terms of performance on most group factors, while women did better in a few like reading comprehension and processing speed.
In the former chart, regardless of ability, the male advantage in cognitive ability seems to grow with age. My analysis of the Project Talent dataset suggests that this is probably not the case — the developmental effect is not universal and in some subtests there is a negative age/sex interaction. There was also no statistically significant correlation between the age/sex interactions and the g-loadings within the subtests.

For full-scale ability tests, the results were largely consistent with the developmental hypothesis, which is that the male advantage emerges with age. To visualize the size of the difference by age, I subsetted the sample to those that tested general ability, matrix reasoning, and scholastic ability and used a restricted cubic spline to plot the difference by age.

The conclusion looks pretty straightforward — men are smarter than women.
Unfortunately, it’s not that simple.
Latent vs observed differences
The critics of the male advantage hypothesis have since pivoted to examining the sex difference at the latent level instead of the observed level. Using latent models, it is possible to test whether the individual subtests are biased measurements of general intelligence by testing for measurement invariance. This involves subjecting the psychometric battery to multiple tests, though the relevant one in this situation is scalar (strong) invariance: that the intercepts of the abilities/subfactors/subtests (depending on the context) are equal across groups. If this is not satisfied, then the structure of the model can be adjusted by adding group factors of ability, and allowing those factors to vary by group.
My criticism of this method is simple: the unbiased estimate of a population mean is the sample mean. In this case, the population is all existing subtests of intelligence and the sample is the particular set of subtests inside a concrete battery.
Allow me to illustrate this with an example. Let us assume that there is a small male advantage in intelligence of 0.25 SD and large group factor differences in intelligence. The distribution of sex differences in performance on all possible tests of intelligence then conforms to this distribution:
A statistician then selects 9 subtests at random to test whether there is a general sex difference in intelligence. They find that when performance on all subtests is aggregated, the mean male advantage in performance is about 0.17 SD.
Were they to follow measurement invariance testing, the two subtests on the right and the subtest on the left would show up as outliers. Should they then add parameters to the model to control for the fact that these three subtests are outliers?
I contend that they should not. It’s just common sense. If you want to estimate an average, you calculate the mean or the median with every single unit of data included. You don’t just randomly exclude outliers based on vibes.
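The sampling argument can be sketched as a small simulation. The 0.25 SD general advantage comes from the example above; the spread of test-specific differences (0.5 SD), the size of the hypothetical test population, and the number of repeated batteries are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

TRUE_GAP = 0.25      # general male advantage in SD, from the example
FACTOR_SPREAD = 0.5  # assumed SD of test-specific (group factor) differences

# Hypothetical "population" of sex differences on all possible subtests.
population = [rng.gauss(TRUE_GAP, FACTOR_SPREAD) for _ in range(10_000)]
population_mean = statistics.fmean(population)


def battery_mean(n_subtests: int = 9) -> float:
    """Mean gap across one randomly chosen battery, with no subtest dropped."""
    return statistics.fmean(rng.sample(population, n_subtests))


# Many statisticians each draw their own 9-subtest battery.
battery_means = [battery_mean() for _ in range(2_000)]
average_of_means = statistics.fmean(battery_means)
spread_of_means = statistics.stdev(battery_means)

print(f"population mean gap:      {population_mean:.3f}")
print(f"average of battery means: {average_of_means:.3f}")
print(f"SD of battery means:      {spread_of_means:.3f}")
```

With these assumed numbers, a single nine-subtest battery has a standard error of roughly 0.5/√9 ≈ 0.17 SD, so observing 0.17 when the true value is 0.25 is unremarkable, while the average across batteries stays close to the population mean.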
The new trend of overusing latent models is one of the worst things to come out of the new wave of quantitative social science; the Easter Bunny does not exist. My dislike of latent models in this situation might be kind of convenient, though I had already begun to hate them by the time I wrote my second paper that used them: I found that they occupied way too much of my time and returned the same results that regular statistical analysis gave.
As such, I reject any claim of sex difference in intelligence that is founded on latent modeling or measurement invariance testing, regardless of its conclusions.
To make matters worse, studies that are cited as proof of there being no sex difference in general intelligence often do not show this when their details are brought to light. Allow me to examine the seven latent modeling studies that the Reynolds 2022 review cited², five of which were alleged to provide evidence of a female advantage in latent intelligence.
In study one, the sample was on average 11 to 16 years old, which is too young. As far as I can tell from the text, they do not report an aggregate full scale difference, but the observed group factor differences slightly favour women in the aggregate:
In study two, they test very young children (ages 2 to 7) and both the observed and latent differences favour girls, which is unsurprising.
In study three, the sample age range is 6 to 16 and the observed full scale difference slightly favours women, as does the latent one.
In study four, the subjects were 13 years old and the general difference favoured women, yet most of the group factors favoured men. 11 out of the 16 subtests had a female advantage.
In study 5, a slight female advantage in g is found. The ages assessed (5 to 59) were appropriate, but the test used (the Woodcock Johnson) has traditionally had a slight female advantage in observed scores as well. Hanania notes that the magnitude of the difference also depends on the calculation method, which is rather suspect:

Study 6 shows a slight female advantage at ages that are appropriate for testing. I have fewer complaints about this study, though I wonder if their having fixed means for crystallized and visuospatial tests artificially drags down the male advantage in g. They also did not specify what the observed difference in full scale ability was between the sexes, so it is not clear whether the battery itself simply has a female advantage or whether it was the latent method that caused there to be a female advantage.

Study 7 finds no g difference between boys and girls at the ages of 5 to 17. I have few complaints about this study, as they tested for the age x sex interaction and they had a large sample size at ages 16-17. In a similar vein to the previous study, it’s difficult to tell whether this null difference is an artefact of the test used or of overcorrecting for group factors.
Arguments for male advantages in intelligence that I dislike
Intelligence tests are indirectly manipulated to reduce sex bias, though this largely consists of removing items within subtests that show disproportionate advantages in favour of either sex. I think this is a bad practice, and hope that test-writers optimize for better outcomes like item g-loadings or time efficiency; that aside, the critique is too convenient and too lazy. It’s also historically ignorant: back in the early 1900s, when the idea that men were more intelligent than women was uncontroversial, men and women still had roughly similar scores on tests of intelligence. On the first Wechsler test that was ever created, men outscored women by about 3 IQ points, which is consistent with performance on modern batteries.
Likewise, people argue that test-makers are biasing IQ tests by refusing to add spatial or mechanical reasoning tests to the battery that men tend to score high on — the problem here is that you could make the same argument with regard to tests of social cognition, which tend to have a female advantage, but are nonetheless excluded.
The argument that women are stupid because they are less rational, believe in weird superstitions, or prefer visual over written content (e.g. tiktok over twitter) is also unconvincing, as you could argue that men’s tendency to take bad risks or live short lives could also be indicative of low intelligence.
Arguments for null sex differences in intelligence that I dislike
Traditionally, studies that have employed the method of correlated vectors (correlating subtest g-loadings with their gender differences) to test for a difference in g have found that the correlation between the two vectors is zero. I think this has been misinterpreted to suggest that there is no difference in g between sexes, when the much more reasonable conclusion to make is that the group factor differences in intelligence (e.g. differences in spatial ability or processing speed) are much larger than any g difference that exists between the sexes.
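A small simulation illustrates why a near-zero vector correlation is compatible with a real g difference. It assumes each subtest's sex difference equals its g-loading times a g gap plus an independent group-factor term; the g gap (0.2 SD), the loading range (0.3 to 0.8), and the group-factor spread (0.3 SD) are hypothetical values, not estimates from any dataset.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible

G_GAP = 0.2          # assumed sex difference in g, in SD
FACTOR_SPREAD = 0.3  # assumed SD of group-factor differences across subtests


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_subtests(n):
    """Return (g-loadings, sex differences) for n hypothetical subtests."""
    loadings = [rng.uniform(0.3, 0.8) for _ in range(n)]
    gaps = [g * G_GAP + rng.gauss(0.0, FACTOR_SPREAD) for g in loadings]
    return loadings, gaps


# Correlated-vectors result with an unrealistically large pool of subtests.
loadings, gaps = simulate_subtests(20_000)
r_large = pearson(loadings, gaps)

# The same method applied to a single realistic battery of 10 subtests.
loadings_small, gaps_small = simulate_subtests(10)
r_battery = pearson(loadings_small, gaps_small)

print(f"r across 20,000 subtests: {r_large:.3f}")
print(f"r in one 10-subtest battery: {r_battery:.3f}")
```

Under these assumed parameters the expected correlation is only about 0.1 even though a g gap is built into every subtest, and with ten subtests the estimate is noisy enough to land on either side of zero.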
As in the case of arguments in favor of male advantages in intelligence, I also dislike the argument that intelligence tests favour men by design, when in reality these sex differences in performance are a massive embarrassment for the test makers, and I am sure they would be satisfied if they were able to erase them in some acceptable or simple way.
Some have argued that the male advantage is an artefact of the lower side of the bell curve not being adequately sampled. I also dislike this theory as it is difficult to falsify: perfectly representative samples are practically impossible to gather, especially for psychometric testing. There is also a male advantage in both NLSY samples, which went out of their way to sample underprivileged groups and even interviewed respondents who were in jail.
Beyond the use of latent models and the method of correlated vectors, I have also disliked the sex-difference denialists’ abuse of the null hypothesis. As I elaborated on before, it’s simply impossible that men and women have exactly equal levels of intelligence, and it is orders of magnitude more likely that there is a female advantage in intelligence than it is that the sexes do not differ in intelligence.
Conclusion
On average, men score about 2 to 4 IQ points higher than women on full scale intelligence tests within adult samples. It varies somewhat between tests, e.g. the WAIS has a male advantage of 3 points, and the Woodcock Johnson has a small female advantage (ironically). Although it’s not clear whether this is a difference in general ability, there are no good arguments for it not being one, which to me suggests that there probably is one. If there were a good argument as to why there is no sex difference in general intelligence within adults, people would have already adopted it; the fact that the denialists have stuck to citing flimsy latent models or studies of children/adolescents is rather telling.
I’m still somewhat reluctant to claim that there is a male advantage in intelligence for sure, as observed differences in IQ do not necessarily imply latent ones. Given that the difference between the sexes is small, it’s still possible for it to be a product of sampling bias in terms of subtests or due to advantages in group factors of cognitive ability (e.g. spatial reasoning) but not general ability.
In line with this source³, I think the current probability distribution in terms of the male advantage in g can be summarized by the following table:
Footnotes
1 — if you read the forum, me and the reviewer are, well, finding it difficult to agree on anything. Our conflict seems to be beyond any personal animosity or object-level disagreement — I think it stems more from different thinking styles — he is a detail oriented thinker, while I am a holistic thinker.
2 — I quote: “Although no g difference seems to be the consensus, five out of seven latent variable studies summarized here showed small female advantages in g along with a profile of sex differences in more specific factors in children and adolescents (Härnqvist, 1997; Keith et al., 2008; Palejwala & Fine, 2015; Reynolds et al., 2008; Rosén, 1995). Lynn (1999) proposed a developmental model of sex differences in intelligence where females have very small g advantages from ages 9–12 that change to small male advantages at about age 16 and into adulthood. Empirical findings related to this theory are mixed (cf. Arribas-Aguila, Abad, & Colom, 2019; Keith et al., 2008; Reynolds et al., 2008), but age is potentially a moderator.”
3 — this was once revealed to me in a dream.
Practically speaking, the 3 IQ points don't matter that much. In descending order of importance, most observed sex differences are due to:
(1) Sex differences in personality
(2) Sex differences in variance of IQ
(3) Sex differences in psychometric tilt
(4) Sex differences in average IQ
[I could see an argument for switching the order of (2) and (3)]
The world we live in would not look all that different if men and women had the same average IQ as opposed to the current situation where fully-developed men are slightly smarter than fully-developed women.
This is an old one. Glad to see Lynn has been vindicated once again. However, I’ve always had problems in general with smallish IQ differences. In short, what does a 3 point difference in IQ mean in practical terms? Additionally, is a 3 point difference, say between a 100 and a 103 IQ, the same as a difference between a 120 and a 123 IQ?