Stats sourced from this publication:
Laouenan, M., Bhargava, P., Eyméoud, JB. et al. A cross-verified database of notable people, 3500BC-2018AD. Sci Data 9, 290 (2022). https://doi.org/10.1038/s41597-022-01369-4
A new strand of literature aims at building the most comprehensive and accurate database of notable individuals. We collect a massive amount of data from various editions of Wikipedia and Wikidata. Using deduplication techniques over these partially overlapping sources, we cross-verify each retrieved information. For some variables, Wikipedia adds 15% more information when missing in Wikidata. We find very few errors in the part of the database that contains the most documented individuals but nontrivial error rates in the bottom of the notability distribution, due to sparse information and classification errors or ambiguity. Our strategy results in a cross-verified database of 2.29 million individuals (an elite of 1/43,000 of human being having ever lived), including a third who are not present in the English edition of Wikipedia. Data collection is driven by specific social science questions on gender, economic growth, urban and cultural development. We document an Anglo-Saxon bias present in the English edition of Wikipedia, and document when it matters and when not.
The authors provide five variables that can be used to rank individuals:
number_wiki_editions: number of different Wikipedia editions;
non_missing_score: total number of non-missing items retrieved from Wikipedia or Wikidata for birth date, gender and domain of influence;
total_count_words: total number of words in all biographies from Wikipedia;
wiki_readers_2015_2018: average per year number of page views in all Wikipedia editions (information retrieved in 2015–2018);
total_noccur_links: total number of external links (sources, references, etc.) from Wikidata;
I then built a composite of these five variables by extracting their first principal component and rescaling it so that the minimum in the whole dataset is 0 and the maximum is 100. Because the scale is shared across the whole dataset, fields can also be compared with one another in terms of notability. The omega reliability of the ranking was 0.8: not great, but not terrible either, and comparable to a high-quality personality test or an SAT section.
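The composite described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code actually used for these rankings. It assumes each variable is z-scored before the component is extracted (the post does not say whether any transformation was applied), it finds the first principal component by power iteration, and it runs on made-up numbers. The function names and the toy rows are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Z-score one variable (assumption: the column has non-zero variance)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


def first_principal_component(rows, iterations=500):
    """Return first-PC scores for rows of already standardized values."""
    n, k = len(rows), len(rows[0])
    # Covariance matrix of standardized columns, i.e. the correlation matrix.
    cov = [[sum(r[i] * r[j] for r in rows) / n for j in range(k)]
           for i in range(k)]
    # Power iteration converges to the leading eigenvector of cov.
    vec = [1.0] * k
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in new))
        vec = [x / norm for x in new]
    # Orient the component so that higher raw values mean a higher score.
    if sum(vec) < 0:
        vec = [-x for x in vec]
    return [sum(r[j] * vec[j] for j in range(k)) for r in rows]


def rescale_0_100(scores):
    """Min-max rescale so the lowest score is 0 and the highest is 100."""
    lo, hi = min(scores), max(scores)
    return [100 * (s - lo) / (hi - lo) for s in scores]


def composite(table):
    """table: one row per person, each holding the five raw variables."""
    columns = [standardize(list(col)) for col in zip(*table)]
    rows = list(zip(*columns))
    return rescale_0_100(first_principal_component(rows))


# Made-up rows in the order: number_wiki_editions, non_missing_score,
# total_count_words, wiki_readers_2015_2018, total_noccur_links.
toy = [
    [250, 3, 900000, 5000000, 12000],
    [40, 3, 60000, 200000, 900],
    [5, 2, 3000, 4000, 40],
    [1, 1, 300, 150, 3],
]
scores = composite(toy)
print([round(s, 1) for s in scores])
```

Rescaling over the whole dataset, rather than within each occupation, is what makes scores comparable across fields.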
As there are close to 5000 different occupations here, I limited this analysis to the top 100, from which I took the lists that were either interesting or highly accurate.
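Restricting the analysis to the most common occupations might look like the sketch below. It assumes "top 100" means the 100 occupations with the most individuals, which the post does not state, and the field names `occupation` and `score` as well as the toy records are hypothetical.

```python
from collections import Counter


def top_occupations(people, n=100):
    """Return the n occupations with the most individuals (assumed criterion)."""
    counts = Counter(p["occupation"] for p in people)
    return [occupation for occupation, _ in counts.most_common(n)]


def ranking_within(people, occupation, k=10):
    """Return the k highest-scoring individuals within one occupation."""
    members = [p for p in people if p["occupation"] == occupation]
    return sorted(members, key=lambda p: p["score"], reverse=True)[:k]


# Made-up records; "score" stands in for the 0-100 composite.
people = [
    {"name": "A", "occupation": "painter", "score": 61.0},
    {"name": "B", "occupation": "painter", "score": 74.5},
    {"name": "C", "occupation": "painter", "score": 12.0},
    {"name": "D", "occupation": "chess player", "score": 40.0},
    {"name": "E", "occupation": "chess player", "score": 55.0},
    {"name": "F", "occupation": "falconer", "score": 9.0},
]
kept = top_occupations(people, n=2)
for occupation in kept:
    print(occupation, [p["name"] for p in ranking_within(people, occupation, k=2)])
```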
First, the overall ranking:
Academia
I posted the list just because I like Schopenhauer. Otherwise it’s not a great list — “academic” is not a very useful way to categorize people.
Activism
Acting
One of the most accurate ones here. But a few people (e.g. Prince) should be excluded.
Anthropology
Mengele? An anthropologist?
Art
Suffers from the same problem as the term “academic” — too generic.
Astronomer
Decent list.
Author
A good list that unfortunately suffers from a few bad entries (e.g. Ivana Trump).
Biology
Looks fine to me.
Boxing
Chemistry
Chess
Composing
Great man… Great man… Great man… Great man… Great man… Nicki Minaj.
Economy
Engineering
Entrepreneurship
Exploring
Film
Football
Golf
Guitar
History
Machiavelli doesn’t belong here.
Journalism
Ruling
Law
Wild list.
Modeling
Music
Avicii died in 2018, around the time this data was being collected.
Novel writing
Painting
Philosophy
Had Aristotle been included, he would have been first.
Physics
Well…
Tennis
Interesting top 3.
Playwriting
Poetry
Politics
Psychology
Sculpting
Singing
Sociology
Sprinting
Swimming
Wrestling
Writing
Voltaire and Rousseau are writers as much as Nietzsche is. And… PewDiePie?
Looking at the overall ranking, the only person I could find with possible Asian ancestry was Lenin. Before looking at the data, one could hypothesize that Asians would be overrepresented in math-heavy fields and underrepresented in verbally tilted fields.
But from what I could see, Asians found their highest rankings in Activism (#3), Anthropology (#4), Art (#2), and Author (#4). I couldn't find any for Biology, Chemistry, Economy, or Engineering. Nakamura made the list for chess, but not Ding Liren. What are the chances?
Austen isn't on the novel writer list?