vansh.dev

The Man Who Proved Racism with a Bell Curve

Vansh Mundhra
History of Science · Essay No. 06

Karl Pearson invented half the statistical tools sitting in your ML pipeline. He also used those exact tools to argue that certain races were biologically inferior. This is not a coincidence. This is a warning.

Open your machine learning textbook to almost any page. You will find, within a few paragraphs, one of these terms: correlation coefficient. Chi-square test. Standard deviation. Regression analysis. P-value. Normal distribution fitting.

Every one of these was named, introduced, or given much of its modern mathematical form by one man: Karl Pearson, a British mathematician and biostatistician working at University College London in the late 1800s and early 1900s.

He is, without exaggeration, one of the three or four most important figures in the history of statistics. The discipline of mathematical statistics as it exists today — the language that every data scientist, ML researcher, and quantitative analyst speaks daily — runs through his work in a way that cannot be disentangled.

He was also a committed eugenicist who spent significant portions of his career using the very statistical tools he invented to argue that certain races were evolutionarily beneath others.

This is the story of how the same mind built both things, why they were not as separate as we'd like to believe, and what that means for anyone who trusts numbers.


What he actually built

Before we get to the darkness, the record demands clarity on what Pearson actually contributed — because it is extraordinary.

In a series of eighteen papers published between 1893 and 1912, Pearson laid down much of the technical vocabulary of modern statistics. The Pearson correlation coefficient, the r value between -1 and 1, measures both the strength and the direction of a linear relationship between two variables. The chi-square test, published in 1900, gave researchers one of the first formal methods for testing whether observed counts are consistent with a hypothesized distribution, and it became a cornerstone of statistical hypothesis testing.
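
Both tools can be written out directly from their textbook definitions. The sketch below is plain Python with no dependencies; the function names `pearson_r` and `chi_square_statistic` are hypothetical, chosen only for illustration, and a real analysis would use a tested library routine such as SciPy's `scipy.stats.pearsonr` or `scipy.stats.chisquare`.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation: covariance of x and y divided by
    the product of their standard deviations (the 1/n factors cancel, so raw
    sums of squared deviations are used)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def chi_square_statistic(observed, expected):
    """Pearson's goodness-of-fit statistic: the sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # A perfect positive and a perfect negative linear relationship.
    print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
    print(pearson_r([1, 2, 3], [3, 2, 1]))               # -1.0
    # 60 rolls of a die tested against the "fair die" hypothesis (10 per face).
    print(round(chi_square_statistic([8, 12, 10, 9, 11, 10], [10] * 6), 6))  # 1.0
```

With six faces and fully specified expected counts, the statistic is compared against a chi-square distribution with 6 - 1 = 5 degrees of freedom; a value of 1.0 sits far below the conventional 5% critical value of about 11.07, so these rolls give no reason to doubt the die. Note that r only captures linear association: a strong curved relationship can still produce r close to 0.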

He founded the world's first university statistics department at UCL in 1911. In 1901 he co-founded the journal Biometrika with Francis Galton and Walter Weldon. He turned statistics from "the playing field of dilettanti" into a rigorous scientific discipline.

"He didn't just contribute to statistics. He invented the language that statistics speaks. And then he used that language to say things that should have never been said."

The ideology underneath the mathematics

Pearson was a devoted follower of Francis Galton — the man who coined the term "eugenics." Pearson held the first Galton Chair of Eugenics at UCL, treating statistics and eugenics as complementary endeavours.

In a 1900 address in Newcastle, later published as National Life from the Standpoint of Science, he stated: "History shows me one way, and one way only, in which a high state of civilization has been produced, namely the struggle of race with race, and the survival of the physically and mentally fitter race."

Analogy: Imagine if the person who invented the compiler also argued that only certain nationalities should be allowed to write code, and published their immigration policy arguments in the same journal using the same methods. That is roughly the situation with Pearson.


The Jewish immigrant study — where the methods broke down

In 1925, Pearson and his co-author Margaret Moul published a study of Russian and Polish Jewish immigrant children in East London in the first volume of the Annals of Eugenics. They used page after page of tables and correlations to conclude that these children were biologically inferior in intelligence to native British children.

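The study was flawed even by the standards of 1925. The methodological objections are substantial: intelligence assessments that did not account for cultural and language differences, a lack of standardized controls, and conclusions that appear to have been settled before the data were analysed.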
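
Missing controls matter because a gap between two groups can come entirely from a third variable that is unevenly distributed between them. The sketch below uses invented numbers, not data from the 1925 study: scores depend only on a hypothetical `stratum` (think of years of schooling in the test's language), yet the raw group averages differ. Comparing within each stratum makes the gap vanish. The names `records`, `group_means`, and `stratified_means` are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Invented records: (group, stratum, score). By construction the score
# depends only on the stratum, never on the group.
records = (
    [("A", "long", 70)] * 80 + [("A", "short", 50)] * 20
    + [("B", "long", 70)] * 20 + [("B", "short", 50)] * 80
)


def group_means(rows):
    """Raw average score per group, ignoring the stratum (no control)."""
    scores = defaultdict(list)
    for group, _stratum, score in rows:
        scores[group].append(score)
    return {group: mean(vals) for group, vals in scores.items()}


def stratified_means(rows):
    """Average score per (stratum, group) cell, i.e. controlling for stratum."""
    scores = defaultdict(list)
    for group, stratum, score in rows:
        scores[(stratum, group)].append(score)
    return {cell: mean(vals) for cell, vals in scores.items()}


if __name__ == "__main__":
    print(group_means(records))       # A averages 66, B averages 54
    print(stratified_means(records))  # 70 vs 70 in "long", 50 vs 50 in "short"
```

The 12-point raw gap is genuinely present in the data, but it says nothing about the groups themselves; it only reflects how each group is spread across the strata.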

"The most dangerous form of bias is not the crude kind, which is easy to see. It is the kind dressed in equations — precise, reproducible, and wrong in its foundations."

The legacy nobody can cleanly separate

The problem is not simply that a bad person did good science. The deeper problem is that the ideology shaped the science in ways that are not always visible. The questions Pearson chose to ask were eugenic questions before they were statistical ones.

Analogy: A hammer looks like a neutral tool. But a hammer designed for one particular type of nail will drive that nail more easily than any other. Statistical methods can work the same way: bias enters silently through what they were designed to measure.


Why this is not a history lesson

Every machine learning dataset ever assembled has, embedded in it, human decisions about what to measure and what to label as ground truth.

When facial recognition performs worse on darker-skinned faces, or predictive policing assigns higher risk scores because it was trained on biased arrest data, the algorithm is often working exactly as designed. It is faithfully reproducing the bias baked into its inputs.
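
A toy simulation makes the mechanism concrete. Everything here is hypothetical: two groups offend at exactly the same true rate, but offences in group B are three times as likely to be recorded as arrests (say, because of heavier patrolling). A "model" that simply learns arrest rates from the records, the simplest possible stand-in for a real risk model, then scores group B as three times riskier, and it is perfectly accurate about the data it was given.

```python
# All quantities are hypothetical, chosen only to illustrate label bias.
POPULATION = 10_000                 # people per group
TRUE_OFFENCES = 1_000               # identical true offending in both groups (10%)
RECORDED_PER_10 = {"A": 2, "B": 6}  # offences that become arrests, per 10


def recorded_arrests(group):
    """Arrests that end up in the dataset: true offences times detection rate."""
    return TRUE_OFFENCES * RECORDED_PER_10[group] // 10


def fit_risk_model(groups):
    """Simplest possible 'risk model': observed arrest rate per person.
    It fits the recorded labels exactly, which is precisely the problem."""
    return {g: recorded_arrests(g) / POPULATION for g in groups}


def true_offence_rate(groups):
    """What the model was supposed to estimate, but never observed."""
    return {g: TRUE_OFFENCES / POPULATION for g in groups}


if __name__ == "__main__":
    learned = fit_risk_model(["A", "B"])
    truth = true_offence_rate(["A", "B"])
    print("learned risk:", learned)  # B scored three times higher than A
    print("true rate:   ", truth)    # identical for A and B
    print("recorded arrests B/A:", recorded_arrests("B") / recorded_arrests("A"))
```

Nothing in the fitting step is buggy. The bias lives entirely in the labels, which is why auditing how the ground truth was recorded matters as much as auditing the model.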

Pearson's sin was not that he used numbers. His sin was that he let a prior conclusion determine what he measured and how he interpreted it — and then dressed it in the authority of mathematics.


What we do with this

The lesson is not "don't trust statistics." The lesson is more specific and more demanding: understand whose questions your methods were built to answer, and remember that mathematical rigour applied to a biased question produces a rigorous, biased answer.

Numbers do not lie. But they do not volunteer the truth either. That part is still on us.