When the Headline Average Gets It Wrong

Introduction

Simpson’s paradox is what happens when a headline statistic points one way while the meaningful subgroups point the other. It matters because many causal claims rest on averages: a school is “better”, a treatment “works”, a company “improved”, or a policy “reduced harm”. With messy real-world data, an average can be arithmetically correct and still analytically misleading.

The classic case is graduate admissions at the University of California, Berkeley, in 1973. Overall, men were admitted at a higher rate than women. But when researchers examined departments separately, the apparent pattern against women largely disappeared, and in the properly pooled departmental analysis the evidence pointed slightly the other way. The headline figure had mixed together departments with very different admission rates and very different applicant pools. [Science]

How aggregation reverses a pattern

Simpson’s paradox is not a mistake in arithmetic. It is a mistake in interpretation. A combined average is a weighted average, and the weights matter. If one group is over-represented in easier cases and another is over-represented in harder cases, the overall rate can reflect case mix more than treatment, skill, fairness or quality.
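To make the weighted-average point concrete, here is a minimal Python sketch with hypothetical numbers. Two sides share exactly the same success rates inside each subgroup, yet their overall rates differ sharply because their case mixes differ. The function name `overall_rate` is illustrative, not taken from any library.

```python
# Hypothetical illustration: an overall rate is a weighted average of
# subgroup rates, weighted by each subgroup's share of the cases.

def overall_rate(shares, rates):
    """Combine subgroup rates, weighting each by its share of all cases."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * rate for share, rate in zip(shares, rates))

# Identical performance inside each subgroup for both sides:
# 90% success on easy cases, 30% on hard cases.
rates = [0.9, 0.3]

# Side X mostly handles easy cases; side Y mostly handles hard ones.
x_overall = overall_rate([0.8, 0.2], rates)
y_overall = overall_rate([0.2, 0.8], rates)

print(f"X overall: {x_overall:.2f}")  # X overall: 0.78
print(f"Y overall: {y_overall:.2f}")  # Y overall: 0.42
```

Both sides perform identically inside each subgroup, yet the headline gap is 36 points, created entirely by case mix.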

This is why the paradox is so dangerous in headline statistics. The aggregate number feels like the broadest and fairest view, but it may actually be blending unlike situations. Edward Simpson’s 1951 paper on contingency tables made the underlying point sharply: relationships between two classifications may be impossible to summarise properly without reference to a third. [JSTOR]

A simple structure usually sits behind the reversal:

A visible comparison: for example, men versus women, Treatment A versus Treatment B, or one school versus another.
An outcome: admission, recovery, sales, test scores or survival.
A hidden grouping variable: department, illness severity, customer type, year group, region or product mix.
Uneven distribution across groups: one side is concentrated in easier or harder subgroups.
Different baseline rates: the subgroups have very different chances of the outcome before the visible comparison is even considered.
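All five ingredients appear together in a small, entirely hypothetical example. Two treatments are the visible comparison, recovery is the outcome, and illness severity is the hidden grouping. Treatment A is given mostly to severe cases (uneven distribution), and mild cases recover far more often (different baseline rates). The Python sketch below uses made-up counts to check whether A beats B inside every severity group and whether it beats B overall.

```python
# Hypothetical counts: treatment -> severity -> (recovered, patients).
data = {
    "A": {"mild": (95, 100), "severe": (200, 400)},
    "B": {"mild": (340, 400), "severe": (40, 100)},
}

def pooled_rate(groups):
    """Recovery rate after pooling every severity group together."""
    recovered = sum(r for r, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return recovered / patients

for treatment, groups in data.items():
    detail = ", ".join(f"{sev} {r / n:.0%}" for sev, (r, n) in groups.items())
    print(f"{treatment}: {detail}; overall {pooled_rate(groups):.0%}")

# Does A beat B inside each severity group, and does it beat B overall?
a_wins_every_subgroup = all(
    data["A"][sev][0] / data["A"][sev][1] > data["B"][sev][0] / data["B"][sev][1]
    for sev in data["A"]
)
a_wins_overall = pooled_rate(data["A"]) > pooled_rate(data["B"])
print("A better in every severity group:", a_wins_every_subgroup)
print("A better overall:", a_wins_overall)
```

With these invented counts, A does better among both mild and severe patients yet worse overall (59% against 76%), purely because it treated far more severe cases.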

The thinking error is to treat the headline average as if it answers a causal question. It usually answers only an accounting question: “What happened overall?” The causal question is narrower and harder: “What would have happened if comparable cases had been exposed to different choices?”
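One common way to move from the accounting question towards the causal one is direct standardisation: apply each side's within-subgroup rates to the same reference case mix. The sketch below uses hypothetical counts (treatment A versus B, split by illness severity) and takes the pooled mix of all patients as the reference. Both the data and that choice of reference are assumptions. Standardisation answers the “comparable cases” question only if the stratifying variable captures everything that differs between the groups, and the data alone cannot confirm that.

```python
# Hypothetical counts: treatment -> severity -> (recovered, patients).
data = {
    "A": {"mild": (95, 100), "severe": (200, 400)},
    "B": {"mild": (340, 400), "severe": (40, 100)},
}

def reference_mix(data):
    """Share of all patients (both treatments pooled) in each severity group."""
    totals = {}
    for groups in data.values():
        for severity, (_, n) in groups.items():
            totals[severity] = totals.get(severity, 0) + n
    grand_total = sum(totals.values())
    return {sev: n / grand_total for sev, n in totals.items()}

def standardised_rate(groups, mix):
    """One treatment's within-group rates, re-weighted to the reference mix."""
    return sum(mix[sev] * (r / n) for sev, (r, n) in groups.items())

mix = reference_mix(data)  # here: {'mild': 0.5, 'severe': 0.5}
adjusted = {t: standardised_rate(g, mix) for t, g in data.items()}
for treatment, value in adjusted.items():
    print(f"{treatment}: standardised recovery rate {value:.1%}")
```

With these made-up counts, the standardised rates come out at about 72.5% for A and 62.5% for B. Once both treatments face the same case mix, the ordering matches the within-subgroup story.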

Berkeley admissions and case-mix lessons

The Berkeley admissions case is memorable because the first number looks so persuasive. In fall 1973, aggregate graduate admissions data showed a higher admission rate for male applicants than female applicants: roughly 44% for men and 35% for women in the full data reported by Bickel, Hammel and O’Connell. On its face, that seemed to support a claim of bias against women. [Science]

The problem was that graduate admissions were not one uniform queue. Applicants applied to departments, and departments differed greatly in selectivity. Women were more likely to apply to departments with lower admission rates, while men were more concentrated in departments with higher admission rates. Once the data were examined by department, the headline gap no longer supported the same causal story. [PubMed]

The six largest departments show the mechanism clearly. In the commonly reproduced Berkeley dataset, Department A admitted about 62% of male applicants and 82% of female applicants; Department B admitted about 63% of men and 68% of women; Department F admitted only about 6% of men and 7% of women. What matters is not only the within-department rates but also the applicant mix: far more men than women applied to the high-admission departments A and B, while the large majority of female applicants went to the lower-admission departments C, D, E and F. [Wikipedia]
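The pattern can be checked directly from the six-department table. The sketch below assumes the counts of the widely reproduced dataset (as distributed in R's `UCBAdmissions`), so verify them against the cited sources before reuse. It prints each department's admission rates by sex and each department's share of all male and all female applicants. It then collapses the six departments into one pool.

```python
# Six-department Berkeley table as commonly reproduced (assumed to match
# R's UCBAdmissions dataset).
# dept -> (men admitted, men applied, women admitted, women applied)
DEPTS = {
    "A": (512, 825, 89, 108),
    "B": (353, 560, 17, 25),
    "C": (120, 325, 202, 593),
    "D": (138, 417, 131, 375),
    "E": (53, 191, 94, 393),
    "F": (22, 373, 24, 341),
}

men_applied = sum(v[1] for v in DEPTS.values())
women_applied = sum(v[3] for v in DEPTS.values())

print("Dept  men admit %  women admit %  % of men  % of women")
for dept, (m_adm, m_app, w_adm, w_app) in DEPTS.items():
    print(f"{dept:>4}  {100 * m_adm / m_app:11.0f}  {100 * w_adm / w_app:12.0f}"
          f"  {100 * m_app / men_applied:8.0f}  {100 * w_app / women_applied:10.0f}")

# Departments where women's admission rate was higher than men's.
women_ahead = [d for d, (ma, mp, wa, wp) in DEPTS.items() if wa / wp > ma / mp]

# Collapse the six departments into a single pool.
men_rate = sum(v[0] for v in DEPTS.values()) / men_applied
women_rate = sum(v[2] for v in DEPTS.values()) / women_applied
print("Women ahead in departments:", women_ahead)
print(f"Pooled over six departments: men {men_rate:.1%}, women {women_rate:.1%}")
```

Women come out ahead in four of the six departments (A, B, D and F), yet pooling the six gives roughly 45% for men against 30% for women. These pooled figures differ from the full-campus 44% and 35% quoted earlier because the full data cover many more departments.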

That does not mean the Berkeley case proves there was no broader inequality. Bickel and colleagues explicitly pointed to a deeper social pattern: women were channelled towards fields that were more crowded and less favourable in several respects. The paradox changes the location of the question. It weakens the simple claim that admissions committees, taken as a single pooled process, directly preferred men; it raises a different question about why applicants were distributed across departments in the first place. [PubMed]
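The within-department comparison behind this argument can be summarised as a single odds ratio. The sketch below uses the Mantel-Haenszel common odds ratio, one standard stratified estimator. It is not necessarily the procedure Bickel and colleagues used. Counts are assumed to match R's `UCBAdmissions` table, and an odds ratio above 1 means higher odds of admission for men.

```python
# dept -> (men admitted, men rejected, women admitted, women rejected),
# assumed to match the commonly reproduced six-department Berkeley table.
DEPTS = {
    "A": (512, 313, 89, 19),
    "B": (353, 207, 17, 8),
    "C": (120, 205, 202, 391),
    "D": (138, 279, 131, 244),
    "E": (53, 138, 94, 299),
    "F": (22, 351, 24, 317),
}

def crude_odds_ratio(table):
    """Men-vs-women odds ratio after collapsing all departments together."""
    a = sum(t[0] for t in table.values())
    b = sum(t[1] for t in table.values())
    c = sum(t[2] for t in table.values())
    d = sum(t[3] for t in table.values())
    return (a * d) / (b * c)

def mantel_haenszel_odds_ratio(table):
    """Common men-vs-women odds ratio, pooled within departments."""
    numerator = denominator = 0.0
    for a, b, c, d in table.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

crude = crude_odds_ratio(DEPTS)
pooled = mantel_haenszel_odds_ratio(DEPTS)
print(f"Crude odds ratio, departments collapsed: {crude:.2f}")
print(f"Mantel-Haenszel, within departments:     {pooled:.2f}")
```

On these counts, the crude ratio is about 1.84 and the Mantel-Haenszel ratio is about 0.90. Collapsing departments suggests men were favoured, while comparing within departments suggests a slight tilt towards women. The estimator assumes a broadly similar odds ratio in every department, and Department A strains that assumption. Treat the single figure as a summary rather than a verdict.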

Which subgroups deserve attention

The lesson is not “always split the data”. Any dataset can be sliced into unhelpful categories. The useful question is whether a subgroup changes the fairness of the comparison. In Berkeley admissions, department mattered because it shaped both the chance of admission and the gender composition of applicants. In medicine, severity of illness often matters for the same reason: sicker patients may be more likely to receive an aggressive treatment and less likely to recover, making the treatment look worse in the aggregate.

A subgroup deserves attention when it meets three tests:

It affects the outcome. Department selectivity affects admission rates; illness severity affects survival; prior attainment affects exam results.
It is unevenly distributed across the compared groups. If both sides have the same mix, aggregation is less dangerous.
It belongs in the causal story. The subgroup should not merely be statistically convenient; it should represent a real pathway, selection process or baseline difference.
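The first two tests can be screened numerically; the third cannot. The sketch below applies them to department in the six-department Berkeley table, with counts assumed to match R's `UCBAdmissions`. Test 1 is measured as the gap between the highest and lowest departmental admission rate. Test 2 is measured as the total variation distance between men's and women's applicant mixes. Both measures and the 0.10 cut-off are illustrative choices, not established thresholds.

```python
# Screen a candidate subgroup (here: department) against tests 1 and 2.
# Test 3, whether the subgroup belongs in the causal story, cannot be computed.
# dept -> (men admitted, men applied, women admitted, women applied),
# assumed to match the commonly reproduced six-department Berkeley table.
DEPTS = {
    "A": (512, 825, 89, 108),
    "B": (353, 560, 17, 25),
    "C": (120, 325, 202, 593),
    "D": (138, 417, 131, 375),
    "E": (53, 191, 94, 393),
    "F": (22, 373, 24, 341),
}

def outcome_spread(table):
    """Test 1: highest minus lowest admission rate across subgroups."""
    rates = [(ma + wa) / (mp + wp) for ma, mp, wa, wp in table.values()]
    return max(rates) - min(rates)

def mix_imbalance(table):
    """Test 2: total variation distance between men's and women's
    applicant mixes (0 = identical mix, 1 = no overlap at all)."""
    men = sum(v[1] for v in table.values())
    women = sum(v[3] for v in table.values())
    return 0.5 * sum(abs(mp / men - wp / women) for _, mp, _, wp in table.values())

# Hypothetical cut-off for "large enough to matter"; choose one per context.
THRESHOLD = 0.10

spread = outcome_spread(DEPTS)
imbalance = mix_imbalance(DEPTS)
worth_stratifying = spread > THRESHOLD and imbalance > THRESHOLD
print(f"Test 1, spread of admission rates: {spread:.2f}")
print(f"Test 2, applicant-mix imbalance:   {imbalance:.2f}")
print("Passes tests 1 and 2:", worth_stratifying)
```

Department passes both screens comfortably here, with a spread of roughly 0.58 and a mix imbalance of roughly 0.44. Whether it belongs in the causal story still has to be argued from how admissions actually worked.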

This is why Simpson’s paradox connects directly to better causal thinking. The answer is not to distrust every average, but to ask what the average is averaging over. A headline statistic is a starting point. Before using it to argue that something caused something else, inspect the case mix.

The practical thinking habit

When a headline average seems to settle a causal argument, pause and ask: “Compared within what?” That question often reveals the hidden structure of the data.

For a school league table, compare pupils with similar starting points. For hospital outcomes, compare patients with similar risks. For workplace productivity, compare similar roles, teams and market conditions. For admissions, compare applicants within the decision units that actually assess them. The point is not to make the data say the opposite; it is to find the comparison that matches the real decision process.
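The “Compared within what?” habit can be turned into a small routine. It reports the headline gap alongside the gap inside each stratum, and flags the case where every within-stratum gap points the opposite way. The sketch below uses hypothetical pupil records for two schools split by prior-attainment band. The helper names are illustrative, and the code assumes every school has pupils in every band.

```python
from collections import defaultdict

# Hypothetical pupil records: (school, prior-attainment band, passed?).
records = (
    [("X", "low", True)] * 60 + [("X", "low", False)] * 140
    + [("X", "high", True)] * 45 + [("X", "high", False)] * 5
    + [("Y", "low", True)] * 10 + [("Y", "low", False)] * 40
    + [("Y", "high", True)] * 170 + [("Y", "high", False)] * 30
)

def pass_rates(records, key):
    """Pass rate for each value returned by key(record)."""
    passed, total = defaultdict(int), defaultdict(int)
    for rec in records:
        k = key(rec)
        total[k] += 1
        passed[k] += rec[2]  # True counts as 1, False as 0
    return {k: passed[k] / total[k] for k in total}

def compared_within(records, a, b):
    """Headline gap (a minus b) and the same gap inside each stratum.
    Assumes both groups have records in every stratum."""
    headline = pass_rates(records, key=lambda r: r[0])
    within = pass_rates(records, key=lambda r: (r[0], r[1]))
    strata = sorted({r[1] for r in records})
    gaps = {s: within[(a, s)] - within[(b, s)] for s in strata}
    return headline[a] - headline[b], gaps

headline_gap, stratum_gaps = compared_within(records, "X", "Y")
print(f"Headline gap, X minus Y: {headline_gap:+.2f}")
for stratum, gap in stratum_gaps.items():
    print(f"  within the {stratum} band: {gap:+.2f}")

# Reversal: every within-stratum gap has the opposite sign to the headline.
reversal = all(gap * headline_gap < 0 for gap in stratum_gaps.values())
if reversal:
    print("Headline and within-band comparisons disagree; inspect the case mix.")
```

With these invented numbers, school X trails by 30 points overall but leads within both bands, because it takes far more low-attaining pupils.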

Simpson’s paradox is powerful because it shows how a true number can support a false story. The aggregate may be accurate, but the causal claim may still be wrong. Better thinking begins by separating those two things.

Endnotes

Source: jstor.org
Title: The Interpretation of Interaction in Contingency Tables
Published: June 26, 1951
Link: https://www.jstor.org/stable/2984065

Source: Wikipedia
Title: Simpson’s paradox
Link: https://en.wikipedia.org/wiki/Simpson%27s_paradox

Source: jstor.org
Title: Interactions in Multi-Factor Contingency Tables
Link: https://www.jstor.org/stable/2983765

Source: jstor.org
Link: https://www.jstor.org/stable/24591584

Source: science.org
Title: Sex Bias in Graduate Admissions: Data from Berkeley
Published: 1975
Link: https://www.science.org/doi/10.1126/science.187.4175.398

Source: pubmed.ncbi.nlm.nih.gov
Title: Sex bias in graduate admissions: data from Berkeley
Published: 1975
Link: https://pubmed.ncbi.nlm.nih.gov/17835295/

Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8870532/

Source: pubmed.ncbi.nlm.nih.gov
Link: https://pubmed.ncbi.nlm.nih.gov/29484824/

Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC2266743/

Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9960320/

Source: philosophy.hku.hk
Link: https://philosophy.hku.hk/think/stat/simpson.php

Source: discovery.cs.illinois.edu
Link: https://discovery.cs.illinois.edu/dataset/berkeley/

Further Reading

The Art of Statistics

The Book of Why

How to Lie with Statistics

Naked Statistics
