
When the Headline Average Gets It Wrong

Overall numbers can point one way while meaningful subgroups tell a different causal story.


Introduction

Simpson’s paradox occurs when a headline statistic says one thing while the meaningful subgroups say something different. It matters because many causal claims are made from averages: a school is “better”, a treatment “works”, a company “improved”, or a policy “reduced harm”. With messy real-world outcomes, an average can be arithmetically correct and still analytically misleading.

The classic case is graduate admissions at the University of California, Berkeley, in 1973. Overall, men were admitted at a higher rate than women. But when researchers examined departments separately, the apparent pattern against women largely disappeared, and in the properly pooled departmental analysis the evidence pointed slightly the other way. The headline figure had mixed together departments with very different admission rates and very different applicant pools. [Science]

How aggregation reverses a pattern

Simpson’s paradox is not a mistake in arithmetic. It is a mistake in interpretation. A combined average is a weighted average, and the weights matter. If one group is over-represented in easier cases and another is over-represented in harder cases, the overall rate can reflect case mix more than treatment, skill, fairness or quality.
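The weighted-average point can be checked directly. Below is a minimal Python sketch with made-up numbers: both groups have exactly the same success rate within each subgroup (80% for easy cases, 20% for hard ones), so any gap in their overall rates comes purely from case mix. The names `overall_rate`, `subgroup_rates` and the two mixes are illustrative assumptions, not taken from any dataset.

```python
# Hypothetical numbers: two groups with identical success rates within
# each subgroup, but different case mixes.
subgroup_rates = {"easy": 0.80, "hard": 0.20}

group_x_mix = {"easy": 0.75, "hard": 0.25}  # mostly easy cases
group_y_mix = {"easy": 0.25, "hard": 0.75}  # mostly hard cases


def overall_rate(mix, rates):
    """Overall rate as a weighted average of subgroup rates.

    mix   -- share of the group's cases in each subgroup (shares sum to 1)
    rates -- success rate within each subgroup
    """
    return sum(mix[s] * rates[s] for s in rates)


print(f"Group X overall: {overall_rate(group_x_mix, subgroup_rates):.0%}")  # 65%
print(f"Group Y overall: {overall_rate(group_y_mix, subgroup_rates):.0%}")  # 35%
```

A 30-point gap appears even though neither group performs better in any subgroup.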

This is why the paradox is so dangerous in headline statistics. The aggregate number feels like the broadest and fairest view, but it may actually be blending unlike situations. Edward Simpson’s 1951 paper on contingency tables made the underlying point sharply: the relationship between two classifications may be impossible to summarise properly without reference to a third. [JSTOR]

A simple structure usually sits behind the reversal:

  • A visible comparison: for example, men versus women, Treatment A versus Treatment B, or one school versus another.
  • An outcome: admission, recovery, sales, test scores or survival.
  • A hidden grouping variable: department, illness severity, customer type, year group, region or product mix.
  • Uneven distribution across groups: one side is concentrated in easier or harder subgroups.
  • Different baseline rates: the subgroups have very different chances of the outcome before the visible comparison is even considered.
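Those ingredients are enough to produce a full reversal. The sketch below uses invented counts (not real trial data) in which Treatment A has the higher recovery rate among both mild and severe cases, yet the lower rate overall, because A was given mostly to severe cases. `simpson_reversal` is a hypothetical helper written only for this illustration.

```python
from fractions import Fraction

# Invented counts for illustration: (recovered, patients) by illness severity.
# Treatment A was mostly given to severe cases, Treatment B to mild ones.
data = {
    "A": {"mild": (45, 50), "severe": (150, 300)},
    "B": {"mild": (240, 300), "severe": (20, 50)},
}


def rate(recovered, patients):
    return Fraction(recovered, patients)


def pooled_rate(strata):
    """Recovery rate after adding up all strata (the headline figure)."""
    recovered = sum(r for r, _ in strata.values())
    patients = sum(n for _, n in strata.values())
    return rate(recovered, patients)


def simpson_reversal(data, first, second):
    """True if every stratum favours one option but the pooled data favour the other.

    With ties, "not any(within)" means `first` is never better in a stratum.
    """
    within = [rate(*data[first][k]) > rate(*data[second][k]) for k in data[first]]
    overall = pooled_rate(data[first]) > pooled_rate(data[second])
    return (all(within) and not overall) or (not any(within) and overall)


for severity in data["A"]:
    a, b = rate(*data["A"][severity]), rate(*data["B"][severity])
    print(f"{severity}: A {float(a):.0%}, B {float(b):.0%}")
print(f"overall: A {float(pooled_rate(data['A'])):.0%}, B {float(pooled_rate(data['B'])):.0%}")
print(simpson_reversal(data, "A", "B"))  # True
```

A wins in both strata (90% vs 80%, 50% vs 40%) but loses overall (56% vs 74%), because each treatment's overall rate is weighted by a very different severity mix.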

The thinking error is to treat the headline average as if it answers a causal question. It usually answers only an accounting question: “What happened overall?” The causal question is narrower and harder: “What would have happened if comparable cases had been exposed to different choices?”
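One standard way to move from the accounting question toward the causal one is direct standardisation: apply each group's within-subgroup rates to the same reference case mix. The sketch below uses invented pass counts for two hypothetical schools, split by pupils' prior attainment, and takes the two schools' combined intake as the reference mix. The adjusted figures answer the causal question only under the assumption that prior attainment captures the relevant differences between intakes, which the data alone cannot confirm.

```python
from fractions import Fraction

# Invented counts for illustration: (passed, pupils) by prior attainment.
schools = {
    "X": {"low": (20, 50), "high": (225, 250)},   # mostly high-prior intake
    "Y": {"low": (125, 250), "high": (48, 50)},   # mostly low-prior intake
}


def crude_rate(strata):
    """Headline pass rate: total passes over total pupils."""
    passed = sum(p for p, _ in strata.values())
    pupils = sum(n for _, n in strata.values())
    return Fraction(passed, pupils)


def reference_mix(groups):
    """Share of all pupils (both schools combined) in each stratum."""
    totals = {}
    for strata in groups.values():
        for k, (_, n) in strata.items():
            totals[k] = totals.get(k, 0) + n
    grand_total = sum(totals.values())
    return {k: Fraction(n, grand_total) for k, n in totals.items()}


def standardised_rate(strata, mix):
    """Apply one school's within-stratum pass rates to a common intake."""
    return sum(mix[k] * Fraction(p, n) for k, (p, n) in strata.items())


mix = reference_mix(schools)
for name, strata in schools.items():
    crude = float(crude_rate(strata))
    adjusted = float(standardised_rate(strata, mix))
    print(f"School {name}: headline {crude:.0%}, standardised {adjusted:.0%}")
```

School X leads on the headline figure (82% vs 58%), yet School Y does better with both kinds of pupil, so it comes out ahead (73% vs 65%) once both schools are scored against the same intake.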

Berkeley admissions and case-mix lessons

The Berkeley admissions case is memorable because the first number looks so persuasive. In fall 1973, aggregate graduate admissions data showed a higher admission rate for male applicants than for female applicants: roughly 44% for men and 35% for women in the full data reported by Bickel, Hammel and O’Connell. On its face, that seemed to support a claim of bias against women. [Science]

The problem was that graduate admissions were not one uniform queue. Applicants applied to departments, and departments differed greatly in selectivity. Women were more likely to apply to departments with lower admission rates, while men were more concentrated in departments with higher admission rates. Once the data were examined by department, the headline gap no longer supported the same causal story. [PubMed]

The six largest departments show the mechanism clearly. In the commonly reproduced Berkeley dataset, Department A admitted about 62% of male applicants and 82% of female applicants; Department B admitted about 63% of men and 68% of women; Department F admitted only about 6% of men and 7% of women. The crucial fact is not only the within-department rates but the applicant mix: just over half of the men applied to the high-admission departments A and B, compared with about 7% of the women, while the great majority of women applied to the lower-admission departments C, D, E and F. [Wikipedia]
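The department figures can be recomputed from counts. The table below is the six-department version usually reproduced from Bickel, Hammel and O’Connell (the same data R ships as its `UCBAdmissions` dataset). Because it covers only six departments, its pooled rates (about 45% for men and 30% for women) differ from the full-data 44% and 35%. The helper `pooled` is an illustrative name, not part of any library.

```python
# Six-department counts as (admitted, applicants), as commonly reproduced
# from Bickel, Hammel and O'Connell (1975); R's UCBAdmissions holds the same table.
berkeley = {
    "A": {"men": (512, 825), "women": (89, 108)},
    "B": {"men": (353, 560), "women": (17, 25)},
    "C": {"men": (120, 325), "women": (202, 593)},
    "D": {"men": (138, 417), "women": (131, 375)},
    "E": {"men": (53, 191), "women": (94, 393)},
    "F": {"men": (22, 373), "women": (24, 341)},
}


def pooled(sex):
    """Admission rate for one sex after adding up all six departments."""
    admitted = sum(d[sex][0] for d in berkeley.values())
    applied = sum(d[sex][1] for d in berkeley.values())
    return admitted / applied


# Within departments, women have the higher rate in four of the six.
for dept, d in berkeley.items():
    men = d["men"][0] / d["men"][1]
    women = d["women"][0] / d["women"][1]
    print(f"{dept}: men {men:.0%}, women {women:.0%}")

# Pooled over departments, men appear strongly favoured.
print(f"All six: men {pooled('men'):.0%}, women {pooled('women'):.0%}")

# Applicant mix: share of each sex applying to the high-admission departments A and B.
for sex in ("men", "women"):
    in_ab = berkeley["A"][sex][1] + berkeley["B"][sex][1]
    total = sum(d[sex][1] for d in berkeley.values())
    print(f"{sex} applying to A or B: {in_ab / total:.0%}")
```

The last two lines of output (51% of men versus 7% of women applying to A or B) are the case-mix fact that drives the aggregate gap.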

That does not mean the Berkeley case proves there was no broader inequality. Bickel and colleagues explicitly pointed to a deeper social pattern: women were channelled towards fields that were more crowded and less favourable in several respects. The paradox changes the location of the question. It weakens the simple claim that admissions committees, taken as a single pooled process, directly preferred men; it raises a different question about why applicants were distributed across departments in the first place. [PubMed]
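The contrast between a single pooled process and department-by-department decisions can be put in numbers. The sketch below compares the crude odds ratio of admission (men versus women, all six departments lumped together) with a Mantel-Haenszel odds ratio, which pools the within-department comparisons. The Mantel-Haenszel estimator is a standard technique swapped in here for illustration; it is not the procedure Bickel and colleagues used, and six departments are not the full dataset. A ratio above 1 favours men; below 1 favours women.

```python
# Six-department counts as (admitted, rejected), the same table usually
# reproduced from Bickel, Hammel and O'Connell (R's UCBAdmissions).
depts = {
    "A": {"men": (512, 313), "women": (89, 19)},
    "B": {"men": (353, 207), "women": (17, 8)},
    "C": {"men": (120, 205), "women": (202, 391)},
    "D": {"men": (138, 279), "women": (131, 244)},
    "E": {"men": (53, 138), "women": (94, 299)},
    "F": {"men": (22, 351), "women": (24, 317)},
}


def crude_odds_ratio(tables):
    """Odds ratio of admission (men vs women) after lumping departments together."""
    a = sum(t["men"][0] for t in tables.values())
    b = sum(t["men"][1] for t in tables.values())
    c = sum(t["women"][0] for t in tables.values())
    d = sum(t["women"][1] for t in tables.values())
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(tables):
    """Common odds ratio pooled across departments.

    Per department: men admitted a, rejected b; women admitted c, rejected d;
    n = a + b + c + d. Estimate = sum(a*d/n) / sum(b*c/n).
    """
    num = den = 0.0
    for t in tables.values():
        a, b = t["men"]
        c, d = t["women"]
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


print(f"crude: {crude_odds_ratio(depts):.2f}")  # 1.84
print(f"Mantel-Haenszel: {mantel_haenszel_odds_ratio(depts):.2f}")  # 0.90
```

Lumped together, men's odds of admission look nearly twice women's; pooled within departments, the ratio falls slightly below 1.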


Which subgroups deserve attention

The lesson is not “always split the data”. Any dataset can be sliced into unhelpful categories. The useful question is whether a subgroup changes the fairness of the comparison. In Berkeley admissions, department mattered because it shaped both the chance of admission and the gender composition of applicants. In medicine, severity of illness often matters for the same reason: sicker patients may be more likely to receive an aggressive treatment and less likely to recover, making the treatment look worse in the aggregate.

A subgroup deserves attention when it meets three tests:

  • It affects the outcome. Department selectivity affects admission rates; illness severity affects survival; prior attainment affects exam results.
  • It is unevenly distributed across the compared groups. If both sides have the same mix, aggregation is less dangerous.
  • It belongs in the causal story. The subgroup should not merely be statistically convenient; it should represent a real pathway, selection process or baseline difference.
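The first two tests can be screened from counts; the third cannot. A minimal sketch, using invented counts for the medical example above (an aggressive treatment given mostly to severe cases): `outcome_spread` measures how much recovery varies across severity levels, and `mix_imbalance` measures how differently the two treatment groups are spread across them. Both function names are hypothetical and no threshold is implied; large values flag a subgroup worth examining, while whether it belongs in the causal story still needs subject-matter judgement.

```python
from fractions import Fraction

# Invented counts: (recovered, patients) for each treatment and severity level.
counts = {
    "aggressive": {"mild": (18, 20), "severe": (90, 180)},
    "standard": {"mild": (160, 200), "severe": (8, 20)},
}


def outcome_spread(counts):
    """Test 1: range of outcome rates across subgroups, pooling treatments."""
    strata = next(iter(counts.values())).keys()
    rates = []
    for k in strata:
        recovered = sum(counts[g][k][0] for g in counts)
        patients = sum(counts[g][k][1] for g in counts)
        rates.append(Fraction(recovered, patients))
    return max(rates) - min(rates)


def mix_imbalance(counts):
    """Test 2: largest gap between compared groups in the share of
    their cases falling in any one subgroup."""
    shares = {}
    for g, strata in counts.items():
        total = sum(n for _, n in strata.values())
        shares[g] = {k: Fraction(n, total) for k, (_, n) in strata.items()}
    groups = list(shares)
    return max(
        abs(shares[g1][k] - shares[g2][k])
        for i, g1 in enumerate(groups)
        for g2 in groups[i + 1:]
        for k in shares[g1]
    )


# Test 3 (does the subgroup belong in the causal story?) is not computable here.
print(f"outcome spread: {float(outcome_spread(counts)):.2f}")  # 0.32
print(f"mix imbalance: {float(mix_imbalance(counts)):.2f}")  # 0.81
```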

This is why Simpson’s paradox connects directly to better causal thinking. The answer is not to distrust every average, but to ask what the average is averaging over. A headline statistic is a starting point. Before using it to argue that something caused something else, inspect the case mix.

The practical thinking habit

When a headline average seems to settle a causal argument, pause and ask: “Compared within what?” That question often reveals the hidden structure of the data.

For a school league table, compare pupils with similar starting points. For hospital outcomes, compare patients with similar risks. For workplace productivity, compare similar roles, teams and market conditions. For admissions, compare applicants within the decision units that actually assess them. The point is not to make the data say the opposite; it is to find the comparison that matches the real decision process.

Simpson’s paradox is powerful because it shows how a true number can support a false story. The aggregate may be accurate, but the causal claim may still be wrong. Better thinking begins by separating those two things.



Further Reading

Books and field guides related to When the Headline Average Gets It Wrong. Use these as the next step if you want deeper reading beyond the article.


The Book of Why

By Judea Pearl, Dana Mackenzie

Explains why averages and associations can mislead, providing the conceptual foundation behind Simpson's paradox.


Endnotes

  1. Source: jstor.org
    Title: The Interpretation of Interaction in Contingency Tables
    Link: https://www.jstor.org/stable/2984065
    Published: June 26, 1951

  2. Source: Wikipedia
    Title: Simpson’s paradox
    Link: https://en.wikipedia.org/wiki/Simpson%27s_paradox

  3. Source: jstor.org
    Title: Interactions in Multi-Factor Contingency Tables
    Link: https://www.jstor.org/stable/2983765

  4. Source: jstor.org
    Link: https://www.jstor.org/stable/24591584

  5. Source: science.org
    Title: Sex Bias in Graduate Admissions: Data from Berkeley
    Link: https://www.science.org/doi/10.1126/science.187.4175.398
    Published: 1975

  6. Source: pubmed.ncbi.nlm.nih.gov
    Title: Sex bias in graduate admissions: data from Berkeley
    Link: https://pubmed.ncbi.nlm.nih.gov/17835295/
    Published: 1975

  7. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8870532/

  8. Source: pubmed.ncbi.nlm.nih.gov
    Link: https://pubmed.ncbi.nlm.nih.gov/29484824/

  9. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC2266743/

  10. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9960320/

  11. Source: philosophy.hku.hk
    Link: https://philosophy.hku.hk/think/stat/simpson.php

  12. Source: discovery.cs.illinois.edu
    Link: https://discovery.cs.illinois.edu/dataset/berkeley/
