Why the Chart Can Beat the Average

Introduction

Anscombe’s quartet is a small but powerful warning about correlation in messy outcomes: the same averages, variances, correlation coefficient and regression line can describe datasets with very different shapes. Francis Anscombe introduced the quartet in 1973 to argue that graphs are not decorative extras; they are part of serious statistical analysis. His four datasets each contain eleven x-y pairs and look almost identical in summary form, yet their scatter plots reveal four different stories: a roughly linear relationship, a curved relationship, a line distorted by an outlier, and an apparent correlation created by a single high-leverage point. [JSTOR]

The lesson for causal thinking is direct. A correlation may invite a causal question, but the chart shows whether the evidence is linear, curved, fragile, clustered or dominated by one observation. Before interpreting a relationship as meaningful, it is worth asking: what shape produced this number?

Four Datasets, One Set of Summaries

Anscombe’s quartet is deliberately unsettling because the numerical summaries agree so closely. The R statistical software documentation describes it as four x-y datasets with the same traditional statistical properties, including mean, variance, correlation and regression line, while being “quite different” when examined as data. [Seminar for Statistics]

The shared summaries include:

Mean of x: 9.
Mean of y: about 7.5.
Variance of x: 11.
Variance of y: about 4.12 to 4.13.
Correlation: about 0.816.
Regression line: approximately y = 3 + 0.5x.

If those figures were presented in a report, the tidy interpretation would be tempting: x and y have a moderately strong positive linear relationship. But the four plots reject that one-size-fits-all story.

In the first dataset, the summary is broadly honest. The points form a loose upward-sloping cloud, so a straight-line fit is a reasonable simplification. In the second, the points follow a curve, so the same linear summary hides a non-linear relationship. In the third, most points sit close to a line, but one unusual point pulls the regression line away from the pattern. In the fourth, nearly all x-values are the same, and one distant point creates the apparent relationship. [eagereyes]
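The shared summaries listed above can be reproduced in a few lines. The sketch below is illustrative only: it uses the published quartet values and plain Python, and the names `QUARTET` and `summarize` are assumptions introduced here, not part of any library.

```python
from statistics import mean, variance

# Anscombe's quartet. Datasets I to III share the same x-values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation and least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),  # sample variance (n - 1 denominator)
        "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The four printed rows agree closely, which is exactly why the numbers alone cannot distinguish the four shapes.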

Where the Average Misleads

The quartet matters because the last three datasets each fail in a different way. The point is not merely that graphs are nice; each visual shape points to a different analytical mistake.

Averages flatten the data. Correlation compresses a relationship into one number. Regression draws the best-fitting straight line whether or not a straight line is a sensible description. Anscombe’s examples show how those summaries can be technically correct and still misleading.

The second dataset is a warning about curves. A strong relationship may exist, but not in the form assumed by a linear model. In real analysis, that could mean a treatment helps up to a threshold and then levels off, a price cut works only after a certain point, or performance improves quickly at first and then slows. A simple correlation can miss the shape that matters.
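One way to see the curve numerically is to compare a straight-line fit with a quadratic fit on the second dataset. This is a minimal sketch in plain Python; the helper names (`det3`, `fit_line`, `fit_quadratic`, `r_squared`) are hypothetical, and the quadratic is solved with Cramer's rule on the normal equations purely to avoid external dependencies.

```python
# Anscombe's second dataset.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_line(xs, ys):
    """Least-squares line, returned as (c0, c1, c2) with c2 = 0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return (my - slope * mx, slope, 0.0)


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(m)
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for row in range(3):
            replaced[row][col] = t[row]  # Cramer's rule: swap in the right-hand side
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


def r_squared(xs, ys, coeffs):
    """Share of the variance in y explained by the fitted polynomial."""
    c0, c1, c2 = coeffs
    my = sum(ys) / len(ys)
    ss_res = sum((y - (c0 + c1 * x + c2 * x * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


line = fit_line(X, Y)
quad = fit_quadratic(X, Y)
r2_line = r_squared(X, Y, line)
r2_quad = r_squared(X, Y, quad)
print(f"linear R^2 = {r2_line:.3f}, quadratic R^2 = {r2_quad:.5f}")
```

The linear fit explains roughly two thirds of the variance, while the quadratic explains almost all of it. A plot shows the same thing faster; the calculation is only a confirmation.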

The third dataset is a warning about outliers. One observation can distort the estimated relationship. That does not automatically mean the point should be deleted: it may be a measurement error, a special case, or the most important observation in the dataset. The visual check tells the analyst that the causal story depends on understanding that point, not merely reporting the fitted line.
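A simple sensitivity check for the third dataset is to fit the line twice, with and without the point that has the largest residual. The sketch below assumes plain Python and a hypothetical helper `fit_line`. Refitting is a diagnostic here, not a recommendation to delete the point.

```python
# Anscombe's third dataset as (x, y) pairs.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
points = list(zip(X, Y))


def fit_line(pts):
    """Return (intercept, slope) of the least-squares line through pts."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    slope = sxy / sxx
    return my - slope * mx, slope


intercept_all, slope_all = fit_line(points)

# Flag the observation with the largest absolute residual from the full fit.
outlier = max(points, key=lambda p: abs(p[1] - (intercept_all + slope_all * p[0])))

# Refit on the remaining points to see how much the line depends on it.
rest = [p for p in points if p != outlier]
intercept_rest, slope_rest = fit_line(rest)

print(f"all points:      y = {intercept_all:.2f} + {slope_all:.3f}x")
print(f"without {outlier}: y = {intercept_rest:.2f} + {slope_rest:.3f}x")
```

The slope drops from about 0.5 to about 0.35 once the flagged point is set aside, so the reported line is a compromise between ten nearly collinear points and one unusual one.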

The fourth dataset is a warning about leverage points. A leverage point is unusual in its x-value, so it can strongly affect the slope of a regression line. In Anscombe’s fourth dataset, the appearance of a relationship is driven by one distant x-value; the other ten points share a single x-value and so say nothing about how y changes with x. That is especially dangerous in causal interpretation, because one extreme case can make a broad claim look numerically supported.
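Leverage can be quantified. For simple linear regression the leverage of observation i is h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The sketch below applies that formula to the x-values of the fourth dataset; the function name `leverages` is an assumption, and the 2p/n cut-off is only a common rule of thumb (p = 2 fitted coefficients), not a formal test.

```python
# x-values of Anscombe's fourth dataset.
X = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]


def leverages(xs):
    """Leverage (hat value) of each observation in a simple linear regression."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return [1 / n + (x - mx) ** 2 / sxx for x in xs]


h = leverages(X)
threshold = 2 * 2 / len(X)  # rule-of-thumb cut-off 2p/n with p = 2

for x, hi in zip(X, h):
    flag = "  <-- high leverage" if hi > threshold else ""
    print(f"x = {x:>2}  leverage = {hi:.3f}{flag}")

# Spread of x once the distant point is set aside: zero, so no slope can be estimated.
others = [x for x in X if x != 19]
mean_others = sum(others) / len(others)
sxx_others = sum((x - mean_others) ** 2 for x in others)
```

The distant point has leverage of essentially 1, the maximum possible, which means the fitted line is forced through it. With that point set aside, the remaining x-values have no spread at all.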

Visual Checks Before Causal Interpretation

A scatter plot should not be treated as proof of causation. It does something more basic: it tells you what kind of evidence you are looking at. The United States National Institute of Standards and Technology describes exploratory data analysis as a set of techniques, many graphical, for gaining insight into data, testing assumptions and uncovering structure. [NIST] A medical research methods chapter hosted by the US National Library of Medicine similarly describes exploratory analysis as a way to examine distributions, outliers and anomalies before formal testing. [NCBI]

For cause-and-correlation reasoning, the useful habit is to inspect the relationship before explaining it. A visual check should ask:

Does the relationship look roughly linear, or is it curved?
Are there clusters that suggest different subgroups are being mixed?
Is the pattern driven by one or two unusual observations?
Are there gaps, ceiling effects or repeated values that make the summary statistic fragile?
Would the apparent relationship survive if the most influential point were examined separately?

These questions do not replace causal design, fair comparison or domain knowledge. They protect those steps from starting with a false picture of the evidence.
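The last question in the checklist can be turned into a rough screening step: recompute the correlation with each observation left out in turn and look at how far it moves. This is a sketch under simple assumptions (plain Python, hypothetical helpers `correlation` and `leave_one_out`), and it complements a plot rather than replacing it.

```python
import math

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def correlation(xs, ys):
    """Pearson correlation, or None when x or y has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(xs, ys):
    """Correlation recomputed with each observation removed in turn."""
    return [
        correlation(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(len(xs))
    ]


loo = {name: leave_one_out(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, values in loo.items():
    defined = [v for v in values if v is not None]
    undefined = len(values) - len(defined)
    print(f"{name}: r ranges from {min(defined):.3f} to {max(defined):.3f}, "
          f"undefined in {undefined} case(s)")
```

In the third dataset, dropping one point pushes the correlation to nearly 1; in the fourth, dropping one point leaves no correlation to compute. Either result is a prompt to examine that observation before explaining the relationship.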

Why the Lesson Still Holds

Anscombe’s quartet has remained influential because later examples have extended the same warning. The “Datasaurus Dozen”, created by Justin Matejka and George Fitzmaurice, showed that many dramatically different shapes can share the same means, standard deviations and correlations to two decimal places. Their method moved points gradually while preserving selected summary statistics, producing datasets that looked like very different images while retaining the same numerical summaries. [Autodesk Research]
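The core constraint behind that approach can be illustrated with a much simpler sketch. The code below is not the authors' implementation: it leaves out the target shape and the simulated-annealing acceptance schedule, and only shows the idea of nudging one point at a time and keeping the move when the summaries are unchanged to two decimal places. All names (`rounded_stats`, `drift`) and parameter values are assumptions chosen for illustration.

```python
import math
import random

# Starting cloud: Anscombe's first dataset as (x, y) pairs.
START = [
    (10.0, 8.04), (8.0, 6.95), (13.0, 7.58), (9.0, 8.81), (11.0, 8.33), (14.0, 9.96),
    (6.0, 7.24), (4.0, 4.26), (12.0, 10.84), (7.0, 4.82), (5.0, 5.68),
]


def rounded_stats(points, digits=2):
    """Means, standard deviations and correlation, rounded to `digits` places."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    values = (
        mx,
        my,
        math.sqrt(sxx / (n - 1)),
        math.sqrt(syy / (n - 1)),
        sxy / math.sqrt(sxx * syy),
    )
    return tuple(round(v, digits) for v in values)


def drift(points, steps=2000, scale=0.05, seed=0):
    """Randomly nudge points, keeping only moves that preserve the rounded stats."""
    rng = random.Random(seed)
    target = rounded_stats(points)
    current = list(points)
    accepted = 0
    for _ in range(steps):
        i = rng.randrange(len(current))
        x, y = current[i]
        candidate = list(current)
        candidate[i] = (x + rng.gauss(0, scale), y + rng.gauss(0, scale))
        if rounded_stats(candidate) == target:
            current = candidate
            accepted += 1
    return current, accepted


moved, accepted = drift(START)
print("accepted moves:", accepted)
print("before:", rounded_stats(START))
print("after: ", rounded_stats(moved))
```

Even this stripped-down version shows that the rounded summaries leave room for the points to wander. The published method adds a rule that steers the wandering toward a chosen picture.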

That modern extension strengthens Anscombe’s original point. Summary statistics are useful compression, not full evidence. They can tell you where to look, but not always what you are seeing. In messy real-world outcomes, the shape of the data often decides whether a correlation is a clue, a modelling artefact, an outlier problem or a genuinely plausible relationship worth deeper causal investigation.

Endnotes

jstor.org: Graphs in Statistical Analysis. https://www.jstor.org/stable/2682899
eagereyes.org: Anscombe’s Quartet. https://eagereyes.org/criticism/anscombes-quartet
itl.nist.gov: 1. Exploratory Data Analysis. https://www.itl.nist.gov/div898/handbook/eda/eda_d.htm
ncbi.nlm.nih.gov: Exploratory Data Analysis (Secondary Analysis of Electronic Health Records, NCBI Bookshelf). https://www.ncbi.nlm.nih.gov/books/NBK543641/
research.autodesk.com: Same Stats, Different Graphs. https://www.research.autodesk.com/publications/same-stats-different-graphs/
itl.nist.gov: E Handbook. https://www.itl.nist.gov/div898/handbook/toolaids/pff/E-Handbook.pdf
itl.nist.gov: https://www.itl.nist.gov/div898/handbook/dtoc.htm
research.autodesk.com: Same Stats, Different Graphs (PDF). https://www.research.autodesk.com/app/uploads/2023/03/same-stats-different-graphs.pdf_rec2hRjLLGgM7Cn2T.pdf
youtube.com: Anscombe’s Quartet. https://www.youtube.com/watch?v=MzNYJ_K3KC8
youtube.com: Correlation and Regression. https://www.youtube.com/watch?v=d0aaCxS9Avs
stat.ethz.ch: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/anscombe.html
data.europa.eu: https://data.europa.eu/apps/data-visualisation-guide/correlations
r-causal.github.io: datasaurus dozen. https://r-causal.github.io/quartets/reference/datasaurus_dozen.html
grodri.github.io: https://grodri.github.io/glms/stata/anscombe
devopedia.org: exploratory data analysis. https://devopedia.org/exploratory-data-analysis
