What Brier Scores Can And Cannot Tell You

Introduction

A Brier score is one of the simplest ways to turn confidence into measurable feedback. Instead of asking only whether a prediction was right or wrong, it asks whether the probability you assigned matched what eventually happened. If you repeatedly say an event has a 70% chance and those events occur about 70% of the time, your forecasts will generally earn better Brier scores than those of someone who confidently assigns 95% to many uncertain events.

For anyone trying to improve their thinking and analytical skills, the value of a Brier score is not that it produces a single number to compete over. Its value is that it exposes whether your confidence is consistently too high, too low, or appropriately matched to reality. At the same time, the score has important limits. Lower scores are not always evidence of better judgement if different people answered questions of very different difficulty, and a single score reveals little about why a forecaster performed well or badly. [scores.readthedocs.io; Royal Meteorological Society]

How a Brier score grades a probability

For a binary prediction—such as “this project will finish this month”—the Brier score measures the squared difference between the probability you assigned and the outcome that occurred.

If the event happens, the error is:

(1 − predicted probability)²

If the event does not happen, the error is:

(0 − predicted probability)²

The scores are then averaged across many predictions. Lower scores are better, with 0 representing perfect forecasting. Because errors are squared, confident mistakes are punished much more heavily than cautious ones. Predicting a 99% chance of success when something fails costs about 0.98, far worse than the 0.36 incurred by predicting a 60% chance. [scores.readthedocs.io]

A few simple examples illustrate the idea:

Forecast   Outcome                Brier score
90%        Event occurs           0.01
90%        Event does not occur   0.81
60%        Event occurs           0.16
50%        Event occurs           0.25
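As a minimal sketch, the worked examples above can be reproduced in a few lines of Python. The function name brier_score and the list layout are illustrative choices, not part of any particular library.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# The four worked examples as (probability, outcome), with 1 = event occurred.
examples = [(0.9, 1), (0.9, 0), (0.6, 1), (0.5, 1)]

# Score each forecast on its own, then average all four together.
single_scores = [round(brier_score([p], [o]), 2) for p, o in examples]
average = brier_score([p for p, _ in examples], [o for _, o in examples])

print(single_scores)      # [0.01, 0.81, 0.16, 0.25]
print(round(average, 4))  # 0.3075
```

Note how the single confident miss (0.81) dominates the average of the four forecasts.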

These examples highlight the central lesson for confidence calibration: certainty should be earned. The scoring rule rewards accurate confidence rather than boldness for its own sake.

An important theoretical property is that the Brier score is a strictly proper scoring rule. This means the forecaster’s best long-run strategy is to report their genuine probability estimate rather than exaggerating confidence or hedging towards 50%. That property makes it especially suitable as a learning tool rather than merely a performance metric. [arXiv]
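The propriety claim can be checked numerically. The sketch below assumes an event whose true probability is 0.7 (a made-up value) and searches a grid of possible reports for the one with the lowest expected score. The helper expected_brier is illustrative.

```python
def expected_brier(report, true_prob):
    """Expected score for a stated probability when the event truly
    occurs with probability true_prob."""
    return true_prob * (1 - report) ** 2 + (1 - true_prob) * report ** 2


true_prob = 0.7  # assumed for illustration
candidates = [i / 100 for i in range(101)]  # reports from 0% to 100%
best_report = min(candidates, key=lambda q: expected_brier(q, true_prob))

print(best_report)  # 0.7
for report in (0.5, 0.7, 0.95):
    print(report, round(expected_brier(report, true_prob), 4))
```

Both hedging to 50% and exaggerating to 95% give a worse expected score than reporting 70%, which is what "strictly proper" means in practice.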

Why lower scores are not the whole story

A common mistake is treating a lower Brier score as proof that one person is a better thinker than another. The comparison is only meaningful when they forecast roughly comparable questions.

Question difficulty matters

Forecasts on easy questions naturally receive lower Brier scores than forecasts on genuinely uncertain ones. Someone predicting obvious sporting favourites or well-established trends may outperform someone making careful predictions about geopolitical events, even if the second forecaster is better calibrated.

Researchers studying forecasting tournaments have therefore explored methods that adjust for question difficulty or compare forecasters only on shared questions. Without such adjustments, raw Brier scores can mix together forecasting skill and task difficulty. [ForecastBench]

For personal feedback, this means you should compare:

your own scores over time on similar kinds of questions;
scores across forecasters answering the same questions; or
scores after accounting for question difficulty where possible.
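One simple way to act on the second option is to score two forecasters only on the questions both answered. The sketch below uses invented question IDs, forecasts and outcomes. It is not a difficulty adjustment, only a like-for-like comparison.

```python
def mean_brier(forecasts, outcomes, questions):
    """Mean Brier score over the given question IDs."""
    return sum((forecasts[q] - outcomes[q]) ** 2 for q in questions) / len(questions)


def compare_on_shared(forecasts_a, forecasts_b, outcomes):
    """Score two forecasters only on resolved questions both answered."""
    shared = sorted(set(forecasts_a) & set(forecasts_b) & set(outcomes))
    if not shared:
        raise ValueError("no shared resolved questions")
    return shared, mean_brier(forecasts_a, outcomes, shared), mean_brier(forecasts_b, outcomes, shared)


# Hypothetical data: q1 is an easy question only Alice answered,
# q4 is a hard question only Bob answered.
outcomes = {"q1": 1, "q2": 1, "q3": 0, "q4": 0}
alice = {"q1": 0.9, "q2": 0.8, "q3": 0.6}
bob = {"q2": 0.9, "q3": 0.3, "q4": 0.6}

raw_alice = mean_brier(alice, outcomes, sorted(alice))
raw_bob = mean_brier(bob, outcomes, sorted(bob))
shared, shared_alice, shared_bob = compare_on_shared(alice, bob, outcomes)

print(round(raw_alice, 3), round(raw_bob, 3))
print(shared, round(shared_alice, 3), round(shared_bob, 3))
```

In this made-up case Alice has the better raw average, yet Bob scores better on the two questions they both answered, which is the confound the raw comparison hides.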

The score mixes several qualities together

Another limitation is that the Brier score combines multiple aspects of forecasting into one number.

Murphy’s classic decomposition separates the overall score into three components, with the score equal to reliability minus resolution plus uncertainty: [insightful-data-lab.com]

Reliability (calibration): whether predicted probabilities match observed frequencies.
Resolution: whether the forecaster successfully distinguishes situations that genuinely have different chances of occurring.
Uncertainty: the inherent unpredictability of the events themselves. [Royal Meteorological Society]

This decomposition explains why two forecasters can have identical overall scores for different reasons. One might be well calibrated but reluctant to move away from 50%, while another makes informative forecasts but is slightly overconfident. Looking only at the average score hides those differences.
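A small sketch of the decomposition, under the assumption that forecasts are grouped by their exact stated value (in which case reliability minus resolution plus uncertainty reproduces the Brier score exactly). The data are invented: a forecaster who says 90% or 10% when the matching frequencies are really 80% and 20%.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty), grouping forecasts
    by their distinct stated probability."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)
    # Reliability: gap between stated probability and observed frequency.
    reliability = sum(
        len(obs) * (p - sum(obs) / len(obs)) ** 2 for p, obs in groups.items()
    ) / n
    # Resolution: how far each group's frequency sits from the base rate.
    resolution = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


forecasts = [0.9] * 5 + [0.1] * 5
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

rel, res, unc = murphy_decomposition(forecasts, outcomes)
brier = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

print(round(rel, 4), round(res, 4), round(unc, 4))
print(round(rel - res + unc, 4), round(brier, 4))
```

Here the small reliability term reflects mild overconfidence, while the larger resolution term shows that the forecasts still separate likely from unlikely events.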

Even perfect models do not always score zero

Another misconception is that a perfectly informed forecasting process should eventually achieve a Brier score of zero. In reality, many events remain genuinely probabilistic. Even if your stated probabilities exactly match the true underlying risks, random outcomes still produce non-zero average scores.

Recent methodological work emphasises that the Brier score reflects not only forecasting quality but also the underlying distribution of risks and unavoidable randomness in the events being predicted. [PMC]
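The floor on the score can be made concrete. If an event has true probability p and the forecast is exactly p, the expected penalty is p(1 − p)² + (1 − p)p², which simplifies to p(1 − p). The sketch below applies that formula to some made-up sets of true probabilities.

```python
def expected_brier_when_perfect(true_probs):
    """Expected mean Brier score when every forecast equals the true
    probability of its event: the average of p * (1 - p)."""
    return sum(p * (1 - p) for p in true_probs) / len(true_probs)


coin_flips = [0.5, 0.5, 0.5]     # maximally uncertain events
mostly_clear = [0.9, 0.1, 0.99]  # events that are nearly settled

print(expected_brier_when_perfect(coin_flips))  # 0.25
print(round(expected_brier_when_perfect(mostly_clear), 4))
```

A perfectly informed forecaster of fair coin flips cannot expect to beat 0.25, while the same forecaster facing nearly settled events scores much lower without being any more skilled.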

Using scores as feedback without gaming them

The greatest value of the Brier score comes from repeated practice rather than from chasing a particular number.

A productive feedback cycle looks like this:

Record explicit probability forecasts before outcomes are known.
Calculate your average Brier score over dozens or hundreds of predictions. [emergentmind.com]
Review your largest penalties, especially forecasts above about 90% confidence that proved wrong.
Examine whether the problem was missing evidence, overconfidence, or poor judgement about uncertainty.
Adjust future probability estimates rather than merely trying to minimise the score.

This process encourages honest self-calibration. The objective is not to avoid strong predictions altogether but to reserve high confidence for situations where the evidence genuinely supports it.
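The first three steps of the cycle above can be sketched as a small forecast log. The Forecast class, the review helper and the example questions are all hypothetical. A real log would hold many more entries.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    question: str
    probability: float  # stated before the outcome was known
    outcome: int        # 1 if the event occurred, 0 otherwise

    @property
    def penalty(self):
        """Squared error contributed by this single forecast."""
        return (self.probability - self.outcome) ** 2


def review(log, top=2):
    """Return the mean Brier score and the forecasts with the largest penalties."""
    mean_score = sum(f.penalty for f in log) / len(log)
    worst = sorted(log, key=lambda f: f.penalty, reverse=True)[:top]
    return mean_score, worst


log = [
    Forecast("Project ships this month", 0.95, 0),
    Forecast("Vendor renews contract", 0.70, 1),
    Forecast("Build passes first time", 0.60, 0),
    Forecast("Meeting overruns", 0.80, 1),
]

mean_score, worst = review(log)
print(round(mean_score, 4))
for f in worst:
    print(f.question, f.probability, round(f.penalty, 4))
```

Sorting by penalty puts the confident miss at the top of the review list, which is where the later diagnostic steps should start.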

One useful habit is to review forecasts by confidence bands (for example, all predictions made between 70% and 80% confidence) and ask whether roughly three quarters actually occurred. That complements the Brier score by revealing systematic overconfidence or underconfidence that a single average can conceal. [Royal Meteorological Society]
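A band review of this kind might look like the sketch below. It assumes forecasts are stated as whole percentages and uses 10-point bands. The function name and the data are invented for illustration.

```python
def calibration_by_band(forecasts, outcomes, band_pct=10):
    """Map each (low%, high%) band to (count, mean forecast, observed frequency).

    Assumes forecasts are whole percentages; a forecast of exactly 100%
    is placed in the top band.
    """
    top_band = 100 // band_pct - 1
    bands = {}
    for p, o in zip(forecasts, outcomes):
        # Work in integer percentage points to avoid float division surprises.
        index = min(round(p * 100) // band_pct, top_band)
        bands.setdefault(index, []).append((p, o))
    report = {}
    for index in sorted(bands):
        pairs = bands[index]
        count = len(pairs)
        mean_forecast = sum(p for p, _ in pairs) / count
        observed = sum(o for _, o in pairs) / count
        report[(index * band_pct, (index + 1) * band_pct)] = (count, mean_forecast, observed)
    return report


forecasts = [0.70, 0.75, 0.75, 0.78, 0.90, 0.95]
outcomes = [1, 0, 1, 0, 1, 1]

report = calibration_by_band(forecasts, outcomes)
for band, (count, mean_forecast, observed) in report.items():
    print(band, count, round(mean_forecast, 3), round(observed, 3))
```

In this toy data the 70 to 80 band averages about 75% stated confidence but only half of those events occurred, a sign of overconfidence. With so few forecasts per band, such gaps are only suggestive.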

Trying to “game” the score by predicting 50% for everything is also a poor strategy. A constant 50% forecast scores exactly 0.25 on every question: it avoids catastrophic penalties but fails to distinguish likely from unlikely events, so it has no resolution. Proper scoring rules reward forecasters who are both well calibrated and willing to express justified confidence when evidence supports it. [Royal Meteorological Society]
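To see the cost of blanket hedging, the sketch below compares an always-50% forecaster with a hypothetical forecaster who says 80% or 20% and is right about those frequencies. The outcomes are invented.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

# Strategy 1: say 50% for everything.
always_half = [0.5] * len(outcomes)

# Strategy 2: commit to 80% or 20%; four in five of the 80% calls occur
# and one in five of the 20% calls occurs, so the forecasts are calibrated.
informed = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2, 0.2, 0.8]

half_score = brier_score(always_half, outcomes)
informed_score = brier_score(informed, outcomes)

print(half_score)  # 0.25
print(round(informed_score, 4))
```

The informed forecaster still takes two large penalties for the calls that went the other way, yet ends up with the lower average.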

What Brier scores are best used for

For improving analytical judgement, Brier scores are most valuable as a personal feedback mechanism rather than as a universal ranking system.

They work well when:

you make many probability forecasts over time;
outcomes are clearly observable;
comparable questions are evaluated together;
scores are interpreted alongside calibration and confidence patterns rather than in isolation.

Used this way, the Brier score turns vague feelings of confidence into measurable evidence. It cannot tell you everything about judgement, and it should always be interpreted in light of question difficulty and the structure of the forecasting task. Nevertheless, it remains one of the most practical tools for learning whether your expressed confidence actually matches reality.

Endnotes

Source: scores.readthedocs.io
Link: https://scores.readthedocs.io/en/1.3.0/tutorials/Brier_Score.html
Source snippet
Brier score — scores 1.3.0 documentationThe Brier score is a strictly proper scoring rule where lower values are better (it is negatively...
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7333631/
Source snippet
Superforecasting reality check: Evidence from a small pool of...by I Katsagounos · 2020 · Cited by 23 — The Brier Score measures the...
Source: arxiv.org
Title: arXiv Calibration Scoring Rules for Practical Prediction Training
Link: https://arxiv.org/abs/1808.07501
Source snippet
Calibration Scoring Rules for Practical Prediction TrainingAugust 22, 2018...

Published: August 22, 2018
Source: forecastbench.org
Link: https://www.forecastbench.org/assets/pdfs/forecastbench_updated_methodology.pdf
Source snippet
UPDATED RANKING METHODOL- OGYby S Kucinskas — 5.1 EXCLUDING STALE MODELS FROM DIFFICULTY ESTIMATION. The difficulty-adjusted...
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12818272/
Source snippet
On misconceptions about the Brier score in binary prediction...by L Hoessly · 2026 · Cited by 19 — The Brier score was shown to refle...
Source: arxiv.org
Link: https://arxiv.org/html/2507.04562v3
Source snippet
Evaluating LLMs on Real-World Forecasting Against...4 Aug 2025 — A Brier score of 0 represents perfect accuracy, and a Brier score of 1...
Source: arxiv.org
Link: https://arxiv.org/pdf/2005.01835
Source snippet
more...
Source: arxiv.org
Link: https://arxiv.org/pdf/0806.0813
Source snippet
Scoring rules are an important tool for evaluating the performance of probabilistic forecasting schemes. In the binary case...Read more...
Source: arxiv.org
Link: https://arxiv.org/pdf/2003.12447
Source snippet
We evaluate the Brier score distribution of aggregated forecasts.Read more...
Source: scores.readthedocs.io
Link: https://scores.readthedocs.io/en/1.1.0/tutorials/Brier_Score.html
Source snippet
Brier score — scores 1.1.0 documentationThe Brier score is a strictly proper scoring rule where lower values are better (it is negatively...
Source: rmets.onlinelibrary.wiley.com
Link: https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.2985
Source snippet
Royal Meteorological SocietySimplifying and generalising Murphy's Brier score...by S Siegert · 2017 · Cited by 45 — The decomposition of...
Source: Wikipedia
Title: The Brier
Link: https://en.wikipedia.org/wiki/The_Brier
Source snippet
The BrierThe Brier (French: Le Brier), known since 2024 as the Montana's Brier for sponsorship reasons, is the annual Canadian men's c...
Source: Wikipedia
Title: Scoring rule
Link: https://en.wikipedia.org/wiki/Scoring_rule
Source snippet
Scoring ruleA calibration curve allows to judge how well model predictions are calibrated, by comparing the predicted quantiles to the...
Source: Wikipedia
Title: Brier score
Link: https://en.wikipedia.org/wiki/Brier_score
Source snippet
Brier score2.1 Three-component decomposition. 2.1.1 Reliability; 2.1.2 Resolution; 2.1.3 Uncertainty · 2.2 Two-component decomposition...
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10189590/
Source snippet
improves forecasting - PMC - NIHby DN Ferreiro · 2023 · Cited by 3 — The Brier score of such 'compromise between compromises', was define...
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12523994/
Source snippet
Brier Score - an Overall Summary Measure for Risk...by K Zhu · 2025 · Cited by 13 — The Brier score, mean squared error of predicted ris...
Source: nea.gov.sg
Link: https://www.nea.gov.sg/corporate-functions/weather
Source snippet
20 Jun 2024 — Weather · 24-hour weather forecast · 4-day Outlook · Warnings & Advisories · 24-hour Forecast · Rain Areas · UV Index · Exp...
Source: insightful-data-lab.com
Title: Murphy’s Decomposition
Link: https://insightful-data-lab.com/2025/08/21/murphys-decomposition/
Source snippet
Your Gateway to Data Mastery21 Aug 2025 — Murphy's decomposition = a way to break down forecast error into calibration (reliability), sha...
Source: rmets.onlinelibrary.wiley.com
Link: https://rmets.onlinelibrary.wiley.com/doi/10.1002/qj.2985
Source snippet
and generalising Murphy's Brier score...15 Dec 2016 — The decomposition of the Brier score into Reliability, Resolution and Uncertainty...
Source: onlinelibrary.wiley.com
Link: https://onlinelibrary.wiley.com/doi/10.1111/risa.12776
Source snippet
of Superforecasting: The Art and Science...16 Mar 2017 — Technical concepts of calibration, resolution, and Brier scores are explained c...
Source: rmets.onlinelibrary.wiley.com
Link: https://rmets.onlinelibrary.wiley.com/doi/10.1002/qj.4478
Source snippet
conditional decomposition of proper scores: quantifying the...24 Apr 2023 — This decomposition does not rely on the scoring rule being p...
Source: emergentmind.com
Title: brier score term
Link: https://www.emergentmind.com/topics/brier-score-term
Source snippet
Brier Score: Calibration, Resolution, and UncertaintyJul 25, 2025 — The Brier score term evaluates probabilistic forecasts by decomposing...
Source: emergentmind.com
Title: brier score
Link: https://www.emergentmind.com/topics/brier-score
Source snippet
Murphy's classical decomposition expresses the Brier score as the sum of three terms:...Read more...
Source: entropicthoughts.com
Title: brier score
Link: https://entropicthoughts.com/brier-score
Source snippet
4 Feb 2025 — A Brier score worse than 0.5 usually indicates sampling error, but a good forecaster could hypothetically grief an evaluator...

