What Brier Scores Can And Cannot Tell You

Brier scores can turn probability forecasts into feedback, but they should be read with the difficulty of the questions in mind.

On this page

  • How a Brier score grades a probability
  • Why lower scores are not the whole story
  • Using scores as feedback without gaming them

Introduction

A Brier score is one of the simplest ways to turn confidence into measurable feedback. Instead of asking only whether a prediction was right or wrong, it asks whether the probability you assigned matched what eventually happened. If you repeatedly say an event has a 70% chance and those events occur about 70% of the time, your forecasts will generally earn better Brier scores than those of someone who confidently assigns 95% to many uncertain events.

For anyone trying to improve their thinking and analytical skills, the value of a Brier score is not that it produces a single number to compete over. Its value is that it exposes whether your confidence is consistently too high, too low, or appropriately matched to reality. At the same time, the score has important limits. Lower scores are not always evidence of better judgement if different people answered questions of very different difficulty, and a single score reveals little about why a forecaster performed well or badly. [scores.readthedocs.io; Royal Meteorological Society]

How a Brier score grades a probability

For a binary prediction—such as “this project will finish this month”—the Brier score measures the squared difference between the probability you assigned and the outcome that occurred.

If the event happens, the error is:

  • (1 − predicted probability)²

If the event does not happen, the error is:

  • (0 − predicted probability)²

The scores are then averaged across many predictions. Lower scores are better: 0 represents perfect forecasting, and 1 is the worst possible score on this scale. Because errors are squared, confident mistakes are punished much more heavily than cautious ones. Predicting a 99% chance of success when something fails costs 0.9801, against 0.36 for a 60% forecast of the same failed event. [scores.readthedocs.io]

A few simple examples illustrate the idea:

Forecast   Outcome                Brier score
90%        Event occurs           0.01
90%        Event does not occur   0.81
60%        Event occurs           0.16
50%        Event occurs           0.25

These examples highlight the central lesson for confidence calibration: certainty should be earned. The scoring rule rewards accurate confidence rather than boldness for its own sake.
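The arithmetic behind the four example forecasts above can be checked with a few lines of Python. This is a minimal sketch rather than a reference implementation; the function name `brier_score` is chosen here for illustration and is not taken from any particular library.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# The four example forecasts, scored one at a time (1 = event occurred).
examples = [(0.90, 1), (0.90, 0), (0.60, 1), (0.50, 1)]
single_scores = [round(brier_score([p], [o]), 2) for p, o in examples]
print(single_scores)  # [0.01, 0.81, 0.16, 0.25]

# The same four forecasts averaged into one score.
average = brier_score([p for p, _ in examples], [o for _, o in examples])
print(round(average, 4))  # 0.3075
```

The single 90% miss accounts for most of the average, which is the squared-error penalty at work.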

An important theoretical property is that the Brier score is a strictly proper scoring rule. This means the forecaster’s best long-run strategy is to report their genuine probability estimate rather than exaggerating confidence or hedging towards 50%. That property makes it especially suitable as a learning tool rather than merely a performance metric. [arXiv]
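One way to see the "strictly proper" property is to compute the expected score for every possible stated probability while holding the true probability fixed. The sketch below assumes a true probability of 70%, a value picked purely for illustration, and searches whole-percent reports for the one with the lowest expected score.

```python
def expected_brier(report, true_p):
    """Expected Brier score when the event occurs with probability true_p
    and the forecaster states `report`."""
    return true_p * (1 - report) ** 2 + (1 - true_p) * report ** 2


true_p = 0.70  # assumed genuine probability, for illustration only
candidates = [i / 100 for i in range(101)]  # reports 0.00, 0.01, ..., 1.00
best_report = min(candidates, key=lambda q: expected_brier(q, true_p))

print(best_report)                             # 0.7
print(round(expected_brier(0.70, true_p), 4))  # 0.21   (honest report)
print(round(expected_brier(0.95, true_p), 4))  # 0.2725 (exaggerated)
print(round(expected_brier(0.50, true_p), 4))  # 0.25   (hedged)
```

Both the exaggerated and the hedged report do worse in expectation than stating 70%.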

Why lower scores are not the whole story

A common mistake is treating a lower Brier score as proof that one person is a better thinker than another. The comparison is only meaningful when they forecast roughly comparable questions.

Question difficulty matters

Forecasts on easy questions naturally receive lower Brier scores than forecasts on genuinely uncertain ones. Someone predicting obvious sporting favourites or well-established trends may outperform someone making careful predictions about geopolitical events, even if the second forecaster is better calibrated.
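A rough sketch of the effect: if a forecaster always states the true probability p, the expected Brier score on that question is p(1 − p), so the score depends on how uncertain the question is even when calibration is perfect. The two question sets below are hypothetical and exist only to show the size of the gap.

```python
def expected_score_if_calibrated(true_p):
    """Expected Brier score for a forecaster who states the true probability."""
    return true_p * (1 - true_p)


def mean(values):
    return sum(values) / len(values)


# Hypothetical true probabilities, assumed for illustration.
easy_questions = [0.95, 0.97, 0.03, 0.98]  # near-certain outcomes
hard_questions = [0.55, 0.40, 0.65, 0.50]  # genuinely uncertain outcomes

easy_score = mean([expected_score_if_calibrated(p) for p in easy_questions])
hard_score = mean([expected_score_if_calibrated(p) for p in hard_questions])

print(round(easy_score, 3))  # 0.031
print(round(hard_score, 3))  # 0.241
```

Both forecasters here are perfectly calibrated by construction, yet their expected scores differ by roughly a factor of eight.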

Researchers studying forecasting tournaments have therefore explored methods that adjust for question difficulty or compare forecasters only on shared questions. Without such adjustments, raw Brier scores can mix together forecasting skill and task difficulty. [ForecastBench]

For personal feedback, this means you should compare:

  • your own scores over time on similar kinds of questions;
  • scores across forecasters answering the same questions; or
  • scores after accounting for question difficulty where possible.
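The second option, comparing forecasters on the same questions, is simple to implement. The sketch below restricts the comparison to resolved questions that both people answered; the function name, question ids, and probabilities are all hypothetical. It does not attempt difficulty adjustment, which needs a statistical model beyond the scope of a short example.

```python
def brier_on_shared(forecasts_a, forecasts_b, outcomes):
    """Score two forecasters only on questions both of them answered.

    forecasts_a, forecasts_b: dicts mapping question id -> probability
    outcomes: dict mapping question id -> 0 or 1
    """
    shared = sorted(set(forecasts_a) & set(forecasts_b) & set(outcomes))
    if not shared:
        raise ValueError("no resolved questions in common")

    def score(forecasts):
        return sum((forecasts[q] - outcomes[q]) ** 2 for q in shared) / len(shared)

    return shared, score(forecasts_a), score(forecasts_b)


# Hypothetical data, made up for illustration.
alice = {"q1": 0.9, "q2": 0.8, "q3": 0.6}
bob = {"q2": 0.7, "q3": 0.4, "q4": 0.95}
resolved = {"q1": 1, "q2": 1, "q3": 0, "q4": 1}

shared, alice_score, bob_score = brier_on_shared(alice, bob, resolved)
print(shared)                 # ['q2', 'q3']
print(round(alice_score, 3))  # 0.2
print(round(bob_score, 3))    # 0.125
```

Questions q1 and q4 are ignored because only one forecaster answered each, so neither person is credited or penalised for a question the other never faced.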

The score mixes several qualities together

Another limitation is that the Brier score combines multiple aspects of forecasting into one number.

Murphy’s classic decomposition separates the overall score into three components [insightful-data-lab.com]:

  • Reliability (calibration): whether predicted probabilities match observed frequencies.
  • Resolution: whether the forecaster successfully distinguishes situations that genuinely have different chances of occurring.
  • Uncertainty: the inherent unpredictability of the events themselves. [Royal Meteorological Society]

This decomposition explains why two forecasters can have identical overall scores for different reasons. One might be well calibrated but reluctant to move away from 50%, while another makes informative forecasts but is slightly overconfident. Looking only at the average score hides those differences.
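The three components can be computed directly from a forecast log. The sketch below groups forecasts by their exact stated value; under that assumption the identity Brier = reliability − resolution + uncertainty holds exactly, whereas with wider bins it is only approximate. The function name and the track record are hypothetical.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by distinct
    forecast value so that brier = reliability - resolution + uncertainty."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)

    # Reliability: gap between each stated probability and its observed frequency.
    reliability = sum(
        len(obs) * (p - sum(obs) / len(obs)) ** 2 for p, obs in groups.items()
    ) / n
    # Resolution: how far each group's observed frequency sits from the base rate.
    resolution = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


# Hypothetical track record, made up for illustration.
forecasts = [0.8, 0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2, 0.2]
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

rel, res, unc = murphy_decomposition(forecasts, outcomes)
brier = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

print(round(rel, 4), round(res, 4), round(unc, 4))  # 0.02 0.04 0.24
print(round(brier, 4), round(rel - res + unc, 4))   # 0.22 0.22
```

Lower reliability is better and higher resolution is better. In this made-up record the 80% forecasts came true only 60% of the time, which is where the reliability penalty comes from.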

Even perfect models do not always score zero

Another misconception is that a perfectly informed forecasting process should eventually achieve a Brier score of zero. In reality, many events remain genuinely probabilistic. Even if your stated probabilities exactly match the true underlying risks, random outcomes still produce non-zero average scores.
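The floor can be made explicit with a short derivation. As a sketch, suppose a single event occurs with true probability $p$, the outcome is $Y \in \{0, 1\}$, and the forecaster states exactly $p$; these symbols are introduced here for illustration and are not taken from the cited sources.

```latex
\begin{aligned}
\mathbb{E}\big[(p - Y)^2\big]
  &= p\,(p - 1)^2 + (1 - p)\,(p - 0)^2 \\
  &= p\,(1 - p)\,\big[(1 - p) + p\big] \\
  &= p\,(1 - p), \qquad Y \sim \mathrm{Bernoulli}(p).
\end{aligned}
```

A perfectly informed forecast of 70% therefore still averages 0.21, and the expected score reaches zero only when $p$ is 0 or 1.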

Recent methodological work emphasises that the Brier score reflects not only forecasting quality but also the underlying distribution of risks and unavoidable randomness in the events being predicted. [PMC]

Using scores as feedback without gaming them

The greatest value of the Brier score comes from repeated practice rather than from chasing a particular number.

A productive feedback cycle looks like this:

  1. Record explicit probability forecasts before outcomes are known.
  2. Calculate your average Brier score over dozens or hundreds of predictions. [emergentmind.com]
  3. Review your largest penalties, especially forecasts above about 90% confidence that proved wrong.
  4. Examine whether the problem was missing evidence, overconfidence, or poor judgement about uncertainty.
  5. Adjust future probability estimates rather than merely trying to minimise the score.
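Steps 2 and 3 of the cycle above lend themselves to a small script. The sketch below assumes a log of (label, probability, outcome) records; the record layout, the function name, and the sample entries are all hypothetical. It ranks forecasts by penalty and flags those made with at least 90% confidence that landed on the wrong side.

```python
def largest_penalties(log, top_n=3, confidence_floor=0.90):
    """Rank forecasts by Brier penalty and flag high-confidence misses.

    log: list of (label, probability, outcome) tuples, outcome is 0 or 1.
    Returns the top_n rows as (penalty, label, flagged) tuples.
    """
    rows = []
    for label, p, outcome in log:
        penalty = (p - outcome) ** 2
        confidence = max(p, 1 - p)  # confidence in whichever side was favoured
        wrong_side = (p > 0.5 and outcome == 0) or (p < 0.5 and outcome == 1)
        flagged = wrong_side and confidence >= confidence_floor
        rows.append((penalty, label, flagged))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:top_n]


# Hypothetical forecast log, recorded before the outcomes were known.
log = [
    ("ship by Friday", 0.95, 0),
    ("vendor renews", 0.70, 1),
    ("bug is in parser", 0.60, 0),
    ("rain tomorrow", 0.10, 0),
    ("budget approved", 0.92, 1),
]

average_score = sum((p - o) ** 2 for _, p, o in log) / len(log)
print(round(average_score, 4))  # 0.2738

top = largest_penalties(log)
for penalty, label, flagged in top:
    print(label, round(penalty, 4), flagged)
# ship by Friday 0.9025 True
# bug is in parser 0.36 False
# vendor renews 0.09 False
```

The script only surfaces candidates for review. Deciding whether a miss came from missing evidence or from overconfidence (step 4) remains a judgement call.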

This process encourages honest self-calibration. The objective is not to avoid strong predictions altogether but to reserve high confidence for situations where the evidence genuinely supports it.

One useful habit is to review forecasts by confidence bands (for example, all predictions made between 70% and 80% confidence) and ask whether roughly three quarters actually occurred. That complements the Brier score by revealing systematic overconfidence or underconfidence that a single average can conceal. [Royal Meteorological Society]
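A band review of this kind might look like the sketch below. It assumes forecasts are recorded to the nearest whole percent and uses 10-point bands; both choices, along with the function name and the sample data, are assumptions made for illustration.

```python
def calibration_by_band(forecasts, outcomes, band_pct=10):
    """Group forecasts into percentage bands and compare the mean stated
    probability in each band with the observed frequency."""
    bands = {}
    for p, o in zip(forecasts, outcomes):
        pct = round(p * 100)  # assumes whole-percent forecasts
        # Integer arithmetic avoids float surprises such as 0.7 / 0.1 < 7.
        lower = min(pct // band_pct * band_pct, 100 - band_pct)
        bands.setdefault(lower, []).append((p, o))

    report = {}
    for lower in sorted(bands):
        items = bands[lower]
        mean_forecast = sum(p for p, _ in items) / len(items)
        hit_rate = sum(o for _, o in items) / len(items)
        report[(lower, lower + band_pct)] = (len(items), mean_forecast, hit_rate)
    return report


# Hypothetical forecasts and outcomes, made up for illustration.
forecasts = [0.70, 0.72, 0.76, 0.78, 0.90, 0.95, 0.92, 0.91]
outcomes = [1, 1, 0, 1, 1, 0, 0, 1]

report = calibration_by_band(forecasts, outcomes)
for band, (count, mean_forecast, hit_rate) in report.items():
    print(band, count, round(mean_forecast, 2), round(hit_rate, 2))
# (70, 80) 4 0.74 0.75
# (90, 100) 4 0.92 0.5
```

In this made-up sample the 70 to 80% band looks well calibrated, while the 90%+ band came true only half the time. With four forecasts per band the numbers are far too noisy to conclude anything; real reviews need many more forecasts in each band.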

Trying to “game” the score by predicting 50% for everything is also a poor strategy. While cautious forecasts avoid catastrophic penalties, they fail to distinguish likely from unlikely events, reducing forecast resolution. Proper scoring rules reward forecasters who are both well calibrated and willing to express justified confidence when evidence supports it. [Royal Meteorological Society]
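A constant 50% forecast scores exactly 0.25 whatever happens, since (0.5 − 0)² and (0.5 − 1)² are both 0.25. The sketch below compares that baseline with a hypothetical forecaster whose probabilities lean the right way; the outcomes and forecasts are made up, and a forecaster with poor evidence could just as easily score worse than 0.25.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


outcomes = [1, 1, 1, 0, 1, 0, 0, 1]                  # hypothetical results
always_half = [0.5] * len(outcomes)                  # the "safe" strategy
informed = [0.8, 0.7, 0.9, 0.2, 0.6, 0.3, 0.4, 0.7]  # hypothetical forecasts

half_score = brier_score(always_half, outcomes)
informed_score = brier_score(informed, outcomes)

print(half_score)                # 0.25
print(round(informed_score, 4))  # 0.085
```

The 0.25 baseline is a useful reference point: a long-run score above it suggests the forecasts are doing worse than refusing to commit at all.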

What Brier scores are best used for

For improving analytical judgement, Brier scores are most valuable as a personal feedback mechanism rather than as a universal ranking system.

They work well when:

  • you make many probability forecasts over time;
  • outcomes are clearly observable;
  • comparable questions are evaluated together;
  • scores are interpreted alongside calibration and confidence patterns rather than in isolation.

Used this way, the Brier score turns vague feelings of confidence into measurable evidence. It cannot tell you everything about judgement, and it should always be interpreted in light of question difficulty and the structure of the forecasting task. Nevertheless, it remains one of the most practical tools for learning whether your expressed confidence actually matches reality.

Endnotes

  1. Source: scores.readthedocs.io
    Link: https://scores.readthedocs.io/en/1.3.0/tutorials/Brier_Score.html
    Source snippet

    Brier score — scores 1.3.0 documentationThe Brier score is a strictly proper scoring rule where lower values are better (it is negatively...

  2. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7333631/
    Source snippet

    Superforecasting reality check: Evidence from a small pool of...by I Katsagounos · 2020 · Cited by 23 — The Brier Score measures the...

  3. Source: arxiv.org
    Title: arXiv Calibration Scoring Rules for Practical Prediction Training
    Link: https://arxiv.org/abs/1808.07501
    Source snippet

    Calibration Scoring Rules for Practical Prediction TrainingAugust 22, 2018...

    Published: August 22, 2018

  4. Source: forecastbench.org
    Link: https://www.forecastbench.org/assets/pdfs/forecastbench_updated_methodology.pdf
    Source snippet

    UPDATED RANKING METHODOL- OGYby S Kucinskas — 5.1 EXCLUDING STALE MODELS FROM DIFFICULTY ESTIMATION. The difficulty-adjusted...

  5. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12818272/
    Source snippet

    On misconceptions about the Brier score in binary prediction...by L Hoessly · 2026 · Cited by 19 — The Brier score was shown to refle...

  6. Source: arxiv.org
    Link: https://arxiv.org/html/2507.04562v3
    Source snippet

    Evaluating LLMs on Real-World Forecasting Against...4 Aug 2025 — A Brier score of 0 represents perfect accuracy, and a Brier score of 1...

  7. Source: arxiv.org
    Link: https://arxiv.org/pdf/2005.01835
    Source snippet

    more...

  8. Source: arxiv.org
    Link: https://arxiv.org/pdf/0806.0813
    Source snippet

    Scoring rules are an important tool for evaluating the performance of probabilistic forecasting schemes. In the binary case...Read more...

  9. Source: arxiv.org
    Link: https://arxiv.org/pdf/2003.12447
    Source snippet

    We evaluate the Brier score distribution of aggregated forecasts.Read more...

  10. Source: scores.readthedocs.io
    Link: https://scores.readthedocs.io/en/1.1.0/tutorials/Brier_Score.html
    Source snippet

    Brier score — scores 1.1.0 documentationThe Brier score is a strictly proper scoring rule where lower values are better (it is negatively...

  11. Source: rmets.onlinelibrary.wiley.com
    Link: https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/qj.2985
    Source snippet

    Royal Meteorological SocietySimplifying and generalising Murphy's Brier score...by S Siegert · 2017 · Cited by 45 — The decomposition of...

  12. Source: Wikipedia
    Title: The Brier
    Link: https://en.wikipedia.org/wiki/The_Brier
    Source snippet

    The BrierThe Brier (French: Le Brier), known since 2024 as the Montana's Brier for sponsorship reasons, is the annual Canadian men's c...

  13. Source: Wikipedia
    Title: Scoring rule
    Link: https://en.wikipedia.org/wiki/Scoring_rule
    Source snippet

    Scoring ruleA calibration curve allows to judge how well model predictions are calibrated, by comparing the predicted quantiles to the...

  14. Source: Wikipedia
    Title: Brier score
    Link: https://en.wikipedia.org/wiki/Brier_score
    Source snippet

    Brier score2.1 Three-component decomposition. 2.1.1 Reliability; 2.1.2 Resolution; 2.1.3 Uncertainty · 2.2 Two-component decomposition...

  15. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10189590/
    Source snippet

    improves forecasting - PMC - NIHby DN Ferreiro · 2023 · Cited by 3 — The Brier score of such 'compromise between compromises', was define...

  16. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12523994/
    Source snippet

    Brier Score - an Overall Summary Measure for Risk...by K Zhu · 2025 · Cited by 13 — The Brier score, mean squared error of predicted ris...

  17. Source: nea.gov.sg
    Link: https://www.nea.gov.sg/corporate-functions/weather
    Source snippet

    20 Jun 2024 — Weather · 24-hour weather forecast · 4-day Outlook · Warnings & Advisories · 24-hour Forecast · Rain Areas · UV Index · Exp...

  18. Source: insightful-data-lab.com
    Title: Murphy’s Decomposition
    Link: https://insightful-data-lab.com/2025/08/21/murphys-decomposition/
    Source snippet

    Your Gateway to Data Mastery21 Aug 2025 — Murphy's decomposition = a way to break down forecast error into calibration (reliability), sha...

  19. Source: rmets.onlinelibrary.wiley.com
    Link: https://rmets.onlinelibrary.wiley.com/doi/10.1002/qj.2985
    Source snippet

    and generalising Murphy's Brier score...15 Dec 2016 — The decomposition of the Brier score into Reliability, Resolution and Uncertainty...

  20. Source: onlinelibrary.wiley.com
    Link: https://onlinelibrary.wiley.com/doi/10.1111/risa.12776
    Source snippet

    of Superforecasting: The Art and Science...16 Mar 2017 — Technical concepts of calibration, resolution, and Brier scores are explained c...

  21. Source: rmets.onlinelibrary.wiley.com
    Link: https://rmets.onlinelibrary.wiley.com/doi/10.1002/qj.4478
    Source snippet

    conditional decomposition of proper scores: quantifying the...24 Apr 2023 — This decomposition does not rely on the scoring rule being p...

  22. Source: emergentmind.com
    Title: brier score term
    Link: https://www.emergentmind.com/topics/brier-score-term
    Source snippet

    Brier Score: Calibration, Resolution, and UncertaintyJul 25, 2025 — The Brier score term evaluates probabilistic forecasts by decomposing...

  23. Source: emergentmind.com
    Title: brier score
    Link: https://www.emergentmind.com/topics/brier-score
    Source snippet

    Murphy's classical decomposition expresses the Brier score as the sum of three terms:...Read more...

  24. Source: entropicthoughts.com
    Title: brier score
    Link: https://entropicthoughts.com/brier-score
    Source snippet

    4 Feb 2025 — A Brier score worse than 0.5 usually indicates sampling error, but a good forecaster could hypothetically grief an evaluator...

Additional References

  1. Source: researchgate.net
    Link: https://www.researchgate.net/publication/277087515_Identifying_and_Cultivating_Superforecasters_as_a_Method_of_Improving_Probabilistic_Predictions
    Source snippet

    (PDF) Identifying and Cultivating Superforecasters as a...25 May 2015 — In this article, we describe the winning strategy: culling off t...

    Published: May 2015

  2. Source: researchgate.net
    Link: https://www.researchgate.net/publication/253893961_Two_Extra_Components_in_the_Brier_Score_Decomposition

  3. Source: cambridge.org
    Title: recalibrating probabilistic forecasts to improve theiraccuracy
    Link: https://www.cambridge.org/core/services/aop-cambridge-core/content/view/5D8F0DFC90FC1E9B5C38C12992A6C6A7/S1930297500009049a.pdf/recalibrating_probabilistic_forecasts_to_improve_theiraccuracy.pdf
    Source snippet

    Recalibrating probabilistic forecasts to improve their accuracyby Y Han · 2022 · Cited by 12 — The third panel shows the Brier scores bas...

  4. Source: medium.com
    Link: https://medium.com/%40eligoz/some-notes-on-probabilistic-classifiers-iii-brier-score-decomposition-eee5f847d87f
    Source snippet

    ), Uncertainty(UNC), Resolution (RES), Within-bin-variance(WBV), Within-bin-...Read more...

  5. Source: researchgate.net
    Title: Reliability, Sufficiency, and the Decomposition of Proper
    Link: https://www.researchgate.net/publication/227532520_Reliability_Sufficiency_and_the_Decomposition_of_Proper_Scores
    Source snippet

    Associated entropies and divergences are used in the calibration-sharpness decomposition introduced by Bröcker [4] for proper scores of c...

  6. Source: forum.effectivealtruism.org
    Link: https://forum.effectivealtruism.org/posts/pnpnqA4hijnr59p7d/efforts-to-improve-the-accuracy-of-our-judgments-and
    Source snippet

    to Improve the Accuracy of Our Judgments and...25 Oct 2016 — Tetlock & Gardner (2015) explain this point in the context of assessing for...

  7. Source: medium.com
    Link: https://medium.com/%40eligoz/some-notes-on-probabilistic-classifiers-ii-scoring-rules-and-the-brier-score-16ffc1ba063b
    Source snippet

    any probabilistic forecast which is not exactly calibrated with the...Read more...

  8. Source: youtube.com
    Title: A Guide to Model Calibration | Calibration Plots | Brier Score
    Link: https://www.youtube.com/watch?v=rG2EfFOXyg0
    Source snippet

    Improve Your Decision-Making with Your Own Brier Score is highly relevant because it explains how to practically track confidence levels...

  9. Source: entropicthoughts.com
    Title: improving forecasting accuracy benchmark
    Link: https://entropicthoughts.com/improving-forecasting-accuracy-benchmark.html
    Source snippet

    Forecasting Accuracy7 Jul 2023 — The baseline Brier score on a 30-day forecast is 0.21 (in other words, the study subjects are somewhat b...

  10. Source: commoncog.com
    Title: how do you evaluate your own predictions
    Link: https://commoncog.com/how-do-you-evaluate-your-own-predictions/
    Source snippet

    ?17 Dec 2019 — This post provides a comprehensive summary of the technique that Tetlock and Gardner presents in Superforecasting.Read more...
