How Forecasting Makes Confidence Accountable

Scored forecasts turn vague confidence into testable probabilities, helping people learn when their judgement is calibrated.

Introduction

Forecasting tournaments are one of the clearest ways to test whether expert intuition deserves trust. Instead of asking who sounds persuasive, they ask participants to assign probabilities to specific future events, wait for the outcomes, and then score the quality of those forecasts. Over time, this turns confidence into a measurable track record rather than a matter of reputation. The result is a practical learning system: participants discover whether they are consistently overconfident, underconfident, or well calibrated, while organisations gain evidence about which forecasting habits genuinely improve judgement. This approach is especially valuable in domains such as geopolitics, public policy, and strategic planning, where intuition often operates in noisy environments with delayed and ambiguous feedback. [Cambridge University Press & Assessment]

Why long-range political judgement often fails

Research on expert judgement has repeatedly shown that experience alone does not guarantee accurate long-range prediction. Political analysts, commentators and senior decision-makers often receive little direct feedback about whether their probabilistic beliefs were justified. Events unfold over months or years, causes are tangled together, and almost every outcome can be explained after the fact.

Forecasting tournaments address this problem by replacing broad opinions with clearly defined questions such as whether an election will occur by a given date, whether a peace agreement will be signed, or whether a country’s GDP growth will exceed a specified threshold. Each prediction is recorded before the outcome is known, making hindsight much harder.

This design emerged most prominently through forecasting competitions sponsored by the US intelligence research agency IARPA (Intelligence Advanced Research Projects Activity). One participating team, the Good Judgment Project, demonstrated that structured forecasting methods, training, teamwork and continuous feedback substantially improved forecasting accuracy over several years of geopolitical prediction. [Cambridge University Press & Assessment; Good Judgment]

The broader lesson is not that experts are useless. Rather, expertise without reliable feedback can produce unwarranted certainty. Forecasting tournaments provide precisely the kind of repeated correction that Kahneman and Klein argued is necessary for genuine intuitive expertise to develop. [PubMed]

How probabilistic scoring changes the learning loop

The distinctive feature of forecasting tournaments is that they reward calibrated probability estimates instead of binary right-or-wrong answers.

Rather than predicting that an event “will happen”, participants estimate, for example, a 70% or 30% chance. When hundreds of similar forecasts accumulate, calibration becomes visible. A forecaster who repeatedly assigns 70% confidence should see roughly seven out of ten such events occur. If only four occur, their confidence is systematically too high.
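
The calibration check described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `calibration_table`, the rounding of probabilities to one decimal place, and the sample record are assumptions made for the example, not part of any tournament's actual tooling.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (probability, outcome) pairs by stated probability.

    Returns {stated_probability: (number_of_forecasts, observed_frequency)}.
    Probabilities are rounded to one decimal place to form buckets
    (an assumption made to keep the example small).
    """
    buckets = defaultdict(list)
    for prob, occurred in forecasts:
        buckets[round(prob, 1)].append(1 if occurred else 0)
    return {p: (len(hits), sum(hits) / len(hits)) for p, hits in sorted(buckets.items())}

# Hypothetical record: ten forecasts made at 70%, of which only four came true.
record = [(0.7, True)] * 4 + [(0.7, False)] * 6
table = calibration_table(record)

for prob, (count, freq) in table.items():
    gap = prob - freq  # positive gap: stated confidence exceeded reality
    print(f"stated {prob:.0%}, observed {freq:.0%} over {count} forecasts, gap {gap:+.0%}")
```

Here the 70% bucket shows an observed frequency of 40%, the overconfident pattern described in the text. With real data, each bucket needs many forecasts before the gap means much.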

Proper scoring rules, especially the Brier score, provide an objective way to measure this. In its common binary form, the Brier score is the mean squared difference between each predicted probability and the actual outcome (coded 1 if the event occurred and 0 if it did not), so lower scores are better and 0 is perfect. Because the rule is proper, a forecaster minimises their expected score only by reporting what they genuinely believe, and because errors are squared, unjustified certainty is penalised heavily when it turns out wrong. This discourages dramatic predictions made for attention. [Cambridge University Press & Assessment; arXiv]
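
As a minimal sketch of the calculation, assuming the common binary form of the Brier score (mean squared error against outcomes coded 1 or 0), the example below compares three hypothetical forecasters on the same four events. The function name `brier_score` and the sample data are illustrative assumptions. Some tournaments report a variant that sums over all answer options, which gives a different range.

```python
def brier_score(forecasts):
    """Mean squared difference between forecast probability and outcome.

    `forecasts` is a list of (probability, occurred) pairs.
    Lower is better; 0.0 is a perfect score in this binary form.
    """
    if not forecasts:
        raise ValueError("need at least one resolved forecast")
    total = sum((p - (1.0 if occurred else 0.0)) ** 2 for p, occurred in forecasts)
    return total / len(forecasts)

# Hypothetical data: four events, three of which occurred.
outcomes = [True, True, True, False]
hedged = [(0.7, o) for o in outcomes]     # says 70% every time
certain = [(1.0, o) for o in outcomes]    # says 100% every time
coin_flip = [(0.5, o) for o in outcomes]  # says 50% every time

for name, forecasts in [("hedged", hedged), ("certain", certain), ("coin flip", coin_flip)]:
    print(f"{name}: {brier_score(forecasts):.3f}")
```

The forecaster who claims certainty is right three times out of four, yet scores no better than the coin flip and worse than the hedged 70% forecaster, because the single confident miss is penalised heavily.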

This creates a learning cycle that ordinary professional judgement rarely provides:

  • Make an explicit probabilistic forecast.
  • Record it before the outcome is known.
  • Receive an objective score after resolution.
  • Compare results across many forecasts rather than memorable anecdotes.
  • Adjust future confidence levels accordingly.

The important feedback concerns not only whether someone was correct but whether their confidence matched reality. That distinction is often invisible in everyday decision-making.
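
One way to picture the cycle above is as a small forecast log that stores each probability before the outcome is known and scores only resolved questions. The sketch below is an assumption-laden illustration: the class names `Forecast` and `ForecastLog`, their methods, and the sample questions are hypothetical and do not describe any tournament platform's real interface.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Forecast:
    question: str
    probability: float              # stated chance that the event occurs
    made_on: date                   # recorded before the outcome is known
    outcome: Optional[bool] = None  # filled in only at resolution

@dataclass
class ForecastLog:
    entries: list = field(default_factory=list)

    def record(self, question, probability, made_on):
        """Steps 1 and 2: make an explicit forecast and store it up front."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        forecast = Forecast(question, probability, made_on)
        self.entries.append(forecast)
        return forecast

    def resolve(self, question, outcome):
        """Step 3: attach the outcome once the question resolves."""
        for forecast in self.entries:
            if forecast.question == question and forecast.outcome is None:
                forecast.outcome = outcome

    def mean_brier(self):
        """Step 4: score across all resolved forecasts, not single anecdotes."""
        resolved = [f for f in self.entries if f.outcome is not None]
        if not resolved:
            return None
        return sum((f.probability - float(f.outcome)) ** 2 for f in resolved) / len(resolved)

# Hypothetical usage with made-up questions.
log = ForecastLog()
log.record("Election held by 30 June?", 0.8, date(2024, 1, 10))
log.record("Peace agreement signed this year?", 0.2, date(2024, 1, 10))
log.resolve("Election held by 30 June?", True)
print(f"mean Brier over resolved forecasts: {log.mean_brier():.3f}")
```

Unresolved questions stay in the log but are excluded from the score, so the record cannot be edited after the fact. The final step, adjusting future confidence, remains a human judgement informed by the accumulated score.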

Why tournaments outperform reputation

Forecasting tournaments separate forecasting skill from status, seniority and rhetorical confidence.

The Good Judgment Project found that a relatively small group of consistently high-performing forecasters (later called “superforecasters”) substantially outperformed both average participants and competing forecasting teams across thousands of geopolitical questions. Their advantage came not from secret information but from disciplined updating, careful probability estimation and willingness to revise beliefs as evidence changed. [Cambridge University Press & Assessment; Good Judgment]

An important implication is that forecasting ability can differ from subject-matter expertise. A leading academic or policy specialist may possess deep knowledge while still being poorly calibrated about uncertain future events. Conversely, skilled forecasters often combine broad knowledge with strong judgement about uncertainty, base rates and evidence integration.

This does not diminish domain expertise. Instead, tournaments show that forecasting is a separate skill that benefits from deliberate practice and measurable feedback.

Habits that make intuition easier to audit

Forecasting tournaments are valuable because they cultivate habits that expose intuitive judgement to evidence rather than replacing intuition altogether.

Several practices repeatedly appear among successful forecasters:

  • Express uncertainty numerically. Replacing words such as “likely” with explicit probabilities forces greater precision.
  • Break difficult questions into smaller components. Estimating intermediate events often produces better final judgements than making one large intuitive leap.
  • Update continuously. Good forecasters treat predictions as living estimates that should change when meaningful evidence appears.
  • Keep score over many forecasts. Individual successes may reflect luck. Long-term calibration reveals genuine forecasting skill.
  • Review mistakes systematically. Post-mortems focus on reasoning quality rather than whether an outcome happened to be favourable.

These habits gradually transform intuition from an unexamined feeling into something that can be compared against reality.
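
Two of the habits above, decomposition and updating, have simple arithmetic behind them. The sketch below assumes that a question can be split into a chain of steps, each conditional on the previous one, and uses Bayes' rule as one common way to formalise an update. The article does not say that forecasters apply these formulas explicitly, and the function names and numbers are illustrative assumptions.

```python
def chain_probability(step_probabilities):
    """Probability that every step happens.

    Each entry is the chance of that step given that all earlier steps occurred.
    """
    result = 1.0
    for p in step_probabilities:
        result *= p
    return result

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of the event after observing a piece of evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator

# Hypothetical decomposition: will a treaty be ratified this year?
#   talks resume (60%), a draft is agreed if talks resume (50%),
#   the draft is ratified if agreed (70%).
prior = chain_probability([0.6, 0.5, 0.7])

# Hypothetical evidence: a report judged twice as likely to appear
# if ratification is coming (60%) as if it is not (30%).
posterior = bayes_update(prior, 0.6, 0.3)

print(f"initial estimate: {prior:.0%}")
print(f"after new evidence: {posterior:.0%}")
```

With these made-up inputs the initial estimate is 21% and the updated one is about 35%: a meaningful revision, but well short of certainty, which is the kind of incremental update the list describes.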

What forecasting tournaments do—and do not—measure

Forecasting tournaments provide unusually strong evidence about calibration, but they have limits.

Most tournament questions concern events that resolve within months or a few years. Extremely long-term predictions remain difficult because feedback arrives too slowly for rapid learning. Likewise, tournaments typically evaluate measurable events rather than broader strategic judgement, creativity or ethical reasoning.

Another limitation is that tournament success does not automatically transfer to every decision domain. Forecasting well is only one component of effective policy or organisational leadership. Decision-makers must still weigh values, costs, legal constraints and political feasibility.

Nevertheless, forecasting tournaments solve one problem that affects many expert communities: they replace impressive-sounding certainty with an empirical record of predictive performance. For improving analytical thinking, this is their greatest contribution. Instead of asking whether someone feels confident, they ask whether previous confidence levels matched reality often enough to justify future trust. [Cambridge University Press & Assessment; pmc.ncbi.nlm.nih.gov]

Further Reading

Books and field guides related to How Forecasting Makes Confidence Accountable. Use these as the next step if you want deeper reading beyond the article.

Superforecasting

By Philip Eyrikson Tetlock, Dan Gardner

Directly explains probabilistic forecasting, calibration, feedback, and improving expert judgement.

Noise

By Daniel Kahneman, Olivier Sibony and Cass R. Sunstein

Explains variability and error in judgement, complementing forecasting and calibration practices.

Endnotes

  1. Source: cambridge.org
    Link: https://www.cambridge.org/core/journals/judgment-and-decision-making/article/developing-expert-political-judgment-the-impact-of-training-and-practice-on-judgmental-accuracy-in-geopolitical-forecasting-tournaments/123EB18425391D05FA6581FDBB3F309F
    Source snippet

    Cambridge University Press & AssessmentThe impact of training and practice on judgmental accuracy...by W Chang · 2016 · Cited by 152 — T...

  2. Source: pubmed.ncbi.nlm.nih.gov
    Link: https://pubmed.ncbi.nlm.nih.gov/19739881/
    Source snippet

    Conditions for intuitive expertise: a failure to disagreeby D Kahneman · 2009 · Cited by 4056 — This article reports on an effort t...

  3. Source: cambridge.org
    Link: https://www.cambridge.org/core/journals/judgment-and-decision-making/article/weighted-brier-score-decompositions-for-topically-heterogenous-forecasting-tournaments/8172E04F2DBC601DA5D953D4685CA346
    Source snippet

    Cambridge University Press & AssessmentWeighted Brier score decompositions for topically...by EC Merkle · 2018 · Cited by 13 — Brier sco...

  4. Source: arxiv.org
    Title: arXiv Calibration Scoring Rules for Practical Prediction Training
    Link: https://arxiv.org/abs/1808.07501

  5. Source: pmc.ncbi.nlm.nih.gov
    Title: The superforecasting hypothesis is challenged under real-life scarcity
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7333631/
    Source snippet

    Superforecasting reality check: Evidence from a small pool of...by I Katsagounos · 2020 · Cited by 23 — The study contributes to the str...

  6. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10189590/
    Source snippet

    improves forecasting - PMC - NIHby DN Ferreiro · 2023 · Cited by 4 — Because higher Brier scores indicate lower prediction accuracy we re...

  7. Source: arxiv.org
    Link: https://arxiv.org/pdf/2602.19520
    Source snippet

    Domain-Specific Calibration Dynamics in Prediction Marketsby NA Le · 2026 · Cited by 2 — Tetlock and Gardner [65] demonstrated that struc...

  8. Source: arxiv.org
    Link: https://arxiv.org/pdf/2507.04562
    Source snippet

    Evaluating LLMs on Real-World Forecasting Against...by J Lu · 2025 · Cited by 1 — This paper attempts to measure and quantify how good t...

  9. Source: goodjudgment.com
    Link: https://goodjudgment.com/about/
    Source snippet

    About Superforecasting | Unprecedented Accurate &...Good Judgment's professional Superforecasters deliver unparalleled accuracy on forec...

  10. Source: goodjudgment.com
    Link: https://goodjudgment.com/
    Source snippet

    Good JudgmentGood Judgment: See the future sooner with SuperforecastingReports that Superforecasters were 30% more accurate than intellig...

  11. Source: goodjudgment.com
    Link: https://goodjudgment.com/superforecasters-still-creme-de-la-creme-six-years-on/
    Source snippet

    Superforecasters: Still Crème de la Crème Six Years OnDuring the IARPA tournament, Superforecasters routinely placed in the top 2% of acc...

  12. Source: goodjudgment.com
    Link: https://goodjudgment.com/about/the-science-of-superforecasting/
    Source snippet

    The Science Of SuperforecastingGood Judgment research discovered four keys to accurate forecasting: talent-spotting, training, teaming, a...

  13. Source: goodjudgment.com
    Link: https://goodjudgment.com/wp-content/uploads/2022/10/Superforecaster-Accuracy.pdf
    Source snippet

    Judgment measures accuracy using the Brier score, a score that shows how far a forecast fell from the truth (the closer the better). On a...

  14. Source: Wikipedia
    Title: The Good Judgment Project
    Link: https://en.wikipedia.org/wiki/The_Good_Judgment_Project
    Source snippet

    The Good Judgment ProjectPredictions are scored using Brier scores.... The top forecasters in GJP are "reportedly 30% better than int...

  15. Source: gjopen.com
    Link: https://www.gjopen.com/
    Source snippet

    Good Judgment® OpenA forecasting services firm that equips corporate, government, and non-governmental decision-makers with the benefit o...

  16. Source: emergentmind.com
    Link: https://www.emergentmind.com/topics/superforecasters
    Source snippet

    Metrics and Methods20 Feb 2026 — Superforecasters are experts whose calibrated, low Brier scores and advanced probabilistic methods outpe...

  17. Source: alice.id.tue.nl
    Link: https://www.alice.id.tue.nl/references/kahnemann-2003.pdf
    Source snippet

    Kahneman - Nobel Lectureby D KAHNEMAN · Cited by 2283 — Together, we explored the psychology of intuitive beliefs and choices and ex- ami...

Additional References

  1. Source: researchgate.net
    Link: https://www.researchgate.net/publication/277087515_Identifying_and_Cultivating_Superforecasters_as_a_Method_of_Improving_Probabilistic_Predictions
    Source snippet

    (PDF) Identifying and Cultivating Superforecasters as a...25 May 2015 — Mean standardized Brier scores for superforecasters (Supers) and...

    Published: May 2015

  2. Source: lukemuehlhauser.com
    Link: https://www.lukemuehlhauser.com/wp-content/uploads/Tetlock-et-al-Forecasting-tournaments-tools-for-increasing-transparency-and-improving-the-quality-of-debate.pdf
    Source snippet

    Forecasting TournamentsThis article describes a massive geopolitical tournament that tested clashing views on the feasibility of improvin...

  3. Source: corporate.jasoncollins.blog
    Link: https://corporate.jasoncollins.blog/better-forecasting
    Source snippet

    jasoncollins.blog25 Better forecastingIn this page, I examine techniques to improve forecasting accuracy, primarily through evidence from...

  4. Source: aiimpacts.org
    Link: https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/
    Source snippet

    Evidence on good forecasting practices from the...2 Jul 2019 — Tetlock used something very much like a Brier score in this tournament, b...

  5. Source: researchgate.net
    Link: https://www.researchgate.net/publication/26798603_Conditions_for_Intuitive_Expertise

  6. Source: researchgate.net
    Link: https://www.researchgate.net/publication/220535084_Probability_Elicitation_Scoring_Rules_and_Competition_Among_Forecasters
    Source snippet

    Probability Elicitation, Scoring Rules, and Competition...4 May 2026 — Probability forecasters who are rewarded via a proper scoring rul...

    Published: May 2026

  7. Source: edge.org
    Link: https://www.edge.org/conversation/philip_tetlock-edge-master-class-2015-a-short-course-in-superforecasting-class-ii
    Source snippet

    A Short Course in Superforecasting, Class IIAug 24, 2015 — There are different types of proper scoring rules, and some proper scoring rul...

  8. Source: casact.org
    Link: https://www.casact.org/sites/default/files/presentation/annual_2016_presentations_c-27.pdf
    Source snippet

    • Hold intelligence community accountable for overall forecasting accuracy. • Don't blame when something bad...

  9. Source: github.com
    Title: Superforecasting and GJP. Good Judment Open. The Good Judgment
    Link: https://github.com/jmoral4/superforecastinghelper
    Source snippet

    tool for recording predictions and calculating Brier Scores ·...The Brier score ranges from 0 to 1, with lower values indicating more ac...

  10. Source: coefficientgiving.org
    Title: efforts to improve the accuracy of our judgments and forecasts
    Link: https://coefficientgiving.org/research/efforts-to-improve-the-accuracy-of-our-judgments-and-forecasts/
    Source snippet

    Efforts to Improve the Accuracy of Our Judgments and...Oct 25, 2016 — If we combine calibration and resolution, we arrive at a measure o...
