How Forecasting Makes Confidence Accountable

Introduction

Forecasting tournaments are one of the clearest ways to test whether expert intuition deserves trust. Instead of asking who sounds persuasive, they ask participants to assign probabilities to specific future events, wait for the outcomes, and then score the quality of those forecasts. Over time, this turns confidence into a measurable track record rather than a matter of reputation. The result is a practical learning system: participants discover whether they are consistently overconfident, underconfident, or well calibrated, while organisations gain evidence about which forecasting habits genuinely improve judgement. This approach is especially valuable in domains such as geopolitics, public policy, and strategic planning, where intuition often operates in noisy environments with delayed and ambiguous feedback. [Cambridge University Press & Assessment]

Why long-range political judgement often fails

Research on expert judgement has repeatedly shown that experience alone does not guarantee accurate long-range prediction. Political analysts, commentators and senior decision-makers often receive little direct feedback about whether their probabilistic beliefs were justified. Events unfold over months or years, causes are tangled together, and almost every outcome can be explained after the fact.

Forecasting tournaments address this problem by replacing broad opinions with clearly defined questions, such as whether an election will occur by a given date, whether a peace agreement will be signed, or whether a country’s GDP growth will exceed a specified threshold. Each prediction is recorded before the outcome is known, so hindsight cannot quietly reshape what the forecaster claims to have expected.

This design became most prominent through forecasting tournaments sponsored by IARPA (Intelligence Advanced Research Projects Activity), the US intelligence community's research agency. One competing research team, the Good Judgment Project, demonstrated that structured forecasting methods, training, teamwork and continuous feedback substantially improved forecasting accuracy over several years of geopolitical prediction. [Cambridge University Press & Assessment; Good Judgment]

The broader lesson is not that experts are useless. Rather, expertise without reliable feedback can produce unwarranted certainty. Forecasting tournaments provide precisely the kind of repeated correction that Kahneman and Klein argued is necessary for genuine intuitive expertise to develop. [PubMed]

How probabilistic scoring changes the learning loop

The distinctive feature of forecasting tournaments is that they reward calibrated probability estimates instead of binary right-or-wrong answers.

Rather than predicting that an event “will happen”, participants estimate, for example, a 70% or 30% chance. When hundreds of forecasts accumulate, calibration becomes visible. A forecaster who repeatedly assigns 70% confidence should see roughly seven out of ten such events occur. If, across many such forecasts, only about four in ten occur, their confidence is systematically too high.
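
The calibration check described above can be sketched in a few lines of Python. This is a minimal illustration, not any tournament's actual scoring code: the function name `calibration_table` is hypothetical, and it assumes forecasts are stated as a small set of round probabilities (such as 0.3 or 0.7), so it groups them by exact value rather than into bins.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to (observed frequency, number of forecasts).

    forecasts: probabilities in [0, 1], exactly as the forecaster stated them.
    outcomes:  1 or True if the event happened, 0 or False if it did not.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    groups = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        groups[p].append(int(happened))
    # For a well-calibrated forecaster, the observed frequency in each group
    # sits close to the stated probability once the group is large.
    return {p: (sum(hits) / len(hits), len(hits)) for p, hits in sorted(groups.items())}


# Hypothetical record: ten forecasts at 70%, of which only four events occurred.
print(calibration_table([0.7] * 10, [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]))
# {0.7: (0.4, 10)}
```

With only ten forecasts, a gap between 0.7 and 0.4 could still be partly luck; the count in each tuple is kept so that a reader can see how much evidence each comparison rests on.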

Proper scoring rules, especially the Brier score, provide an objective way to measure this. In its common two-outcome form, the Brier score is the mean squared difference between each forecast probability and the outcome, coded as 1 if the event happened and 0 if it did not, so lower scores are better and 0 is perfect. Because the rule is proper, a forecaster minimises their expected score by reporting the probability they actually believe: unjustified certainty is penalised heavily when it turns out wrong, which discourages dramatic predictions made for attention. [Cambridge University Press & Assessment; arXiv]
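
A minimal Python sketch of the two-outcome Brier score follows, together with a check of the "report what you believe" property. The function names are hypothetical, outcomes are assumed to be coded 1 or 0, and the sketch uses the binary form only; Brier's original formulation sums squared errors over every outcome category, so for a yes/no question it ranges from 0 to 2 rather than 0 to 1.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes (0 is perfect)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need non-empty sequences of equal length")
    return sum((p - int(o)) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def expected_brier(reported, belief):
    """Expected Brier score for reporting `reported` when the event's true chance is `belief`."""
    # With probability `belief` the outcome is 1; otherwise it is 0.
    return belief * (1 - reported) ** 2 + (1 - belief) * reported ** 2


# If you genuinely believe an event is 70% likely, which report minimises
# your expected score? Search a grid of possible reports from 0.00 to 1.00.
belief = 0.7
best_report = min((i / 100 for i in range(101)), key=lambda q: expected_brier(q, belief))
print(best_report)                            # 0.7
print(round(expected_brier(0.7, belief), 3))  # 0.21
print(round(expected_brier(1.0, belief), 3))  # 0.3
```

Algebraically, the expected score equals (reported - belief)² + belief·(1 - belief), so it is smallest exactly when the report equals the belief. Exaggerating towards certainty can only make the expected score worse, which is what "proper" means here.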

This creates a learning cycle that ordinary professional judgement rarely provides:

Make an explicit probabilistic forecast.
Record it before the outcome is known.
Receive an objective score after resolution.
Compare results across many forecasts rather than memorable anecdotes.
Adjust future confidence levels accordingly.

The important feedback concerns not only whether someone was correct but whether their confidence matched reality. That distinction is often invisible in everyday decision-making.
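
The five-step cycle above can be sketched as a small forecast log. This is an illustrative Python sketch, not any tournament platform's real API: `ForecastLog` and its methods are hypothetical names, questions are assumed to be identified by a unique text label, and scoring uses the two-outcome Brier score.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Forecast:
    question: str
    probability: float              # stated chance that the event happens
    made_at: datetime               # timestamp showing the forecast came first
    outcome: Optional[bool] = None  # stays None until the question resolves


@dataclass
class ForecastLog:
    forecasts: List[Forecast] = field(default_factory=list)

    def record(self, question: str, probability: float) -> None:
        """Steps 1 and 2: log an explicit probability before the outcome is known."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self.forecasts.append(Forecast(question, probability, datetime.now(timezone.utc)))

    def resolve(self, question: str, happened: bool) -> None:
        """Step 3: attach the real outcome to every open forecast on this question."""
        for f in self.forecasts:
            if f.question == question and f.outcome is None:
                f.outcome = happened

    def mean_brier(self) -> Optional[float]:
        """Step 4: score the whole resolved record rather than memorable cases."""
        resolved = [f for f in self.forecasts if f.outcome is not None]
        if not resolved:
            return None
        return sum((f.probability - int(f.outcome)) ** 2 for f in resolved) / len(resolved)


# Hypothetical usage with two placeholder questions.
log = ForecastLog()
log.record("Question A", 0.8)
log.record("Question B", 0.6)
log.resolve("Question A", True)
log.resolve("Question B", False)
print(round(log.mean_brier(), 2))  # 0.2
```

Step 5, adjusting future confidence, stays with the forecaster: the log only supplies the evidence, for instance by grouping its resolved records by stated probability to see where confidence ran too high or too low.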

Why tournaments outperform reputation

Forecasting tournaments separate forecasting skill from status, seniority and rhetorical confidence.

The Good Judgment Project found that a relatively small group of consistently high-performing forecasters, later called “superforecasters”, substantially outperformed both average participants and competing forecasting teams across hundreds of geopolitical questions. Their advantage came not from secret information but from disciplined updating, careful probability estimation and a willingness to revise beliefs as evidence changed. [Cambridge University Press & Assessment; Good Judgment]

An important implication is that forecasting ability can differ from subject-matter expertise. A leading academic or policy specialist may possess deep knowledge while still being poorly calibrated about uncertain future events. Conversely, skilled forecasters often combine broad knowledge with strong judgement about uncertainty, base rates and evidence integration.

This does not diminish domain expertise. Instead, tournaments show that forecasting is a separate skill that benefits from deliberate practice and measurable feedback.

Habits that make intuition easier to audit

Forecasting tournaments are valuable because they cultivate habits that expose intuitive judgement to evidence rather than replacing intuition altogether.

Several practices repeatedly appear among successful forecasters:

Express uncertainty numerically. Replacing words such as “likely” with explicit probabilities forces greater precision.
Break difficult questions into smaller components. Estimating intermediate events often produces better final judgements than making one large intuitive leap.
Update continuously. Good forecasters treat predictions as living estimates that should change when meaningful evidence appears.
Keep score over many forecasts. Individual successes may reflect luck. Long-term calibration reveals genuine forecasting skill.
Review mistakes systematically. Post-mortems focus on reasoning quality rather than whether an outcome happened to be favourable.

These habits gradually transform intuition from an unexamined feeling into something that can be compared against reality.
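
Two of these habits, breaking a question into components and updating on evidence, rest on simple arithmetic. The Python sketch below is illustrative only: the function names and every number in the example are hypothetical, decomposition assumes each step's probability is already conditional on the earlier steps having happened, and updating uses the odds form of Bayes' rule, where the likelihood ratio expresses how much more probable the new evidence is if the event will happen than if it will not.

```python
from math import prod


def chained_probability(step_probabilities):
    """Decomposition: the chance that every step happens, where each entry is
    the probability of that step given that all earlier steps happened."""
    return prod(step_probabilities)


def bayes_update(prior, likelihood_ratio):
    """Updating: revise a probability with the odds form of Bayes' rule.

    likelihood_ratio = P(evidence | event happens) / P(evidence | it does not).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    posterior_odds = prior / (1 - prior) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical question that needs three things to happen in sequence.
estimate = chained_probability([0.8, 0.6, 0.5])
print(round(estimate, 2))  # 0.24

# New evidence judged twice as likely if the event is coming as if it is not.
print(round(bayes_update(estimate, 2.0), 2))  # 0.39
```

The exact figures matter less than the structure: making each step explicit shows where a forecast's uncertainty actually sits, and the update moves the estimate in proportion to how diagnostic the evidence is, rather than lurching to certainty.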

What forecasting tournaments do—and do not—measure

Forecasting tournaments provide unusually strong evidence about calibration, but they have limits.

Most tournament questions concern events that resolve within months or a few years. Extremely long-term predictions remain difficult because feedback arrives too slowly for rapid learning. Likewise, tournaments typically evaluate measurable events rather than broader strategic judgement, creativity or ethical reasoning.

Another limitation is that tournament success does not automatically transfer to every decision domain. Forecasting well is only one component of effective policy or organisational leadership. Decision-makers must still weigh values, costs, legal constraints and political feasibility.

Nevertheless, forecasting tournaments solve one problem that affects many expert communities: they replace impressive-sounding certainty with an empirical record of predictive performance. For improving analytical thinking, this is their greatest contribution. Instead of asking whether someone feels confident, they ask whether previous confidence levels matched reality often enough to justify future trust. [Cambridge University Press & Assessment; pmc.ncbi.nlm.nih.gov]

Endnotes

Source: cambridge.org
Link: https://www.cambridge.org/core/journals/judgment-and-decision-making/article/developing-expert-political-judgment-the-impact-of-training-and-practice-on-judgmental-accuracy-in-geopolitical-forecasting-tournaments/123EB18425391D05FA6581FDBB3F309F
Source snippet
Cambridge University Press & Assessment: The impact of training and practice on judgmental accuracy... by W Chang · 2016 · Cited by 152 — T...
Source: pubmed.ncbi.nlm.nih.gov
Link: https://pubmed.ncbi.nlm.nih.gov/19739881/
Source snippet
Conditions for intuitive expertise: a failure to disagree, by D Kahneman · 2009 · Cited by 4056 — This article reports on an effort t...
Source: cambridge.org
Link: https://www.cambridge.org/core/journals/judgment-and-decision-making/article/weighted-brier-score-decompositions-for-topically-heterogenous-forecasting-tournaments/8172E04F2DBC601DA5D953D4685CA346
Source snippet
Cambridge University Press & Assessment: Weighted Brier score decompositions for topically... by EC Merkle · 2018 · Cited by 13 — Brier sco...
Source: arxiv.org
Title: arXiv Calibration Scoring Rules for Practical Prediction Training
Link: https://arxiv.org/abs/1808.07501
Source: pmc.ncbi.nlm.nih.gov
Title: The superforecasting hypothesis is challenged under real-life scarcity
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7333631/
Source snippet
Superforecasting reality check: Evidence from a small pool of... by I Katsagounos · 2020 · Cited by 23 — The study contributes to the str...
Source: pmc.ncbi.nlm.nih.gov
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10189590/
Source snippet
improves forecasting - PMC - NIH, by DN Ferreiro · 2023 · Cited by 4 — Because higher Brier scores indicate lower prediction accuracy we re...
Source: arxiv.org
Link: https://arxiv.org/pdf/2602.19520
Source snippet
Domain-Specific Calibration Dynamics in Prediction Markets, by NA Le · 2026 · Cited by 2 — Tetlock and Gardner [65] demonstrated that struc...
Source: arxiv.org
Link: https://arxiv.org/pdf/2507.04562
Source snippet
Evaluating LLMs on Real-World Forecasting Against... by J Lu · 2025 · Cited by 1 — This paper attempts to measure and quantify how good t...
Source: goodjudgment.com
Link: https://goodjudgment.com/about/
Source snippet
About Superforecasting | Unprecedented Accurate &... Good Judgment's professional Superforecasters deliver unparalleled accuracy on forec...
Source: goodjudgment.com
Link: https://goodjudgment.com/
Source snippet
Good Judgment: See the future sooner with Superforecasting. Reports that Superforecasters were 30% more accurate than intellig...
Source: goodjudgment.com
Link: https://goodjudgment.com/superforecasters-still-creme-de-la-creme-six-years-on/
Source snippet
Superforecasters: Still Crème de la Crème Six Years On. During the IARPA tournament, Superforecasters routinely placed in the top 2% of acc...
Source: goodjudgment.com
Link: https://goodjudgment.com/about/the-science-of-superforecasting/
Source snippet
The Science Of Superforecasting. Good Judgment research discovered four keys to accurate forecasting: talent-spotting, training, teaming, a...
Source: goodjudgment.com
Link: https://goodjudgment.com/wp-content/uploads/2022/10/Superforecaster-Accuracy.pdf
Source snippet
Judgment measures accuracy using the Brier score, a score that shows how far a forecast fell from the truth (the closer the better). On a...
Source: Wikipedia
Title: The Good Judgment Project
Link: https://en.wikipedia.org/wiki/The_Good_Judgment_Project
Source snippet
The Good Judgment Project. Predictions are scored using Brier scores.... The top forecasters in GJP are "reportedly 30% better than int...
Source: gjopen.com
Link: https://www.gjopen.com/
Source snippet
Good Judgment® Open. A forecasting services firm that equips corporate, government, and non-governmental decision-makers with the benefit o...
Source: emergentmind.com
Link: https://www.emergentmind.com/topics/superforecasters
Source snippet
Metrics and Methods, 20 Feb 2026 — Superforecasters are experts whose calibrated, low Brier scores and advanced probabilistic methods outpe...
Source: alice.id.tue.nl
Link: https://www.alice.id.tue.nl/references/kahnemann-2003.pdf
Source snippet
Kahneman - Nobel Lecture, by D KAHNEMAN · Cited by 2283 — Together, we explored the psychology of intuitive beliefs and choices and exami...

Additional References

Source: researchgate.net
Link: https://www.researchgate.net/publication/277087515_Identifying_and_Cultivating_Superforecasters_as_a_Method_of_Improving_Probabilistic_Predictions
Source snippet
(PDF) Identifying and Cultivating Superforecasters as a... 25 May 2015 — Mean standardized Brier scores for superforecasters (Supers) and...

Published: May 2015
Source: lukemuehlhauser.com
Link: https://www.lukemuehlhauser.com/wp-content/uploads/Tetlock-et-al-Forecasting-tournaments-tools-for-increasing-transparency-and-improving-the-quality-of-debate.pdf
Source snippet
Forecasting Tournaments. This article describes a massive geopolitical tournament that tested clashing views on the feasibility of improvin...
Source: corporate.jasoncollins.blog
Link: https://corporate.jasoncollins.blog/better-forecasting
Source snippet
jasoncollins.blog: 25 Better forecasting. In this page, I examine techniques to improve forecasting accuracy, primarily through evidence from...
Source: aiimpacts.org
Link: https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/
Source snippet
Evidence on good forecasting practices from the... 2 Jul 2019 — Tetlock used something very much like a Brier score in this tournament, b...
Source: researchgate.net
Link: https://www.researchgate.net/publication/26798603_Conditions_for_Intuitive_Expertise
Source: researchgate.net
Link: https://www.researchgate.net/publication/220535084_Probability_Elicitation_Scoring_Rules_and_Competition_Among_Forecasters
Source snippet
Probability Elicitation, Scoring Rules, and Competition... 4 May 2026 — Probability forecasters who are rewarded via a proper scoring rul...

Published: May 2026
Source: edge.org
Link: https://www.edge.org/conversation/philip_tetlock-edge-master-class-2015-a-short-course-in-superforecasting-class-ii
Source snippet
A Short Course in Superforecasting, Class II, Aug 24, 2015 — There are different types of proper scoring rules, and some proper scoring rul...
Source: casact.org
Link: https://www.casact.org/sites/default/files/presentation/annual_2016_presentations_c-27.pdf
Source snippet
• Hold intelligence community accountable for overall forecasting accuracy. • Don't blame when something bad...
Source: github.com
Title: Superforecasting and GJP. Good Judment Open. The Good Judgment
Link: https://github.com/jmoral4/superforecastinghelper
Source snippet
tool for recording predictions and calculating Brier Scores ·... The Brier score ranges from 0 to 1, with lower values indicating more ac...
Source: coefficientgiving.org
Title: efforts to improve the accuracy of our judgments and forecasts
Link: https://coefficientgiving.org/research/efforts-to-improve-the-accuracy-of-our-judgments-and-forecasts/
Source snippet
Efforts to Improve the Accuracy of Our Judgments and... Oct 25, 2016 — If we combine calibration and resolution, we arrive at a measure o...

Further Reading

Superforecasting

Thinking, Fast and Slow

Noise

The Signal and the Noise
