How Forecasting Makes Confidence Accountable
Scored forecasts turn vague confidence into testable probabilities, helping people learn when their judgement is calibrated.
Introduction
Forecasting tournaments are one of the clearest ways to test whether expert intuition deserves trust. Instead of asking who sounds persuasive, they ask participants to assign probabilities to specific future events, wait for the outcomes, and then score the quality of those forecasts. Over time, this turns confidence into a measurable track record rather than a matter of reputation. The result is a practical learning system: participants discover whether they are consistently overconfident, underconfident, or well calibrated, while organisations gain evidence about which forecasting habits genuinely improve judgement. This approach is especially valuable in domains such as geopolitics, public policy, and strategic planning, where intuition often operates in noisy environments with delayed and ambiguous feedback. [Cambridge University Press & Assessment]
Why long-range political judgement often fails
Research on expert judgement has repeatedly shown that experience alone does not guarantee accurate long-range prediction. Political analysts, commentators and senior decision-makers often receive little direct feedback about whether their probabilistic beliefs were justified. Events unfold over months or years, causes are tangled together, and almost every outcome can be explained after the fact.
Forecasting tournaments address this problem by replacing broad opinions with clearly defined questions, such as whether an election will occur by a given date, whether a peace agreement will be signed, or whether a country’s GDP growth will exceed a specified threshold. Each prediction is recorded before the outcome is known, which makes it much harder to rationalise the result in hindsight.
This design emerged most prominently through forecasting competitions sponsored by the US intelligence research agency IARPA (Intelligence Advanced Research Projects Activity). One participant, the Good Judgment Project, demonstrated that structured forecasting methods, training, teamwork and continuous feedback substantially improved forecasting accuracy over several years of geopolitical prediction. [Cambridge University Press & Assessment; Good Judgment]
The broader lesson is not that experts are useless. Rather, expertise without reliable feedback can produce unwarranted certainty. Forecasting tournaments provide precisely the kind of repeated correction that Kahneman and Klein argued is necessary for genuine intuitive expertise to develop. [PubMed]
How probabilistic scoring changes the learning loop
The distinctive feature of forecasting tournaments is that they reward calibrated probability estimates instead of binary right-or-wrong answers.
Rather than predicting that an event “will happen”, participants estimate, for example, a 70% or 30% chance. When hundreds of similar forecasts accumulate, calibration becomes visible. A forecaster who repeatedly assigns 70% confidence should see roughly seven out of ten such events occur. If only four occur, their confidence is systematically too high.
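The 70% example above can be checked mechanically. The following is a minimal Python sketch, assuming forecasts are stored as (stated probability, outcome) pairs; the function name `calibration_table` and the sample record are illustrative and not taken from any tournament's tooling.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Group (probability, happened) pairs by stated probability.

    Returns a dict mapping each stated probability (rounded to one decimal)
    to a tuple of (number of forecasts, observed frequency of the event).
    """
    buckets = defaultdict(list)
    for p, happened in forecasts:
        buckets[round(p, 1)].append(1 if happened else 0)
    return {p: (len(v), sum(v) / len(v)) for p, v in sorted(buckets.items())}


# Hypothetical record: ten forecasts at 70%, of which only four came true.
record = [(0.7, True)] * 4 + [(0.7, False)] * 6
table = calibration_table(record)

for p, (n, freq) in table.items():
    print(f"stated {p:.0%}  observed {freq:.0%}  (n={n})")
# prints: stated 70%  observed 40%  (n=10)
```

With real data, each bucket needs many resolved forecasts before the gap between stated and observed frequency says much; ten is used here only to mirror the example in the text.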
Proper scoring rules, especially the Brier score, provide an objective way to measure this. A Brier score compares predicted probabilities with actual outcomes: in its common binary form it is the mean squared difference between each stated probability and the outcome (coded 1 if the event occurred and 0 if not), so lower scores indicate better forecasts. It rewards forecasts that are both accurate and honestly expressed. Because the scoring rule penalises unjustified certainty, it encourages forecasters to report what they genuinely believe instead of making dramatic predictions for attention. [Cambridge University Press & Assessment; arXiv]
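As a rough illustration of how the Brier score penalises unjustified certainty, here is a minimal sketch of its common binary form: the mean squared difference between the stated probability and the outcome coded as 1 or 0. The function name and the sample forecasts are assumptions made for illustration. The original multi-category formulation sums over every outcome category, so on a yes/no question it gives twice this value.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes.

    `forecasts` is an iterable of (probability, happened) pairs.
    Lower is better: 0.0 is a perfect record, 1.0 is the worst possible.
    """
    pairs = list(forecasts)
    if not pairs:
        raise ValueError("need at least one resolved forecast")
    total = sum((p - (1.0 if happened else 0.0)) ** 2 for p, happened in pairs)
    return total / len(pairs)


# Hypothetical single forecasts showing the cost of misplaced certainty.
print(round(brier_score([(0.9, True)]), 2))   # confident and right -> 0.01
print(round(brier_score([(0.5, False)]), 2))  # hedged              -> 0.25
print(round(brier_score([(0.9, False)]), 2))  # confident and wrong -> 0.81
```

A forecaster who always answers 50% scores 0.25 no matter what happens, which makes that figure a useful baseline: beating it requires confidence that is right more often than it is wrong.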
This creates a learning cycle that ordinary professional judgement rarely provides:
- Make an explicit probabilistic forecast.
- Record it before the outcome is known.
- Receive an objective score after resolution.
- Compare results across many forecasts rather than memorable anecdotes.
- Adjust future confidence levels accordingly.
The important feedback concerns not only whether someone was correct but whether their confidence matched reality. That distinction is often invisible in everyday decision-making.
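The cycle above amounts to simple bookkeeping. The sketch below assumes a hypothetical `Forecast` record that timestamps each prediction when it is made and refuses to be resolved twice. The class, its fields and the sample questions are illustrative rather than any tournament's actual system, and the score used is the binary Brier form.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Forecast:
    question: str
    probability: float
    # Timestamp is set when the forecast is created, before any outcome exists.
    made_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    outcome: Optional[bool] = None

    def resolve(self, happened: bool) -> None:
        """Record the outcome once; a second attempt is rejected."""
        if self.outcome is not None:
            raise ValueError("forecast already resolved")
        self.outcome = happened

    def squared_error(self) -> float:
        if self.outcome is None:
            raise ValueError("forecast not yet resolved")
        return (self.probability - (1.0 if self.outcome else 0.0)) ** 2


def mean_score(log):
    """Average squared error over resolved forecasts only."""
    resolved = [f for f in log if f.outcome is not None]
    if not resolved:
        raise ValueError("no resolved forecasts to score")
    return sum(f.squared_error() for f in resolved) / len(resolved)


# Hypothetical questions and probabilities.
log = [
    Forecast("Will the election be held by the stated date?", 0.7),
    Forecast("Will a peace agreement be signed this year?", 0.2),
    Forecast("Will GDP growth exceed the threshold?", 0.6),  # still open
]
log[0].resolve(True)
log[1].resolve(False)
print(round(mean_score(log), 3))  # 0.065, averaged over the two resolved forecasts
```

Scoring only resolved forecasts, and scoring all of them, is what keeps the review from drifting back towards memorable anecdotes.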
Why tournaments outperform reputation
Forecasting tournaments separate forecasting skill from status, seniority and rhetorical confidence.
The Good Judgment Project found that a relatively small group of consistently high-performing forecasters, later called “superforecasters”, substantially outperformed both average participants and competing forecasting teams across thousands of geopolitical questions. Their advantage came not from secret information but from disciplined updating, careful probability estimation and willingness to revise beliefs as evidence changed. [Cambridge University Press & Assessment; Good Judgment]
An important implication is that forecasting ability can differ from subject-matter expertise. A leading academic or policy specialist may possess deep knowledge while still being poorly calibrated about uncertain future events. Conversely, skilled forecasters often combine broad knowledge with strong judgement about uncertainty, base rates and evidence integration.
This does not diminish domain expertise. Instead, tournaments show that forecasting is a separate skill that benefits from deliberate practice and measurable feedback.
Habits that make intuition easier to audit
Forecasting tournaments are valuable because they cultivate habits that expose intuitive judgement to evidence rather than replacing intuition altogether.
Several practices repeatedly appear among successful forecasters:
- Express uncertainty numerically. Replacing words such as “likely” with explicit probabilities forces greater precision.
- Break difficult questions into smaller components. Estimating intermediate events often produces better final judgements than making one large intuitive leap.
- Update continuously. Good forecasters treat predictions as living estimates that should change when meaningful evidence appears.
- Keep score over many forecasts. Individual successes may reflect luck. Long-term calibration reveals genuine forecasting skill.
- Review mistakes systematically. Post-mortems focus on reasoning quality rather than whether an outcome happened to be favourable.
These habits gradually transform intuition from an unexamined feeling into something that can be compared against reality.
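Two of these habits, decomposition and updating, have simple arithmetic behind them. The sketch below assumes a question can be split into steps that must all occur, each with a probability conditional on the previous step, and uses Bayes' rule in odds form as one possible way to revise an estimate. The article does not prescribe a specific method, and the function names and all numbers are invented for illustration.

```python
def chain_probability(steps):
    """Multiply conditional probabilities of steps that must all occur."""
    result = 1.0
    for p in steps:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        result *= p
    return result


def bayes_update(prior, likelihood_ratio):
    """Revise a probability using odds form of Bayes' rule.

    likelihood_ratio = P(evidence | event) / P(evidence | no event).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood ratio must be positive")
    posterior_odds = prior / (1.0 - prior) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical decomposition: talks resume, a draft is agreed given talks,
# and the draft is signed given that it exists.
base = chain_probability([0.6, 0.5, 0.7])
print(round(base, 2))  # 0.21

# New evidence judged twice as likely if the agreement is coming than if not.
revised = bayes_update(base, 2.0)
print(round(revised, 2))  # roughly 0.35
```

The decomposition makes each assumption visible and separately challengeable, and the update step shows why moderately informative evidence should move an estimate by a moderate amount rather than flip it.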
What forecasting tournaments do—and do not—measure
Forecasting tournaments provide unusually strong evidence about calibration, but they have limits.
Most tournament questions concern events that resolve within months or a few years. Extremely long-term predictions remain difficult because feedback arrives too slowly for rapid learning. Likewise, tournaments typically evaluate measurable events rather than broader strategic judgement, creativity or ethical reasoning.
Another limitation is that tournament success does not automatically transfer to every decision domain. Forecasting well is only one component of effective policy or organisational leadership. Decision-makers must still weigh values, costs, legal constraints and political feasibility.
Nevertheless, forecasting tournaments solve one problem that affects many expert communities: they replace impressive-sounding certainty with an empirical record of predictive performance. For improving analytical thinking, this is their greatest contribution. Instead of asking whether someone feels confident, they ask whether previous confidence levels matched reality often enough to justify future trust. [Cambridge University Press & Assessment; pmc.ncbi.nlm.nih.gov]
Further Reading
Books and field guides related to How Forecasting Makes Confidence Accountable. Use these as the next step if you want deeper reading beyond the article.
Superforecasting
Directly explains probabilistic forecasting, calibration, feedback, and improving expert judgement.
Thinking, Fast and Slow
Provides the cognitive psychology behind overconfidence, judgement errors, and better decision-making.
Noise
Explains variability and error in judgement, complementing forecasting and calibration practices.
The Signal and the Noise
Broadly covers prediction, probabilistic thinking, and learning from uncertain outcomes.
Endnotes
- Source: cambridge.org
  Link: https://www.cambridge.org/core/journals/judgment-and-decision-making/article/developing-expert-political-judgment-the-impact-of-training-and-practice-on-judgmental-accuracy-in-geopolitical-forecasting-tournaments/123EB18425391D05FA6581FDBB3F309F
  Snippet: Cambridge University Press & Assessment. The impact of training and practice on judgmental accuracy... by W Chang · 2016 · Cited by 152 - T...
- Source: pubmed.ncbi.nlm.nih.gov
  Link: https://pubmed.ncbi.nlm.nih.gov/19739881/
  Snippet: Conditions for intuitive expertise: a failure to disagree, by D Kahneman · 2009 · Cited by 4056 - This article reports on an effort t...
- Source: cambridge.org
  Link: https://www.cambridge.org/core/journals/judgment-and-decision-making/article/weighted-brier-score-decompositions-for-topically-heterogenous-forecasting-tournaments/8172E04F2DBC601DA5D953D4685CA346
  Snippet: Cambridge University Press & Assessment. Weighted Brier score decompositions for topically... by EC Merkle · 2018 · Cited by 13 - Brier sco...
- Source: arxiv.org
  Title: Calibration Scoring Rules for Practical Prediction Training
  Link: https://arxiv.org/abs/1808.07501
- Source: pmc.ncbi.nlm.nih.gov
  Title: The superforecasting hypothesis is challenged under real-life scarcity
  Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7333631/
  Snippet: Superforecasting reality check: Evidence from a small pool of... by I Katsagounos · 2020 · Cited by 23 - The study contributes to the str...
- Source: pmc.ncbi.nlm.nih.gov
  Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10189590/
  Snippet: improves forecasting - PMC - NIH, by DN Ferreiro · 2023 · Cited by 4 - Because higher Brier scores indicate lower prediction accuracy we re...
- Source: arxiv.org
  Link: https://arxiv.org/pdf/2602.19520
  Snippet: Domain-Specific Calibration Dynamics in Prediction Markets, by NA Le · 2026 · Cited by 2 - Tetlock and Gardner [65] demonstrated that struc...
- Source: arxiv.org
  Link: https://arxiv.org/pdf/2507.04562
  Snippet: Evaluating LLMs on Real-World Forecasting Against... by J Lu · 2025 · Cited by 1 - This paper attempts to measure and quantify how good t...
- Source: goodjudgment.com
  Link: https://goodjudgment.com/about/
  Snippet: About Superforecasting | Unprecedented Accurate &... Good Judgment's professional Superforecasters deliver unparalleled accuracy on forec...
- Source: goodjudgment.com
  Link: https://goodjudgment.com/
  Snippet: Good Judgment: See the future sooner with Superforecasting. Reports that Superforecasters were 30% more accurate than intellig...
- Source: goodjudgment.com
  Link: https://goodjudgment.com/superforecasters-still-creme-de-la-creme-six-years-on/
  Snippet: Superforecasters: Still Crème de la Crème Six Years On. During the IARPA tournament, Superforecasters routinely placed in the top 2% of acc...
- Source: goodjudgment.com
  Link: https://goodjudgment.com/about/the-science-of-superforecasting/
  Snippet: The Science Of Superforecasting. Good Judgment research discovered four keys to accurate forecasting: talent-spotting, training, teaming, a...
- Source: goodjudgment.com
  Link: https://goodjudgment.com/wp-content/uploads/2022/10/Superforecaster-Accuracy.pdf
  Snippet: Judgment measures accuracy using the Brier score, a score that shows how far a forecast fell from the truth (the closer the better). On a...
- Source: Wikipedia
  Title: The Good Judgment Project
  Link: https://en.wikipedia.org/wiki/The_Good_Judgment_Project
  Snippet: The Good Judgment Project. Predictions are scored using Brier scores.... The top forecasters in GJP are "reportedly 30% better than int...
- Source: gjopen.com
  Link: https://www.gjopen.com/
  Snippet: Good Judgment® Open. A forecasting services firm that equips corporate, government, and non-governmental decision-makers with the benefit o...
- Source: emergentmind.com
  Link: https://www.emergentmind.com/topics/superforecasters
  Snippet: Metrics and Methods. 20 Feb 2026 - Superforecasters are experts whose calibrated, low Brier scores and advanced probabilistic methods outpe...
- Source: alice.id.tue.nl
  Link: https://www.alice.id.tue.nl/references/kahnemann-2003.pdf
  Snippet: Kahneman - Nobel Lecture, by D KAHNEMAN · Cited by 2283 - Together, we explored the psychology of intuitive beliefs and choices and exami...
Additional References
- Source: researchgate.net
  Link: https://www.researchgate.net/publication/277087515_Identifying_and_Cultivating_Superforecasters_as_a_Method_of_Improving_Probabilistic_Predictions
  Snippet: (PDF) Identifying and Cultivating Superforecasters as a... 25 May 2015 - Mean standardized Brier scores for superforecasters (Supers) and...
  Published: May 2015
- Source: lukemuehlhauser.com
  Link: https://www.lukemuehlhauser.com/wp-content/uploads/Tetlock-et-al-Forecasting-tournaments-tools-for-increasing-transparency-and-improving-the-quality-of-debate.pdf
  Snippet: Forecasting Tournaments. This article describes a massive geopolitical tournament that tested clashing views on the feasibility of improvin...
- Source: corporate.jasoncollins.blog
  Link: https://corporate.jasoncollins.blog/better-forecasting
  Snippet: jasoncollins.blog. 25 Better forecasting. In this page, I examine techniques to improve forecasting accuracy, primarily through evidence from...
- Source: aiimpacts.org
  Link: https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/
  Snippet: Evidence on good forecasting practices from the... 2 Jul 2019 - Tetlock used something very much like a Brier score in this tournament, b...
- Source: researchgate.net
  Link: https://www.researchgate.net/publication/26798603_Conditions_for_Intuitive_Expertise
- Source: researchgate.net
  Link: https://www.researchgate.net/publication/220535084_Probability_Elicitation_Scoring_Rules_and_Competition_Among_Forecasters
  Snippet: Probability Elicitation, Scoring Rules, and Competition... 4 May 2026 - Probability forecasters who are rewarded via a proper scoring rul...
  Published: May 2026
- Source: edge.org
  Link: https://www.edge.org/conversation/philip_tetlock-edge-master-class-2015-a-short-course-in-superforecasting-class-ii
  Snippet: A Short Course in Superforecasting, Class II. Aug 24, 2015 - There are different types of proper scoring rules, and some proper scoring rul...
- Source: casact.org
  Link: https://www.casact.org/sites/default/files/presentation/annual_2016_presentations_c-27.pdf
  Snippet: • Hold intelligence community accountable for overall forecasting accuracy. • Don't blame when something bad...
- Source: github.com
  Title: Superforecasting and GJP. Good Judment Open. The Good Judgment
  Link: https://github.com/jmoral4/superforecastinghelper
  Snippet: tool for recording predictions and calculating Brier Scores ·... The Brier score ranges from 0 to 1, with lower values indicating more ac...
- Source: coefficientgiving.org
  Title: efforts to improve the accuracy of our judgments and forecasts
  Link: https://coefficientgiving.org/research/efforts-to-improve-the-accuracy-of-our-judgments-and-forecasts/
  Snippet: Efforts to Improve the Accuracy of Our Judgments and... Oct 25, 2016 - If we combine calibration and resolution, we arrive at a measure o...