Within Calibration
What Forecasting Tournaments Teach About Better Judgment
Forecasting tournaments show how probability practice, feedback and collaboration can improve confidence calibration.
Introduction
Forecasting tournaments are one of the clearest demonstrations that judgement under uncertainty can be improved through deliberate practice rather than treated as a fixed talent. Participants assign numerical probabilities to questions about future events before the outcomes are known; their predictions are scored once the questions resolve, and they receive repeated feedback on both accuracy and confidence. Over time, this process exposes systematic overconfidence, rewards well-calibrated judgement and encourages better habits of reasoning. Rather than treating forecasting as guesswork, tournaments turn it into a measurable learning exercise in which probabilities, evidence, revisions and outcomes are all recorded. Research from large forecasting competitions, particularly those associated with the Intelligence Advanced Research Projects Activity (IARPA) and the Good Judgment Project, shows that structured training, regular feedback, collaborative discussion and careful aggregation can produce substantially better-calibrated forecasts than conventional expert judgement alone. [journals.sagepub.com; iarpa.gov]
Why tournaments record probabilities before outcomes
The defining feature of a forecasting tournament is that predictions are made before anyone knows the answer. Each forecast must specify a probability: for example, a 70% chance that a peace agreement will be signed within six months, or a 20% chance that inflation will exceed a given threshold. Once the question resolves, the forecast is compared with reality using a proper scoring rule, most commonly the Brier score: the squared difference between the stated probability and the outcome, coded 1 if the event happened and 0 if it did not. Lower scores are better, and because the rule is proper, a forecaster minimises their expected score by reporting their honest belief, so it rewards both accuracy and honest expression of uncertainty rather than confident guessing. [Wharton Faculty Platform]
This structure solves several common problems in everyday judgement.
- Hindsight bias: predictions cannot be unconsciously rewritten after the event.
- Outcome bias: a sound decision is not mistaken for a poor one simply because chance intervened.
- Vague confidence: words such as “likely” or “probably” are replaced by explicit probabilities that can later be evaluated.
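The Brier score described above is simple to compute. The sketch below is a minimal Python illustration, assuming outcomes coded 1 if the event happened and 0 otherwise; the function names are hypothetical, not a library API. It shows the common mean-squared-error form (0 to 1) alongside Brier's original two-option form, which sums the squared error over both the "yes" and "no" answers and so runs from 0 to 2. Tournament papers often report scores on that 0 to 2 scale, so check which version a given study uses before comparing numbers.

```python
def brier_score(forecasts, outcomes):
    """Mean Brier score for yes/no questions (0 = perfect, 1 = worst).

    forecasts: probabilities in [0, 1] that each event happens.
    outcomes: 1 if the event happened, 0 otherwise.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty sequences")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_two_option(p, outcome):
    """Brier's original form for one yes/no question: squared error summed
    over both answer options, so scores run from 0 to 2."""
    return (p - outcome) ** 2 + ((1 - p) - (1 - outcome)) ** 2


# Hypothetical record: 70% on an event that happened, 20% on one that did not.
print(brier_score([0.7, 0.2], [1, 0]))   # about 0.065
print(brier_two_option(0.7, 1))          # about 0.18, twice (0.7 - 1) ** 2
```

A confident miss is expensive: stating 95% on an event that does not happen costs about 0.90 on the 0 to 1 scale, far more than the 0.25 cost of a noncommittal 50%.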
Because hundreds of questions accumulate over months or years, participants receive a statistically meaningful picture of their calibration. Someone who routinely assigns 90% confidence to uncertain questions but is correct only 70% of the time will quickly see evidence of overconfidence. By contrast, someone whose 60% forecasts come true about six times in ten is well calibrated, even though four in ten of those forecasts miss. [journals.sagepub.com; Wharton Faculty Platform]
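A calibration check of this kind can be sketched by grouping forecasts by their stated probability and comparing each group with how often its events actually happened. This is a minimal illustration: grouping to the nearest ten percentage points, the function name and the example record are all assumptions, not a prescribed method.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Compare stated probabilities with how often the events actually happened.

    Forecasts are grouped to the nearest 10 percentage points (an illustrative
    choice). Returns (stated_probability, observed_rate, count) tuples, sorted
    from lowest to highest stated probability.
    """
    groups = defaultdict(list)
    for p, outcome in zip(forecasts, outcomes):
        groups[round(p * 10)].append(outcome)  # integer tenths avoid float keys
    return [
        (tenths / 10, sum(hits) / len(hits), len(hits))
        for tenths, hits in sorted(groups.items())
    ]


# Hypothetical record: ten 90% forecasts (7 came true), ten 60% forecasts (6 came true).
forecasts = [0.9] * 10 + [0.6] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 6 + [0] * 4
for stated, observed, n in calibration_table(forecasts, outcomes):
    print(f"stated {stated:.0%}, observed {observed:.0%}, n={n}")
# stated 60%, observed 60%, n=10
# stated 90%, observed 70%, n=10
```

The 90% group shows the overconfidence pattern from the text, while the 60% group matches its stated probability. With only ten forecasts per group the observed rates are still noisy; each group needs many more resolved questions before a gap of this size is clearly more than chance.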
How feedback and collaboration improve forecasts
One of the most important findings from major forecasting tournaments is that improvement comes from repeated cycles of prediction, feedback and revision rather than from one-off instruction.
The Good Judgment Project, which won IARPA’s multi-year geopolitical forecasting competition, combined several elements:
- brief training in probabilistic reasoning;
- continual scoring of forecasts;
- opportunities to revise predictions as new evidence emerged;
- carefully designed collaborative teams;
- statistical aggregation of multiple forecasts. [goodjudgment.com; learnmoore.org]
Participants were encouraged to update forecasts whenever meaningful evidence appeared instead of defending their original judgement. Subsequent analyses found that better forecasters tended to make more frequent, incremental updates rather than large swings driven by single news events. This pattern was associated with better calibration and overall accuracy, plausibly because beliefs stayed aligned with changing evidence instead of becoming anchored to first impressions. [ResearchGate]
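The incremental-updating finding concerns how often, and by how much, forecasters revised their beliefs. As a hedged illustration (the exact measures used in the cited study may differ), the sketch below computes two simple quantities for one question's forecast history: the number of revisions and the mean size of each revision. The function name and example histories are hypothetical.

```python
def update_profile(history):
    """Summarise how one forecaster revised one question over time.

    history: successive probabilities for the same question, oldest first.
    Returns (number_of_updates, mean_absolute_change); unchanged
    re-affirmations are not counted as updates.
    """
    steps = [abs(new - old) for old, new in zip(history, history[1:]) if new != old]
    if not steps:
        return 0, 0.0
    return len(steps), sum(steps) / len(steps)


# Two hypothetical forecasters on the same question.
incremental = [0.50, 0.55, 0.60, 0.58, 0.65, 0.70]   # many small moves
lurching = [0.50, 0.50, 0.90, 0.90, 0.40, 0.40]      # few large swings
print(update_profile(incremental))   # 5 updates, mean step about 0.05
print(update_profile(lurching))      # 2 updates, mean step about 0.45
```

Tracking these numbers across many questions, rather than one, is what would let a team see whether its members tend to nudge or to lurch.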
Collaboration also mattered, but not because groups automatically outperform individuals. The most successful teams were trained to challenge assumptions constructively, explain their reasoning clearly and weigh evidence rather than authority or confidence. Later analyses of Good Judgment Project data found that compromise or aggregated forecasts often outperformed individual forecasts, illustrating the practical value of combining partially independent judgements. [learnmoore.org; PMC]
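One reason pooling helps can be checked directly. Because squared error is convex, the Brier score of an averaged forecast can never be worse than the average Brier score of the individual forecasts that went into it. The sketch below checks this on a hypothetical three-person team; the function names and numbers are illustrative assumptions. It does not show that the average beats the best member: here the 90% forecaster alone scores better still.

```python
def brier(p, outcome):
    """Squared error for one yes/no question (outcome is 1 or 0)."""
    return (p - outcome) ** 2


def pooled_vs_individual(forecasts, outcome):
    """Return (average individual score, score of the averaged forecast)."""
    mean_individual = sum(brier(p, outcome) for p in forecasts) / len(forecasts)
    pooled = sum(forecasts) / len(forecasts)
    return mean_individual, brier(pooled, outcome)


# Three hypothetical team members on a question that resolved "yes".
individual, pooled = pooled_vs_individual([0.9, 0.6, 0.3], outcome=1)
print(individual, pooled)   # about 0.22 versus about 0.16
```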
What made the IARPA tournaments distinctive?
The IARPA Aggregative Contingent Estimation (ACE) programme was designed not merely to identify good forecasters but to test competing methods for improving judgement. Multiple research teams used different combinations of training, team structures, aggregation algorithms and selection methods while forecasting hundreds of real geopolitical events over several years. [iarpa.gov]
The results challenged the assumption that forecasting skill is mostly innate or dependent on privileged information.
Research emerging from the tournament showed that:
- modest training in probabilistic reasoning improved performance;
- identifying consistently well-calibrated forecasters produced further gains;
- collaborative forecasting outperformed many independent approaches;
- sophisticated aggregation methods improved on simple averages;
- the best-performing forecasters remained consistently better than most participants over long periods rather than succeeding through luck alone. [goodjudgment.com; ResearchGate]
These findings helped popularise the idea of “superforecasters”: individuals who consistently produced unusually accurate and well-calibrated probability estimates across many unrelated topics. Importantly, research suggested that their advantage reflected disciplined reasoning habits, active updating and careful calibration more than specialised domain expertise alone. [Wharton Faculty Platform; journals.sagepub.com]
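The list above notes that sophisticated aggregation improved on simple averages. One approach discussed in Good Judgment Project research is "extremizing": pooling forecasts and then pushing the result away from 50%, on the reasoning that each forecaster holds only part of the available evidence. The sketch below illustrates that general idea under stated assumptions; it is not the project's actual algorithm, the exponent `a = 2.5` is an arbitrary illustrative value, and any real system would fit weights and the exponent to past data.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))


def extremized_mean(forecasts, a=2.5, eps=1e-3):
    """Average forecasts in log-odds space, then scale by a.

    a = 1 gives the plain log-odds (geometric-odds) average; a > 1 pushes the
    pooled forecast away from 50%. The default 2.5 is illustrative only.
    """
    clipped = [min(max(p, eps), 1 - eps) for p in forecasts]  # avoid infinite log-odds
    mean_log_odds = sum(logit(p) for p in clipped) / len(clipped)
    return 1 / (1 + math.exp(-a * mean_log_odds))


team = [0.65, 0.70, 0.75]   # hypothetical forecasts that all lean "yes"
print(sum(team) / len(team))         # simple average, about 0.70
print(extremized_mean(team, a=1.0))  # log-odds average, about 0.70
print(extremized_mean(team))         # extremized, roughly 0.89
```

Extremizing only pays off when forecasters draw on partly different evidence; if everyone shares the same information, pushing the average outward simply adds overconfidence.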
What ordinary teams can borrow from the model
Most organisations do not need a formal forecasting tournament to benefit from its principles. The core learning mechanisms are surprisingly portable.
A practical version can include:
- Recording numerical probabilities before important decisions.
- Defining objective resolution criteria in advance.
- Reviewing outcomes after enough cases have accumulated.
- Scoring forecasts consistently rather than relying on memory.
- Discussing why forecasts changed, not simply whether they were correct.
- Encouraging revision when evidence changes instead of treating updates as admissions of failure.
For example, a product team might estimate the probability that a software release will meet its deadline, a sales team might forecast quarterly revenue ranges, or a management group might estimate the chance that a regulatory approval will arrive within a specified period. After several months, calibration can be assessed by comparing stated probabilities with actual outcomes rather than by relying on subjective impressions.
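A team forecast log can be very small. The sketch below is one hypothetical way to capture the practices listed above (a probability recorded before the outcome, resolution criteria fixed in advance, reasons for each revision, consistent scoring); the `Forecast` class, its fields and the release question are illustrative assumptions, not a standard tool. For simplicity it scores only the final forecast; a fuller scheme might average the score over every day the question was open.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Forecast:
    """One question in a hypothetical team forecast log."""
    question: str
    resolution_criteria: str  # agreed before any outcome is known
    history: List[Tuple[date, float, str]] = field(default_factory=list)
    outcome: Optional[int] = None  # 1 = happened, 0 = did not, None = open

    def record(self, when: date, probability: float, reason: str) -> None:
        """Add a forecast or revision, with the reasoning behind it."""
        if self.outcome is not None:
            raise ValueError("question already resolved; the record is locked")
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self.history.append((when, probability, reason))

    def resolve(self, outcome: int) -> None:
        if outcome not in (0, 1):
            raise ValueError("outcome must be 0 or 1")
        self.outcome = outcome

    def final_brier(self) -> Optional[float]:
        """Brier score of the last forecast made; None until resolved."""
        if self.outcome is None or not self.history:
            return None
        _, probability, _ = self.history[-1]
        return (probability - self.outcome) ** 2


release = Forecast(
    question="Will release 2.0 ship by 30 June?",
    resolution_criteria="Build tagged and deployed to production on or before 30 June",
)
release.record(date(2024, 4, 1), 0.70, "scope frozen, team fully staffed")
release.record(date(2024, 5, 15), 0.55, "two blocking bugs still open")
release.resolve(0)
print(release.final_brier())  # about 0.30
```

Keeping the reason alongside each probability is what makes the later review about why forecasts changed, not just whether they were right.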
Equally important is separating the quality of reasoning from the eventual result. A carefully justified 40% forecast for an event that then occurs may reflect better judgement than an unjustified 95% forecast that comes true by chance. Forecasting tournaments repeatedly reinforce this distinction because participants are evaluated across many predictions rather than on memorable anecdotes. [journals.sagepub.com; Wharton Faculty Platform]
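The 40% versus 95% contrast can be made concrete with the Brier score. Assume, hypothetically, that the event genuinely has a 40% chance. On the single occasion it happens, the 95% forecast scores better; across many such questions, the honest 40% forecast has the lower expected score. The function name below is illustrative.

```python
def expected_brier(forecast, true_probability):
    """Long-run average Brier score of always stating `forecast` on questions
    whose events really occur with `true_probability`."""
    q, p = forecast, true_probability
    return p * (1 - q) ** 2 + (1 - p) * q ** 2


# One question where the event happened: the overconfident call looks better.
print((0.40 - 1) ** 2, (0.95 - 1) ** 2)   # about 0.36 versus about 0.0025

# Many questions with a genuine 40% chance: the honest forecast wins.
print(expected_brier(0.40, 0.40))  # about 0.24
print(expected_brier(0.95, 0.40))  # about 0.54
```

Expanding the expression shows that `expected_brier(q, p) - expected_brier(p, p)` equals `(q - p) ** 2`, so the expected score is lowest exactly when the forecast equals the true probability. That is what makes the Brier score a proper scoring rule.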
Limits and lessons
Forecasting tournaments are not a universal solution. Many important decisions concern unique situations with poorly defined outcomes, limited feedback or very long time horizons. Calibration is also easier to measure for binary questions than for complex strategic choices involving multiple interacting uncertainties.
There are methodological cautions as well. Winning a tournament does not necessarily identify the single “best” forecaster because chance still plays a role in rankings, particularly when many contestants perform at similarly high levels. Researchers have also questioned how easily tournament results transfer to environments with fewer questions, scarce data or specialised domains. [arXiv]
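The ranking problem can be explored with a toy simulation. This is not the model from the cited arXiv paper: the noise model (true probability plus Gaussian error), the field of contestants, the number of questions and the function name are all assumptions chosen for illustration. With one slightly better forecaster among many nearly equal rivals, the best forecaster's win rate should typically come out far below 1.

```python
import random


def best_forecaster_win_rate(noise_levels, n_questions=50, n_tournaments=300, seed=1):
    """Toy model: how often does the most accurate forecaster finish first?

    Each forecaster reports the true probability plus Gaussian noise with the
    given standard deviation (lower = more skilled), clipped to [0, 1].
    The winner is the lowest total Brier score in each simulated tournament.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    best = min(range(len(noise_levels)), key=lambda i: noise_levels[i])
    wins = 0
    for _ in range(n_tournaments):
        totals = [0.0] * len(noise_levels)
        for _ in range(n_questions):
            p_true = rng.random()
            outcome = 1 if rng.random() < p_true else 0
            for i, sd in enumerate(noise_levels):
                q = min(max(p_true + rng.gauss(0, sd), 0.0), 1.0)
                totals[i] += (q - outcome) ** 2
        winner = min(range(len(totals)), key=lambda i: totals[i])
        if winner == best:
            wins += 1
    return wins / n_tournaments


# One slightly better forecaster among 20 who are nearly as good (hypothetical).
contestants = [0.05] + [0.07] * 20
print(best_forecaster_win_rate(contestants))
```

Raising `n_questions` narrows the gap between luck and skill, which mirrors the concern that settings with few questions make rankings less informative.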
Even so, the central lesson has proved remarkably robust. When people express uncertainty numerically, receive honest feedback, update their beliefs in response to evidence and learn from repeated scoring, their confidence becomes better matched to reality. Forecasting tournaments therefore serve as practical laboratories for improving confidence calibration, not because they eliminate uncertainty, but because they make uncertainty measurable, discussable and ultimately learnable. [journals.sagepub.com; goodjudgment.com]
Endnotes
-
Source: journals.sagepub.com
Title: Forecasting Tournaments
Link: https://journals.sagepub.com/doi/10.1177/0963721414534257
Snippet: Tetlock, Barbara A.... 4 Aug 2014: Forecasting tournaments are level-playing-field competitions that reveal which individuals, teams, or...
-
Source: iarpa.gov
Link: https://www.iarpa.gov/research-programs/ace
Snippet: ACE. The goal of the ACE Program is to dramatically enhance the accuracy, precision, and timeliness of intelligence forecasts for a broad r...
-
Source: goodjudgment.com
Link: https://goodjudgment.com/wp-content/uploads/2018/12/jdm16511.pdf
Snippet: The impact of training and practice on judgmental accuracy... by W Chang · 2016 · Cited by 147: Additional details on the forecasting...
-
Source: learnmoore.org
Link: https://learnmoore.org/papers/Mellers%20et%20al%202014.pdf
Snippet: Mellers et al 2014.pdf, by B Mellers · 2014 · Cited by 434: They were taught strategies for explaining their forecasts to others, offering...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/340292724_Small_steps_to_accuracy_Incremental_belief_updaters_are_better_forecasters
Snippet: (PDF) Small steps to accuracy: Incremental belief updaters... This article explores how real-world forecasters who vary in sk...
-
Source: pmc.ncbi.nlm.nih.gov
Title: Compromising improves forecasting
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10189590/
Snippet: ...improves forecasting - PMC - NIH, by DN Ferreiro · 2023 · Cited by 4: We test this by analysing 5 years of data from the Good Judgement Pr...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/277087515_Identifying_and_Cultivating_Superforecasters_as_a_Method_of_Improving_Probabilistic_Predictions
Snippet: (PDF) Identifying and Cultivating Superforecasters as a... 25 May 2015: Effective aggregation mechanisms are central to crowd-forecastin...
Published: May 2015
-
Source: goodjudgment.com
Link: https://goodjudgment.com/resources/the-superforecasters-track-record/
Snippet: The Superforecasters' Track Record. Superforecasters beat all competing research teams in the IARPA ACE tournament by 35-72%. Good Judgment...
-
Source: goodjudgment.com
Link: https://goodjudgment.com/about/the-science-of-superforecasting/
Snippet: The Science Of Superforecasting. Good Judgment research discovered four keys to accurate forecasting: talent-spotting, training, teaming, a...
-
Source: arxiv.org
Title: A Prediction Tournament Paradox
Link: https://arxiv.org/abs/1903.02131
Snippet: A Prediction Tournament Paradox, March 5, 2019...
Published: March 5, 2019
-
Source: pmc.ncbi.nlm.nih.gov
Title: The superforecasting hypothesis is challenged under real-life scarcity
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7333631/
Snippet: Superforecasting reality check: Evidence from a small pool of... by I Katsagounos · 2020 · Cited by 24: The study contributes to the...
-
Source: iarpa.gov
Link: https://www.iarpa.gov/
Snippet: Intelligence Advanced Research projects Activity... IARPA invests in research programs to tackle some of the Intelligence Communi...
-
Source: researchgate.net
Link: https://www.researchgate.net/figure/Mean-standardized-Brier-scores-for-superforecasters-Supers-and-the-two-comparison_fig1_277087515
Snippet: ...which reality is coded as 1 for the event and 0 otherwise), ranging from 0 (...
-
Source: researchgate.net
Link: https://www.researchgate.net/publication/274992096_Forecasting_Tournaments_Tools_for_Increasing_Transparency_and_Improving_the_Quality_of_Debate
Snippet: ...s on the feasibility of improving judgmental accuracy and on the best methods...
-
Source: goodjudgment.com
Link: https://goodjudgment.com/about/
Snippet: ...asting questions across the political, economic and social spectrum.
-
Source: goodjudgment.com
Link: https://goodjudgment.com/resources/the-superforecasters-track-record/the-first-championship-season/
Snippet: ...ontrol group by more than 50%. This is the largest...
-
Source: goodjudgment.com
Title: Superforecasters: Still Crème de la Crème Six Years On
Link: https://goodjudgment.com/superforecasters-still-creme-de-la-creme-six-years-on/
Snippet: Superforecasters are significantly more accurate than their forecasting peers... They can forecast outcomes 300 days prior to resolution better than their...
-
Source: arxiv.org
Link: https://arxiv.org/pdf/2602.19520
Snippet: Domain-Specific Calibration Dynamics in Prediction Markets, by NA Le · 2026 · Cited by 2: Tetlock and Gardner [65] demonstrated that struc...
-
Source: arxiv.org
Link: https://arxiv.org/html/2602.19520v1
Snippet: Decomposing Crowd Wisdom: Domain-Specific Calibration... 23 Feb 2026: structured forecasting tournaments can identify “superforecasters”...
-
Source: youtube.com
Title: Engaging Minds with Philip Tetlock and Barbara Mellers in New York City
Link: https://www.youtube.com/watch?v=cLg8AdJG1v8
Snippet: The Good Judgment Project - Know It ALL...
Published: December 3, 2011
-
Source: youtube.com
Title: The Good Judgment Project
Link: https://www.youtube.com/watch?v=9yT9V-LvWdA
Snippet: VN1 Forecasting Competition: Winning Solutions (AI Forecasting Academy)...
-
Source: faculty.wharton.upenn.edu
Title: 2015 superforecasters
Link: https://faculty.wharton.upenn.edu/wp-content/uploads/2015/07/2015—superforecasters.pdf
Snippet: 2015 superforecasters.pdf, by B Mellers · 2015 · Cited by 323: Brier scores are the a...
-
Source: emergentmind.com
Link: https://www.emergentmind.com/topics/superforecasters
Snippet: Metrics and Methods, 20 Feb 2026: They use granular Bayesian reasoning, problem decomposition, and continuous recalibration to refine prob...
-
Source: Wikipedia
Title: The Good Judgment Project
Link: https://en.wikipedia.org/wiki/The_Good_Judgment_Project
Snippet: Predictions are scored using Brier scores. The top forecasters in GJP are "reportedly 30% better than intelli...
-
Source: Wikipedia
Title: Intelligence Advanced Research Projects Activity
Link: https://en.wikipedia.org/wiki/Intelligence_Advanced_Research_Projects_Activity
Snippet: IARPA funds academic and industry research across a broad range of technical areas, in...
-
Source: repository.upenn.edu
Title: Exposure to similar vs. diverse perspectives in forecasting
Link: https://repository.upenn.edu/bitstreams/988cd5cf-7bf6-479b-8d66-3cb7e6816168/download
Snippet: by K Chen · 2023: Forecasting tournaments are competitions in which participants attempt to make the...
-
Source: repository.upenn.edu
Link: https://repository.upenn.edu/server/api/core/bitstreams/d182dd97-ce71-4d0e-9393-c44514f78036/content
Snippet: Good Judgment Project, by G Forecasting · Cited by 1: When scores are calculated to assess how correct the predictions were, there is a fu...
-
Source: dni.gov
Link: https://www.dni.gov/index.php/careers/special-programs/iarpa
Snippet: IARPA | Office of the Director of National Intelligence. IARPA is capable of quickly responding to new priorities, emerging challenges, sci...
-
Source: linkedin.com
Link: https://www.linkedin.com/company/iarpa-odni
Additional References
-
Source: andrewclark.co.uk
Link: https://andrewclark.co.uk/all-media/superforecasting
Snippet: SuperForecasting. The Brier score is a way to measure how good your predictions are. It looks at both calibration (how accurate your predic...
-
Source: osf.io
Link: https://osf.io/download/n5czv
Snippet: ...ACE and HFC forecasting tournaments, the Brier score was the core metric. That score had to be adjusted to respect...
-
Source: youtube.com
Link: https://www.youtube.com/watch?v=dQKFaYofqGE
-
Source: newsroom.haas.berkeley.edu
Title: Harnessing the wisdom of the crowd to forecast world events
Link: https://newsroom.haas.berkeley.edu/harnessing-the-wisdom-of-the-crowd-to-forecast-world-events/
Snippet: Wisdom of the Crowd for Forecast Accuracy, 14 Jun 2017: Prof. Don Moore found a way to dramatically improve forecast accuracy by training...
-
Source: forum.effectivealtruism.org
Title: Two directions for research on forecasting and decision
Link: https://forum.effectivealtruism.org/posts/dsG5SYjhPqnxhystM/two-directions-for-research-on-forecasting-and-decision
Snippet: ...directions for research on forecasting and decision... 11 Mar 2023: Forecasting tournaments have shown that a set of methods for good ju...
-
Source: forum.effectivealtruism.org
Title: Evidence on good forecasting practices from the good...
Link: https://forum.effectivealtruism.org/posts/W94KjunX3hXAtZvXJ/evidence-on-good-forecasting-practices-from-the-good-1
Snippet: ...on good forecasting practices from the... 15 Feb 2019: For superforecasters, rounding to the nearest 0.10 produced significantly worse B...
-
Source: lifeitself.org
Link: https://lifeitself.org/blog/notes-on-tetlock-and-gardners-superforecasting
Snippet: Superforecasting, Tetlock and Gardner (Notes). Brier score = Sum of square error between prediction probability and actual outcome (e.g...
-
Source: arbresearch.com
Link: https://arbresearch.com/files/comparing_forecasters.pdf
Snippet: ...ment is 0.52 (SD: 0.11).” No better than predicting 50% on all...
-
Source: reddit.com
Link: https://www.reddit.com/r/ObscurePatentDangers/comments/1mwm9p4/biometric_recognition_and_identification_at/
Snippet: ...veloped by IARPA, aims to enhance the U.S. Intelligence Community's...
-
Source: commoncog.com
Title: How do you evaluate your own predictions?
Link: https://commoncog.com/how-do-you-evaluate-your-own-predictions/
Snippet: 17 Dec 2019: This post provides a comprehensive summary of the technique that Tetlock and Gardner presents in Superforecasting.