Within Sharper Thinking

How Feedback Makes Judgement Sharper

Feedback helps you learn whether you were right for the reasons you thought, not merely lucky.

On this page

  • Good feedback versus noisy outcomes
  • Decision records and later reviews
  • Calibration habits for real life
Preview for How Feedback Makes Judgement Sharper

Introduction

Feedback makes judgement sharper when it shows whether your reasoning, confidence and assumptions matched reality, not merely whether you got lucky. A good feedback loop therefore has three parts: a clear prediction made before the outcome is known, a record of why you believed it, and a later review that separates decision quality from outcome quality. This matters because many real-life judgements are made under uncertainty: a good decision can fail, a poor decision can work, and hindsight can make both feel more obvious than they were. Research on expert intuition, forecasting tournaments, metacognition and outcome bias points to the same practical lesson: people improve judgement fastest when feedback is timely, specific, repeated and tied to the reasoning available at the time. [PubMed+2Warrington College of Business]pubmed.ncbi.nlm.nih.govOpen source on nih.gov.

Overview image for Feedback The aim is not to become someone who is always right. It is to become someone whose confidence better tracks reality: less certain when the evidence is thin, more decisive when repeated evidence supports a pattern, and more willing to update when a trusted feedback loop shows that a belief was poorly grounded.

Why feedback calibrates judgement rather than just reporting results

A result tells you what happened. Feedback tells you what, if anything, should change in your thinking. That distinction is easy to miss because outcomes are emotionally loud. A project succeeds, so the plan feels wise. A hire fails, so the interview process feels foolish. An investment rises, so the thesis feels confirmed. Yet the same reasoning can lead to different results because luck, timing, hidden variables and changing conditions all sit between judgement and outcome.

This is why feedback loops are especially important for improving analytical skills. They create a bridge between present reasoning and future evidence. Daniel Kahneman and Gary Klein’s work on expert intuition argues that intuitive expertise is most trustworthy in environments with valid cues and timely, clear feedback; in weak-feedback environments, confidence can grow without corresponding skill. Firefighters, chess players and some clinicians can often learn from repeated cue-outcome patterns. Long-range political, investment or strategic forecasts are harder because the signal is delayed, noisy and partly unrepeatable. [PubMed]pubmed.ncbi.nlm.nih.govOpen source on nih.gov.

Good calibration means your stated confidence matches your actual hit rate. If you say “70% likely” across many comparable judgements, roughly seven in ten should turn out true. That does not make any single judgement safe, but it makes your confidence usable. It lets you distinguish “I am uncertain because the world is uncertain” from “I am uncertain because I have not thought clearly enough”.

Good feedback versus noisy outcomes

The central danger is mistaking outcome feedback for judgement feedback. Outcome feedback is “the thing worked” or “the thing failed”. Judgement feedback asks a harder question: given what was knowable at the time, was the reasoning sound, were the assumptions explicit, and was the confidence level appropriate?

Outcome bias is the tendency to evaluate a decision mainly by how it turned out rather than by the quality of the decision process. Baron and Hershey’s classic 1988 work showed that people’s evaluations of decision quality are influenced by outcomes even when those outcomes should not change the assessment of the original reasoning. Later replication work has continued to treat this as a live and important problem in decision evaluation. [Warrington College of Business]bear.warrington.ufl.edubaron hershey jpsp1988baron hershey jpsp1988

Hindsight bias compounds the problem. Once the outcome is known, people often overestimate how predictable it was, selectively recall evidence that fits the outcome, and impose a cleaner story on messy events. A review by Roese and Vohs describes hindsight bias as arising from cognitive, metacognitive and motivational sources: people recall outcome-consistent information, confuse ease of explanation with prior likelihood, and prefer a world that feels orderly and blameable. [Carlson School of Management]carlsonschool.umn.eduvohs et al 2012 hindsight biasvohs et al 2012 hindsight bias

A practical feedback loop therefore has to protect the pre-outcome view before the outcome contaminates it. Useful feedback looks like this:

  • Specific: “I expected supplier approval by 30 April with 80% confidence because the legal review had already cleared two similar contracts.”
  • Comparable: the same kind of judgement is made repeatedly, so patterns can be seen.
  • Timed: the review happens when the outcome is known but while the original reasoning is still available.
  • Diagnostic: the review asks which assumption failed, not just who was right.
  • Humble about noise: the loop treats one result as a clue, not a verdict.

A single outcome can teach something, but it rarely teaches everything. Five similar decisions reviewed against their original assumptions are much more informative than one vivid success or failure.

Feedback illustration 1

Decision records make hindsight less powerful

A decision record is a small written snapshot of your thinking before reality gives you the answer. It does not need to be elaborate. Its purpose is to stop your future self rewriting the past.

For a meaningful decision, record five things:

  1. The decision or forecast: what you are choosing or predicting.
  2. The confidence level: preferably as a probability or range.
  3. The main reasons: the evidence and assumptions driving the judgement.
  4. The live alternatives: what else could plausibly happen.
  5. The review date or trigger: when you will revisit the judgement.

This kind of record matters because memory is not a neutral archive. Without a record, a person who was 55% confident may later remember being “basically sure”; a person who ignored a risk may later remember having “flagged it all along”. Written forecasts and assumptions give later feedback something firm to compare against.

Forecasting tournaments show the value of making judgement measurable. The Intelligence Advanced Research Projects Activity’s ACE programme was created to improve the accuracy, precision and timeliness of intelligence forecasts by eliciting, weighting and combining judgements. In the Good Judgment Project, researchers asked forecasters to make probabilistic predictions on real geopolitical questions and scored them after resolution. The project found that probability training, collaboration and tracking high performers improved both calibration and resolution, showing that behavioural interventions can improve forecasting rather than merely selecting people who are already good at it. [iarpa.gov]iarpa.govOpen source on iarpa.gov.

The lesson for ordinary decisions is not that every life choice should become a formal forecasting tournament. It is that judgement improves when predictions are explicit enough to be scored. “This launch feels promising” is hard to learn from. “I think there is a 65% chance we reach 1,000 active users within three months, mainly because the waitlist conversion rate has stayed above 20%” gives the review something to test.

Calibration needs both confidence and discrimination

Being calibrated is not the same as being timid. A person who says “50%” about everything may avoid overconfidence, but they are not showing useful judgement. Good judgement requires both calibration and discrimination. Calibration asks whether confidence matches accuracy. Discrimination asks whether you can tell easier cases from harder ones and stronger evidence from weaker evidence.

This distinction is visible in forecasting scores. The Brier score, widely used for probabilistic forecasts, penalises the squared difference between a predicted probability and the outcome. Lower scores mean better forecasts, but the score reflects more than simple confidence matching; recent discussions stress that Brier scores also depend on the difficulty and distribution of the events being predicted. [PMC]pmc.ncbi.nlm.nih.govOpen source on nih.gov.

Good feedback should therefore ask two questions after a batch of judgements:

  • Were you overconfident or underconfident? For example, did your 80% predictions happen about 80% of the time?
  • Did higher-confidence judgements beat lower-confidence ones? If your 80% calls were no more accurate than your 55% calls, your confidence scale is not yet carrying useful information.

This is where many self-improvement efforts fail. People review only their mistakes, which teaches them to avoid embarrassment, or only their wins, which reinforces a flattering story. Calibration requires reviewing the whole set: correct high-confidence calls, wrong high-confidence calls, correct low-confidence calls and wrong low-confidence calls.

Feedback works best when it is close to the reasoning

The shorter the gap between judgement and feedback, the easier it is to connect cause and correction. In learning research, feedback can help people correct metacognitive errors: for example, Butler and colleagues found that feedback was especially useful when people gave correct answers with low confidence, because it corrected their mistaken sense that they had not known the answer. [Psychnet]psychnet.wustl.eduButler et al 2008 JEPLMCButler et al 2008 JEPLMC

But feedback is not magic. Some studies find mixed effects depending on the task, the type of feedback and whether people know how to use it. Recent work on calibration training using practical scoring rules found no improvement from the tested training regimes, while research in educational settings suggests that calibration can improve when feedback is combined with self-regulated learning training, peer evaluation or repeated opportunities to adjust judgement. [Wiley Online Library]onlinelibrary.wiley.comOpen source on wiley.com.

That mixed evidence is useful, not discouraging. It means the feedback loop must be designed, not merely added. “You were wrong” is a weak teacher. “You were wrong because you treated a small sample as representative, ignored a base rate, and gave 85% confidence where your evidence supported 60%” is much stronger.

For everyday analytical improvement, the best feedback is usually:

  • Fast enough that the original reasoning is still accessible.
  • Structured enough to identify the assumption being tested.
  • Repeated enough to reveal patterns rather than anecdotes.
  • Low-ego enough that the person can update without treating every error as a character judgement.

Feedback illustration 2

Real-life calibration habits

Most people do not need a complex system. They need a few repeatable habits that turn ordinary decisions into learning material.

Use probability language for uncertain claims

Replace “probably”, “unlikely”, “soon” and “high risk” with numbers or ranges when the decision matters. “Probably” might mean 55% to one person and 85% to another. A number forces a clearer commitment and makes later review possible.

This habit is central to forecasting practice. In the Good Judgment Project, forecasters made probabilistic estimates, updated them over time, and received scores once questions resolved. That structure turned judgement into a trainable skill rather than a one-off opinion. [learnmoore.org]learnmoore.orgMellers et al 2014Mellers et al 2014

Review batches, not just dramatic cases

A dramatic failure can be memorable but misleading. A batch of ten sales forecasts, hiring predictions, study estimates or project timelines gives a better view of whether your confidence is systematically high or low.

A useful monthly review asks:

  • When I was 70–80% confident, how often was I right?
  • Which types of decision produced the biggest confidence errors?
  • Did I update when new evidence arrived, or defend my first view?
  • Did I confuse a good outcome with a good process?

Separate process review from blame

A good feedback loop is not a courtroom. Its purpose is to improve future judgement. If every review becomes a search for fault, people hide uncertainty, avoid explicit predictions and rewrite their reasoning defensively.

This is why premortems and postmortems play different roles. A premortem, popularised by Gary Klein, asks a team to imagine that a project has failed and work backwards to identify possible causes before the decision is locked in. A postmortem reviews what actually happened. The first improves the decision before reality tests it; the second improves the decision process after reality has spoken. [Harvard Business Review]hbr.orgHarvard Business Review Performing a Project PremortemHarvard Business Review Performing a Project Premortem

Track the assumptions that mattered most

Not every detail deserves review. The highest-value feedback usually concerns the assumption that would have changed the decision if it had been known. In a job move, that might be the manager’s support. In a product launch, it might be the conversion rate. In an investment, it might be revenue quality rather than headline growth.

A good review identifies whether the assumption was wrong, unknowable, poorly weighted or simply swamped by luck. Those are different lessons. “The assumption was unknowable” may call for smaller bets. “The assumption was knowable but ignored” calls for better evidence gathering.

When feedback loops mislead

Feedback loops can sharpen judgement, but bad feedback loops can make it worse. The most common failure is selective feedback: you see only the outcomes that your earlier decision allowed you to see. A bank that denies loans cannot directly observe whether rejected applicants would have repaid. A manager who interviews only familiar candidates receives little feedback on the people never considered. Algorithmic decision-making research treats this as a serious problem because prior decisions can shape the data available for future decisions, reinforcing bias or narrowing what the system learns. [arXiv]arxiv.orgOpen source on arxiv.org.

Another failure is social feedback that rewards confidence rather than accuracy. People who sound certain often gain attention, authority or promotion before their claims can be tested. The Good Judgment Project is interesting partly because it used scoring and resolution rather than charisma as the basis for feedback. Good forecasters did not need to sound dramatic; they needed to keep their probabilities aligned with evidence. [Sage Journals]journals.sagepub.comSage Journals Forecasting TournamentsSage Journals Forecasting Tournaments

A third failure is feedback without a comparison class. “This project took six months” is less useful than “we estimated three months; comparable projects have taken five to seven; our main error was ignoring legal review time”. Calibration improves when feedback is anchored to similar cases, not isolated impressions.

Feedback illustration 3

A simple review template for sharper judgement

A practical calibration loop can fit on one page. Use it for decisions that are important, uncertain and repeatable enough to learn from.

Before the outcome

  • What am I predicting or deciding?
  • What probability or confidence level would I assign?
  • What are my three main reasons?
  • What evidence would change my mind?
  • What is the most important assumption?
  • What alternative outcome would not surprise me?

After the outcome

  • What happened?
  • Was I right for the reason I expected, or right by accident?
  • If I was wrong, was the error in evidence, weighting, timing, incentives or luck?
  • Was my confidence too high, too low or about right?
  • What rule of thumb should I update for next time?

The most important question is often: “Was I right for the reasons I thought?” That question stops a lucky win from hardening into false confidence and stops an unlucky loss from destroying a sound process.

The payoff: confidence that earns its strength

Feedback loops calibrate judgement by turning experience into evidence. Without them, experience can simply deepen habits: the confident become more confident, the lucky mistake luck for skill, and the unlucky abandon good reasoning too quickly. With them, decisions become testable, confidence becomes measurable, and mistakes become more specific.

The wider goal of improving thinking and analytical skills is not to remove uncertainty. It is to behave better inside uncertainty. A calibrated thinker can say, “I was 70% confident and this was the 30% case,” without denial. They can also say, “I was 90% confident and wrong; my model is broken,” without defensiveness. That combination of explicit prediction, honest review and adjusted confidence is what makes feedback a mechanism for sharper judgement rather than a record of wins and losses.

Amazon book picks

Further Reading

Books and field guides related to How Feedback Makes Judgement Sharper. Use these as the next step if you want deeper reading beyond the article.

eBay marketplace picks

Marketplace Samples

Live-tested eBay searches with available results related to this page.

Using USA

Endnotes

  1. Source: learnmoore.org
    Title: Mellers et al 2014
    Link: https://learnmoore.org/papers/Mellers%20et%20al%202014.pdf

  2. Source: iarpa.gov
    Link: https://www.iarpa.gov/research-programs/ace

  3. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12818272/

  4. Source: onlinelibrary.wiley.com
    Link: https://onlinelibrary.wiley.com/doi/full/10.1002/ffo2.199

  5. Source: arxiv.org
    Link: https://arxiv.org/abs/2305.06055

  6. Source: arxiv.org
    Title: arXiv Fairness under uncertainty in sequential decisions
    Link: https://arxiv.org/abs/2604.21711

  7. Source: pubmed.ncbi.nlm.nih.gov
    Link: https://pubmed.ncbi.nlm.nih.gov/19739881/

  8. Source: bear.warrington.ufl.edu
    Title: baron hershey jpsp1988
    Link: https://bear.warrington.ufl.edu/brenner/mar7588/Papers/baron-hershey-jpsp1988.pdf

  9. Source: carlsonschool.umn.edu
    Title: vohs et al 2012 hindsight bias
    Link: https://carlsonschool.umn.edu/sites/carlsonschool.umn.edu/files/2026-01/vohs-et-al-2012-hindsight-bias.pdf

  10. Source: psychnet.wustl.edu
    Title: Butler et al 2008 JEPLMC
    Link: https://psychnet.wustl.edu/memory/wp-content/uploads/2018/04/Butler-et-al-2008_JEPLMC.pdf

  11. Source: hbr.org
    Title: Harvard Business Review Performing a Project Premortem
    Link: https://hbr.org/2007/09/performing-a-project-premortem

  12. Source: journals.sagepub.com
    Title: Sage Journals Forecasting Tournaments
    Link: https://journals.sagepub.com/doi/10.1177/0963721414534257

  13. Source: research-collection.ethz.ch
    Link: https://www.research-collection.ethz.ch/server/api/core/bitstreams/5cc66b9a-8013-499b-90f9-7aeba05c0195/content

  14. Source: pubmed.ncbi.nlm.nih.gov
    Link: https://pubmed.ncbi.nlm.nih.gov/41567946/

  15. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12523994/

  16. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11281873/

  17. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC8763848/

  18. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC12372742/

  19. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC6824411/

  20. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC7333631/

  21. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10189590/

  22. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10912288/

  23. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10927782/

  24. Source: pmc.ncbi.nlm.nih.gov
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10790535/

  25. Source: Wikipedia
    Title: Brier score
    Link: https://en.wikipedia.org/wiki/Brier_score

  26. Source: Wikipedia
    Title: The Good Judgment Project
    Link: https://en.wikipedia.org/wiki/The_Good_Judgment_Project

  27. Source: journals.sagepub.com
    Link: https://journals.sagepub.com/doi/10.1177/2755323X251357643

  28. Source: journals.sagepub.com
    Link: https://journals.sagepub.com/doi/10.1177/01492063241287188

  29. Source: journals.sagepub.com
    Link: https://journals.sagepub.com/doi/abs/10.3102/00346543221094083

  30. Source: cmu.edu
    Title: Outcome Feedback
    Link: https://www.cmu.edu/dietrich/sds/docs/loewenstein/OutcomeFeedback.pdf

  31. Source: profiles.wustl.edu
    Title: psychological strategies for winning a geopolitical forecasting t
    Link: https://profiles.wustl.edu/en/publications/psychological-strategies-for-winning-a-geopolitical-forecasting-t/

  32. Source: frontiersin.org
    Link: https://www.frontiersin.org/journals/applied-mathematics-and-statistics/articles/10.3389/fams.2021.669546/full

  33. Source: dataopsschool.com
    Title: brier score
    Link: https://dataopsschool.com/blog/brier-score/

Additional References

  1. Source: youtube.com
    Title: Ep. 234: Dr. Gary Klein
    Link: https://www.youtube.com/watch?v=xslxJpbUo-s
    Source snippet

    Expert Political Judgment: How Good Is It? How Can We Know? | Philip Tetlock...

  2. Source: youtube.com
    Title: Why “scout mindset” is crucial to good judgment | Julia Galef | TEDx PSU
    Link: https://www.youtube.com/watch?v=3MYEtQ5Zdn8
    Source snippet

    Know Your Own Future (at least 10% better)...

  3. Source: youtube.com
    Title: Know Your Own Future (at least 10% better)
    Link: https://www.youtube.com/watch?v=mEFWad6S6iY
    Source snippet

    Why Predictions Fail | Use Probabilities Instead of Certainty...

  4. Source: researchgate.net
    Link: https://www.researchgate.net/publication/26798603_Conditions_for_Intuitive_Expertise

  5. Source: researchgate.net
    Link: https://www.researchgate.net/publication/274992096_Forecasting_Tournaments_Tools_for_Increasing_Transparency_and_Improving_the_Quality_of_Debate

  6. Source: researchgate.net
    Link: https://www.researchgate.net/publication/257671125_Metacognitive_scaffolds_improve_self-judgments_of_accuracy_in_a_medical_intelligent_tutoring_system

  7. Source: researchgate.net
    Link: https://www.researchgate.net/publication/375085139_A_Classification_of_Feedback_Loops_and_Their_Relation_to_Biases_in_Automated_Decision-Making_Systems

  8. Source: medium.com
    Link: https://medium.com/%40cartelgouabou/enhancing-medical-predictions-a-comprehensive-guide-to-model-calibration-3ea741be88d7

  9. Source: eventhorizonstrategies.com
    Link: https://eventhorizonstrategies.com/wp-content/uploads/2021/01/Bringing-Probability-Judgments-into-Policy-Debates-via-Forecasting-Tournaments_Science-Journals-%E2%80%94-AAAS.pdf

  10. Source: alliancefordecisioneducation.org
    Link: https://alliancefordecisioneducation.org/resources/conducting-a-pre-mortem/

Topic Tree

Follow this branch

Parent topic

Sharper Thinking

Related pages 29

More on this topic 6