
What Makes Evidence Actually Strong?

Not all evidence is equally diagnostic, and strong signals are the ones a false claim would struggle to explain.

On this page

  • Independence, specificity and fair comparison
  • Why costly and repeatable signals matter
  • How to spot evidence that sounds stronger than it is

Introduction

Strong evidence is not simply evidence that is dramatic, persuasive or widely shared. It is evidence that would be difficult to observe if the claim were false. That is what makes it diagnostic: it should genuinely shift how probable you judge the claim to be, rather than merely sounding convincing. In everyday decisions, people often treat anecdotes, graphs, expert opinions, testimonials and direct measurements as if they all carry equal weight. They do not. The most valuable signals are typically those that are independent of one another, specific enough to distinguish between competing explanations, costly or difficult to fake, and repeatable by others. These features reduce the chance that an apparent pattern is simply coincidence, bias or selective reporting. [NCBI; Academy of Medical Sciences]

Understanding these mechanisms helps you update your beliefs more accurately. Instead of asking, “Does this support my view?”, ask, “How surprising would this evidence be if my view were wrong?” That shift is at the heart of better analytical thinking.

Why some evidence changes your mind more than others

The key difference between weak and strong evidence is not certainty but discrimination. Strong evidence separates competing explanations.

Imagine someone claims a new productivity method doubles output. Several colleagues say they feel more productive after trying it. That is evidence, but it is compatible with many explanations: enthusiasm, placebo effects, selective memory or temporary motivation. Now imagine multiple independent organisations measure productivity before and after adoption using the same objective criteria, compare results with similar teams that did not adopt the method, and obtain similar improvements across different settings. That pattern is much harder to explain away if the claim is false.

A useful question is:

If the claim were false, how easily could I still observe this evidence?

If the answer is “quite easily”, the evidence deserves limited weight. If the answer is “only with great difficulty”, the evidence deserves much more.

This is the same logic behind likelihood ratios in diagnostic testing. A positive likelihood ratio compares how often a test is positive in people who have the condition (sensitivity) with how often it is positive in people who do not (1 − specificity). A highly specific test is therefore valuable because a positive result is uncommon when the disease is absent, making the observation genuinely informative rather than merely consistent with the diagnosis. [NCBI]
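For readers who want to see the arithmetic, here is a minimal Python sketch of that logic. It uses the odds form of Bayes' rule (posterior odds = prior odds × likelihood ratio). The sensitivity, specificity and 2% starting probability are hypothetical numbers chosen only for illustration, not figures from the cited source.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = P(positive | disease) / P(positive | no disease)."""
    false_positive_rate = 1.0 - specificity
    return sensitivity / false_positive_rate


def update_probability(prior: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs, for illustration only.
prior = 0.02  # assume 2% of people tested have the condition
for specificity in (0.60, 0.95, 0.99):
    lr = positive_likelihood_ratio(sensitivity=0.90, specificity=specificity)
    post = update_probability(prior, lr)
    print(f"specificity={specificity:.2f}  LR+={lr:5.1f}  "
          f"P(disease | positive)={post:.2f}")
```

With these assumed inputs, raising specificity from 0.60 to 0.99 lifts the probability of disease after a positive result from roughly 4% to roughly 65%, while sensitivity stays fixed at 0.90.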

Independence, specificity and fair comparison

Three characteristics repeatedly distinguish stronger evidence from weaker evidence.

Independent evidence

Evidence becomes much stronger when different observations fail or succeed independently.

Ten news articles repeating the same press release are effectively one piece of evidence. Ten laboratories independently reproducing the same experimental finding are far stronger because each provides a separate opportunity for the claim to fail.

The same principle applies outside science:

  • Multiple customer reviews copied from one marketing campaign provide little additional confidence.
  • Three independent experts reaching similar conclusions after separate analyses deserve much greater weight.
  • Several measurements collected using different methods reduce the chance that a single flaw explains the result.

Independence prevents the same information from being counted multiple times.
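One way to make this concrete is a small Python sketch, assuming each piece of evidence has been given a likelihood ratio and a label for where it ultimately came from. The origins and ratios below are hypothetical. Treating items with the same origin as a single observation, and items with different origins as fully independent, is a deliberate simplification: real sources are often partly dependent.

```python
import math


def combined_likelihood_ratio(items, count_each_origin_once=True):
    """Multiply likelihood ratios across pieces of evidence.

    items: list of (origin, likelihood_ratio) pairs.
    Multiplying is only justified for independent evidence, so when
    count_each_origin_once is True, items sharing an origin are collapsed
    into one observation (the first one seen is kept).
    """
    if count_each_origin_once:
        by_origin = {}
        for origin, lr in items:
            by_origin.setdefault(origin, lr)
        ratios = list(by_origin.values())
    else:
        ratios = [lr for _, lr in items]
    return math.prod(ratios)


# Hypothetical: ten articles repeating one press release, each assigned LR 3.
articles = [("press-release", 3.0) for _ in range(10)]
# Hypothetical: three laboratories reproducing a finding separately, each LR 3.
labs = [(f"lab-{i}", 3.0) for i in range(3)]

print("Articles, naive count:", combined_likelihood_ratio(articles, count_each_origin_once=False))
print("Articles, by origin:  ", combined_likelihood_ratio(articles))
print("Laboratories:         ", combined_likelihood_ratio(labs))
```

Counted naively, the ten articles look like overwhelming support (3 multiplied by itself ten times); counted by origin, they are worth one observation, while the three separate laboratories genuinely compound.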

Specific evidence

Strong evidence distinguishes one explanation from another.

Suppose a weather forecast predicts:

  • “It might rain this week.”
  • “Rain will begin between 2 pm and 4 pm tomorrow, with around 10 mm falling.”

If the second prediction proves accurate, it is much more informative because it would have been difficult to guess correctly by chance.

Specific predictions expose themselves to failure. Vague predictions rarely do.

This is why scientific hypotheses become stronger when they make risky, precise predictions instead of broad statements compatible with almost any outcome.
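The forecast example can be expressed as a likelihood ratio: how much more likely a correct forecast is from someone with real forecasting skill than from someone guessing. The probabilities below are invented for illustration; this Python sketch only shows how the comparison works.

```python
def likelihood_ratio(p_correct_if_skilled: float, p_correct_if_guessing: float) -> float:
    """How many times more likely a correct forecast is under 'skill' than under 'guessing'."""
    return p_correct_if_skilled / p_correct_if_guessing


# Hypothetical probabilities that each forecast comes true:
# (if the forecaster is skilled, if the forecaster is guessing)
forecasts = {
    "It might rain this week": (0.95, 0.80),
    "Rain 2 pm to 4 pm tomorrow, around 10 mm": (0.40, 0.01),
}

for text, (if_skilled, if_guessing) in forecasts.items():
    lr = likelihood_ratio(if_skilled, if_guessing)
    print(f"{text}: LR = {lr:.1f}")
```

Under these assumptions a correct vague forecast barely separates skill from luck (a ratio close to 1), while a correct specific forecast is strong evidence. The specific forecast also fails more often, even for a skilled forecaster, which is exactly what it means to expose a prediction to failure.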

Fair comparisons

Evidence is only meaningful when comparison groups are genuinely comparable.

If one school adopts a new teaching method and later performs better, the improvement might reflect the method—or differences in funding, student intake or teacher experience.

Better comparisons attempt to isolate the factor of interest by using control groups, matched comparisons or random assignment where practical. Systematic reviews and well-designed comparative studies generally provide stronger evidence because they reduce alternative explanations. [Digital Education Resource Archive]
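A comparison group can be used in several ways. The Python sketch below shows one common approach, difference-in-differences, which is not named in the source: it subtracts the change seen in comparable schools from the change seen in the adopting school. The scores are hypothetical, and the estimate is only meaningful if both groups would have followed similar trends without the new method.

```python
from statistics import mean


def change(before, after):
    """Average change from the 'before' period to the 'after' period."""
    return mean(after) - mean(before)


def difference_in_differences(treated_before, treated_after, control_before, control_after):
    """Change in the adopting group minus change in the comparison group.

    Assumes 'parallel trends': without the method, both groups would have
    changed by roughly the same amount.
    """
    return change(treated_before, treated_after) - change(control_before, control_after)


# Hypothetical average test scores: three classes in the adopting school,
# three classes in comparison schools that did not adopt the method.
adopting_before, adopting_after = [61, 63, 62], [69, 71, 70]
comparison_before, comparison_after = [59, 61, 60], [65, 67, 66]

naive = change(adopting_before, adopting_after)
adjusted = difference_in_differences(adopting_before, adopting_after,
                                     comparison_before, comparison_after)
print(f"Before/after change in the adopting school: {naive:.1f}")
print(f"Change beyond what comparison schools saw:  {adjusted:.1f}")
```

In this made-up example, most of the apparent eight-point gain also appeared in schools that did not adopt the method, leaving an estimated effect of about two points.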


Why costly and repeatable signals matter

Some signals deserve more trust precisely because they are difficult to fake.

Economists sometimes describe these as costly signals: actions that would be expensive or risky to take if the underlying claim were false, which makes them more credible.

Examples include:

  • A manufacturer offering an unusually long warranty.
  • A company voluntarily publishing independent audit results.
  • Researchers sharing raw data and analysis code.
  • Experts making precise forecasts before outcomes are known.

Cheap signals can often be produced whether or not the underlying claim is true. Costly signals usually cannot.

Repeatability provides another layer of protection.

One surprising observation may simply be luck. Repeated observations under different conditions make coincidence progressively less plausible.

Scientific replication is built around this principle. An independent research group following similar methods should obtain broadly similar findings if the original effect is real. Exact numerical agreement is not expected in many fields, but consistent patterns across independent studies substantially increase confidence. [Academy of Medical Sciences]
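A rough Python sketch shows why repetition helps, under simplifying assumptions that are not from the source: each study has a 5% chance of a false positive when there is no real effect, an 80% chance of detecting the effect when it is real, and the studies are fully independent. It ignores effect sizes, publication bias and shared methodological flaws, all of which weaken the conclusion in practice.

```python
ALPHA = 0.05  # assumed chance that one study shows the effect when none exists
POWER = 0.80  # assumed chance that one study shows the effect when it is real


def replication_likelihood_ratio(k: int, alpha: float = ALPHA, power: float = POWER) -> float:
    """Likelihood ratio for 'all k independent studies found the effect'.

    P(all positive | real effect) = power ** k
    P(all positive | no effect)   = alpha ** k
    """
    return (power ** k) / (alpha ** k)


for k in range(1, 5):
    p_by_chance = ALPHA ** k
    print(f"{k} stud{'y' if k == 1 else 'ies'}: "
          f"P(all positive by chance) = {p_by_chance:.6f}, "
          f"likelihood ratio = {replication_likelihood_ratio(k):,.0f}")
```

Each additional independent replication multiplies the ratio by the same factor (16 under these assumptions), which is why coincidence becomes progressively less plausible.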

How to spot evidence that sounds stronger than it is

Many persuasive-looking signals have much lower diagnostic value than people assume.

Common examples include:

  • Many sources, one origin. Numerous articles all relying on the same unpublished claim create the appearance of consensus without providing independent confirmation.
  • Large numbers without comparison. “Cases increased by 50%” means little without knowing the starting level, comparison group or expected variation.
  • Expert opinion without evidence. Expertise matters, but unsupported opinions generally carry less weight than transparent evidence that others can inspect.
  • Success stories alone. Testimonials reveal that success is possible, not how likely it is. They rarely include comparable failures.
  • Single impressive studies. Initial findings frequently become smaller, disappear or require qualification when examined by multiple independent teams. [Journal of Ethics]
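To see why a percentage alone says little, the Python sketch below compares two hypothetical increases that are both "50%". It assumes, for illustration only, that counts vary roughly like a Poisson process, so the square root of the earlier count gives a crude yardstick for ordinary chance variation. A real analysis would also need a comparison group and some knowledge of trends and reporting changes.

```python
import math


def change_summary(before: int, after: int):
    """Return (relative change, absolute change, change in chance-variation units).

    The last value assumes Poisson-like counts, where the standard deviation
    of a count is roughly the square root of its expected value.
    """
    absolute = after - before
    relative = absolute / before
    chance_units = absolute / math.sqrt(before)
    return relative, absolute, chance_units


for before, after in [(2, 3), (2000, 3000)]:
    rel, ab, units = change_summary(before, after)
    print(f"{before} -> {after}: {rel:+.0%} relative, {ab:+d} absolute, "
          f"about {units:.1f} times typical chance variation")
```

Both headlines could read "cases increased by 50%", but under the Poisson assumption only the second is far outside what random fluctuation alone would be expected to produce.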

A useful habit is to ask what evidence is missing. If only successful examples are presented, where are the unsuccessful ones? If only averages are reported, what was the variation? If only one measurement exists, has anyone repeated it?


Practical questions that reveal genuinely strong signals

When assessing evidence, a small number of questions often reveal far more than technical details.

  • Would this observation be unlikely if the claim were false?
  • Is this evidence independent, or does everything trace back to the same source?
  • Could another explanation produce exactly the same pattern?
  • Has the result been observed repeatedly by different people or different methods?
  • Is there a fair comparison showing what happened without the claimed cause?
  • Does the claim make precise predictions that could genuinely fail?

These questions focus attention on the evidence’s ability to discriminate between competing explanations rather than its emotional impact.

A simple way to weigh competing evidence

Instead of counting pieces of evidence, weigh their diagnostic strength.

A rough hierarchy often looks like this:

  1. Independent, repeatable measurements with appropriate comparisons deserve the greatest weight because they eliminate many alternative explanations simultaneously.
  2. Multiple independent observations pointing in the same direction are usually stronger than any single observation.
  3. Expert judgement supported by transparent reasoning and evidence is generally more reliable than unsupported authority.
  4. Individual experiences and anecdotes remain useful for generating ideas and identifying possibilities, but usually deserve less weight when estimating whether a claim is broadly true.

This hierarchy is not absolute. An anecdote may reveal something entirely new, while a poorly designed study may mislead. The central principle is always the same: give more weight to evidence that a false claim would struggle to explain, and less weight to evidence that could easily arise under many different explanations.
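Weighing rather than counting can be sketched in Python by giving each item a likelihood ratio and adding their logarithms, which is equivalent to multiplying the ratios. The ratios are hypothetical, and combining them this way assumes the items are independent, which anecdotes from the same community often are not.

```python
import math

# Hypothetical likelihood ratios: above 1 supports the claim, below 1 counts against it.
evidence = {f"anecdote {i}": 1.1 for i in range(1, 6)}
evidence["controlled comparison"] = 0.1


def count_items(ratios):
    """Naive tally: how many items point each way, ignoring their strength."""
    ratios = list(ratios)
    supporting = sum(1 for r in ratios if r > 1)
    opposing = sum(1 for r in ratios if r < 1)
    return supporting, opposing


def total_weight(ratios):
    """Sum of log10 likelihood ratios; positive favours the claim, negative opposes it."""
    return sum(math.log10(r) for r in ratios)


supporting, opposing = count_items(evidence.values())
weight = total_weight(evidence.values())
print(f"Count: {supporting} for, {opposing} against")
print(f"Weight: {weight:+.2f} (combined likelihood ratio {10 ** weight:.2f})")
```

By count, the claim looks well supported; by weight, the single controlled comparison outweighs all five anecdotes combined.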


Further Reading

Books related to this topic. Use these as a next step if you want deeper reading beyond the article.


Bad Science

By Ben Goldacre

Uses real-world examples to explain why good evidence requires fair comparisons, replication, and sound methodology.


Endnotes

  1. Source: ncbi.nlm.nih.gov
    Title: Diagnostic Testing Accuracy: Sensitivity, Specificity, Predictive... (J Shreffler, 2023)
    Link: https://www.ncbi.nlm.nih.gov/books/NBK557491/

  2. Source: acmedsci.ac.uk
    Link: https://acmedsci.ac.uk/viewFile/56314e40aac61.pdf

  3. Source: dera.ioe.ac.uk
    Link: https://dera.ioe.ac.uk/id/eprint/30009/2/SSIF_Classification_of_Evidence_FINAL-1.pdf
    Published: September 14, 2017

  4. Source: journalofethics.ama-assn.org
    Title: When Research Evidence is Misleading (Journal of Ethics)
    Link: https://journalofethics.ama-assn.org/article/when-research-evidence-misleading/2013-01

  5. Source: pmc.ncbi.nlm.nih.gov
    Title: Diagnostic Tests: A Review of Test Anatomy, Phases... (SD Bolboacă, 2019)
    Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC6558629/

