The winner’s curse

If an estimate is statistically significant, it’s probably an overestimate of the magnitude of your effect.
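As a quick illustration (just a sketch, with made-up numbers for the true effect, the standard error, and the threshold): simulate a pile of noisy estimates of a small effect, keep only the ones that reach significance, and see how big the survivors are.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.1   # small true effect (arbitrary illustrative value)
se = 0.5            # standard error of each study's estimate (arbitrary)
n_sims = 100_000    # number of simulated studies

# Each simulated study gives a noisy estimate of the same true effect.
estimates = rng.normal(true_effect, se, n_sims)

# Keep only the estimates that clear the usual two-sided 5% threshold.
significant = np.abs(estimates / se) > 1.96

print("true effect:                      ", true_effect)
print("mean of all estimates:            ", round(estimates.mean(), 3))
print("mean |estimate| among significant:", round(np.abs(estimates[significant]).mean(), 3))
```

With these numbers, the estimates that survive the significance filter are roughly ten times the size of the true effect, which is the sense in which a significant estimate is probably an overestimate.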

P.S. I think you all know what I mean here. But could someone rephrase it in a more pithy manner? I’d like to include it in our statistical lexicon.

21 Comments

  1. K? O'Rourke says:

    Andrew: Are you concerned about the damage that will be done by those who don't get this?
    (hard to imagine, isn't it?)

    http://jama.ama-assn.org/cgi/content/extract/304/

    Some upcoming debate about this at the next Cochrane Collaboration – but could not find the link… (Saturday night and I am behind on making dinner)

    K?

  2. OneEyedMan says:

    The curse of multiple comparisons

    The tallest pygmy problem

    The problem of infinite monkey testing

    Rain dance testing

    Cargo cult testing

  3. Laurence says:

    I can't quite get this right, but something like: "If p is too small for our effect not to be true, our b is probably too big."

  4. The Science Pundit says:

    I guess you're looking for something like "If it sounds too good to be true, then it probably is" or "Anything that can go wrong will go wrong." Well, you could always just use those as models. How about:

    Statistically significant estimates are usually overestimates.

  5. Bayesian Empirimance says:

    As anyone who has ever had a heated political discussion knows, bias and confidence go hand in hand.

  6. Charles says:

    Maybe building on science pundit:

    Significant estimates overestimate.

  7. Jonathan says:

    I always liked: "If you can't get a t-statistic of 3, you either aren't trying or you're wrong."

  8. E. says:

    A pithier version of Laurence, above, might be:

    If p is small enough, b is probably too big.

  9. K? O'Rourke says:

    Casella's original term _recognizable set_ was to the point – was it not?

    K?

  10. Dan Goldstein says:

    This seems related to the Proteus Phenomenon

    http://clinicaltrials.ploshubs.org/article/info:d

    "In the Proteus phenomenon, the first published study on a scientific question may find a most extravagant effect size; this is followed by the publication of another study that shows a large contradicting effect. Subsequent studies report effect sizes between these extremes"

    See also
    http://www.jclinepi.com/article/S0895-4356%2805%2

  11. Adam says:

    Andrew,

    Can I be the stupid one and ask what, exactly, you do mean here?

  12. Andrew Gelman says:

    Adam:

    I'm talking about Type M errors. See here.

  13. Jerzy says:

    I really like that paper on sex ratios. It's good to be reminded to always check previous research, and see what effect sizes and levels of variation/precision can be expected, before you try to make sense of a new claimed result.

    Here's a paraphrase from the paper that may or may not be pithy enough for the lexicon:

    Large estimates often do not mean "Wow, I’ve found something big!" but, rather, "Wow, this study is underpowered!"

  14. Jerzy says:

    But it's subtly different from the Winner's Curse in economics, right?
    http://econ.ucdenver.edu/beckman/Econ%204001/thal
    There, you're cursing the fact that you spent too many resources competing against other people for a prize that's worth less than you had thought.
    Here, you're cursing that you spent too few resources (i.e. had too small a sample) to know whether you can trust your "significant" effect. (Except that in many cases, you're cheering that you can slip this problem past reviewers, and then rigorous-minded people like Andrew are left cursing instead of you.)

  15. anon says:

    "Statistically significant estimates are usually overestimates."

    Since, on the whole, the residuals sum to zero, statistically insignificant estimates are usually underestimates.
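    A rough sketch of both halves of that claim, with a made-up true effect of 0.3 and standard error of 0.2 so that only some estimates reach significance:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    true_effect = 0.3   # hypothetical true effect
    se = 0.2            # hypothetical standard error
    est = rng.normal(true_effect, se, 100_000)   # simulated study estimates

    sig = np.abs(est / se) > 1.96   # two-sided 5% significance filter

    print("overall mean:           ", round(est.mean(), 3))        # close to 0.3
    print("mean if significant:    ", round(est[sig].mean(), 3))   # above 0.3
    print("mean if not significant:", round(est[~sig].mean(), 3))  # below 0.3
    ```

    The overall average is about right; conditioning on significance pushes it up, and conditioning on non-significance pushes it down by a roughly compensating amount.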

  16. zbicyclist says:

    Winner's mirage?

    "Effects seen in the significance lens are smaller than they appear"

  17. K? O'Rourke says:

    anon: Yes! But what percentage of people get this on their own, versus after it has been pointed out once, versus after it has been pointed out multiple, multiple times?

    Also the reason, as Andrew once put it, that you need to keep one eye on the power – the less power in the studies, the larger those two cancelling _biases_ become (a rough sketch after this comment puts numbers on that).

    (Even dressing it up in a topic of "beauty and sex" may not be enough)

    K?
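    To put rough numbers on that (a sketch, with an arbitrary true effect of 0.2 and a range of standard errors standing in for studies of higher and lower power):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    true_effect = 0.2   # hypothetical true effect

    # Larger standard error = noisier study = lower power.
    for se in [0.05, 0.10, 0.20, 0.40]:
        est = rng.normal(true_effect, se, 200_000)
        sig = np.abs(est / se) > 1.96   # two-sided 5% significance filter
        power = sig.mean()              # empirical power at this noise level
        exaggeration = np.abs(est[sig]).mean() / true_effect
        print(f"se={se:.2f}  power={power:.2f}  "
              f"significant |estimate| / true effect = {exaggeration:.1f}")
    ```

    At high power the significant estimates are about the right size; as power falls, the ones that get through the filter exaggerate more and more.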

  18. K? O'Rourke says:

    The scandal of (low) power

    "Effects seen in the significance lens are (much) smaller than they appear"

    "Effects seen in the non-significance lens are (much) larger than they appear"

    K?

  19. Joseph says:

    In my mind, one of the issues is what is publishable (and how that shapes the literature). In epidemiology, an unexpectedly large association is immediately publishable, whereas many null studies are extremely hard to publish. If power is low (due to a rare outcome and the difficulties in getting data), then the published associations are almost certainly dramatic over-estimates.

    It's not an easy problem as (for example) ignoring a safety signal sometimes leads to very unfortunate outcomes.

  20. wei says:

    I am curious about the fact that the sentence starts with significance testing and ends with effect estimation. Does that mean we can avoid the problem by avoiding testing (but still screening a lot of effects by effect estimation)?

  21. thom says:

    What about the "filter fallacy"? People pass studies through a filter that excludes small effects (fixed p, fixed n) and are then surprised that they've overestimated the effects …

    … or the "too big to be true" effect.

    There is also the "file drawer problem", in which excluding non-significant effects biases published study effects upwards.