
Use multilevel modeling to correct for the “winner’s curse” arising from selection of successful experimental results

John Snow writes:

I came across this blog by Milan Shen recently and thought you might find it interesting.

A couple of things jumped out at me. It seemed like the so-called ‘Winner’s Curse’ is just another way of describing the statistical significance filter. It also doesn’t look like their correction method is very effective. I’d be very curious to hear your take on this work, especially this idea of a ‘Winner’s Curse’. I suspect the airbnb team could benefit from reading some of your work when it comes to dealing with these problems!

My reply: Yes, indeed I’ve used the term “winner’s curse” in this context. Others have used the term too.

Also here.

Here’s a paper discussing the bias.

I think the right thing to do is fit a multilevel model.
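To make the point concrete, here is a minimal simulation sketch. It is not the Airbnb team's method and not a full Bayesian fit: it assumes every experiment has the same known standard error, that true effects come from a normal distribution, and it estimates the population mean and variance by simple method of moments (an empirical-Bayes stand-in for a multilevel model). The function name and all parameter values are made up for illustration.

```python
import numpy as np

def simulate_winners_curse(n_experiments=5000, mu=0.0, tau=1.0, se=1.0, seed=0):
    """Compare raw and partially pooled estimates among 'winning' experiments.

    All numbers here are hypothetical; nothing is taken from real A/B test data.
    """
    rng = np.random.default_rng(seed)

    # True effects vary across experiments: theta_j ~ N(mu, tau^2).
    theta = rng.normal(mu, tau, size=n_experiments)
    # Each experiment reports a noisy estimate: y_j ~ N(theta_j, se^2).
    y = rng.normal(theta, se)

    # Statistical significance filter: keep positive results with z > 1.96.
    selected = y / se > 1.96

    # Method-of-moments estimates of the population mean and variance.
    mu_hat = y.mean()
    tau2_hat = max(y.var(ddof=1) - se**2, 0.0)

    # Partial pooling: shrink each estimate toward the overall mean.
    shrink = tau2_hat / (tau2_hat + se**2)
    theta_hat = mu_hat + shrink * (y - mu_hat)

    return {
        "n_selected": int(selected.sum()),
        "true_mean": float(theta[selected].mean()),
        "raw_mean": float(y[selected].mean()),
        "shrunk_mean": float(theta_hat[selected].mean()),
    }

result = simulate_winners_curse()
print(result)
```

Under these assumptions, the raw estimates of the selected experiments overstate their true effects on average, while the shrunken estimates should land much closer to the truth. With unequal standard errors or non-normal effects you would want to fit an actual multilevel model rather than rely on this shortcut.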


  1. Blake Shurtz says:

    If you haven’t already, can you write a post about your multilevel model paper for an audience of undergrads? I’ve taken a lot of stats but not Bayes, yet. Just a suggestion, thanks!

  2. bxg says:

    The blog article is about AirBnB, and the first line is “Overview: In online experimentation platforms, we choose the experiments with significant successful results to launch to the product.” There and throughout the article, significance means statistical significance.

    So my question: is this really how sites like this use the results of A/B testing, and if so, is (or under what situations is) the use of statistical significance economically rational? (Other than perhaps, and implausibly: “all tested interventions are expected to have about the same variability”)

    It’s not an obviously silly way of doing things. And AirBnB probably knows what they are doing!

    But we know that the actual effect sizes will be very important, so it would IMO be unconvincing and fairly empty to offer the argument (however true) that “these are the interventions we are most confident have some positive effect”. But perhaps there are other arguments?

    • Keith O'Rourke says:

      I did briefly talk to someone from a similar type of organisation and the neglect of fuller economic decision analysis seems largely due to logistics (making it work largely unsupervised in numerous analyses).

      However, they did not realize they could do occasional batch learning to pre-define better false negative/positive trade-offs. Even switching to p < .07 or whatever could possibly improve the bottom line.

      Really statistics is mostly about economics of learning from observations – not just defaults. That seems to be the missing insight.

      Though defaults have their place: see the last paragraph, and the link in it, here.

      • bxg says:

        Thanks for the response, but it’s a bit unsatisfying… this is capitalism, and with businesses like AirBnB there is presumably _big_ money at stake. And they can pay for “talent”, and any reasonable amount of research dollars is trivial to them. But I think you are suggesting that some companies like this leave money on the table because no-one has enough economic sense to even think about 0.07 vs 0.05 (let alone larger procedural changes). Obviously possible, but isn’t this a bit implausible here and in 2018?

        There is just a crazy number of intelligent people turning to work on on-line advertisement optimization, or e-commerce conversion, but somehow they are all completely missing economics/accounting/finance 101? I’ve got to think that one’s very very strong prior should be: no, they aren’t. But then, back to my original question, when might a statistical significance filter be economically rational?

  3. Keith O'Rourke says:

    > very very strong prior should be: no, they aren’t.
    But as long as that’s less than one, it can be revised given observations to the contrary.

    > they can pay for “talent”
    They can but – they may not be able to discern, attract and manage it!

    > might a statistical significance filter be economically rational?
    Less dis-rational – there always exist constraints on what can practically be implemented right now.
