“The Pitfall of Experimenting on the Web: How Unattended Selective Attrition Leads to Surprising (Yet False) Research Conclusions”

Kevin Lewis points us to this paper by Haotian Zhou and Ayelet Fishbach, which begins:

The authors find that experimental studies using online samples (e.g., MTurk) often violate the assumption of random assignment, because participant attrition—quitting a study before completing it and getting paid—is not only prevalent, but also varies systematically across experimental conditions. Using standard social psychology paradigms (e.g., ego-depletion, construal level), they observed attrition rates ranging from 30% to 50% (Study 1). The authors show that failing to attend to attrition rates in online panels has grave consequences. By introducing experimental confounds, unattended attrition misled them to draw mind-boggling yet false conclusions: that recalling a few happy events is considerably more effortful than recalling many happy events, and that imagining applying eyeliner leads to weight loss (Study 2). In addition, attrition rate misled them to draw a logical yet false conclusion: that explaining one’s view on gun rights decreases progun sentiment (Study 3). The authors offer a partial remedy (Study 4) and call for minimizing and reporting experimental attrition in studies conducted on the Web.

I started to read this but my attention wandered before I got to the end; I was on the internet at the time and got distracted by a bunch of cat pictures.
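The paper's core claim, that condition-dependent dropout breaks random assignment, is easy to see in a toy simulation (my own sketch, not code from the paper). Suppose the outcome is driven entirely by a latent trait, so the true treatment effect is zero, but the harder condition disproportionately loses low-trait participants:

```python
import random

random.seed(1)

N = 10_000  # participants randomized to each condition

# Latent trait (say, persistence), uniform on [0, 1]. The measured
# outcome is just this trait, so the true treatment effect is zero.
easy = [random.random() for _ in range(N)]
hard = [random.random() for _ in range(N)]

# Selective attrition: nearly everyone finishes the easy condition,
# but in the hard condition low-persistence participants tend to quit.
easy_done = [t for t in easy if random.random() < 0.95]
hard_done = [t for t in hard if random.random() < t]

mean = lambda xs: sum(xs) / len(xs)
spurious_effect = mean(hard_done) - mean(easy_done)

print(f"attrition: easy {1 - len(easy_done) / N:.0%}, hard {1 - len(hard_done) / N:.0%}")
print(f"estimated effect of 'hard' (true effect is zero): {spurious_effect:.2f}")
```

The "hard" condition shows a sizable positive "effect" among completers, purely because the low-persistence participants who would have dragged its mean down quit before finishing.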

9 Comments

  1. Justin says:

    Does this change your mind about the dirty money paper? In your comments on that paper, you said, "the experiments are great. I love this stuff." Yet they were all done with small MTurk samples.

    http://andrewgelman.com/2016/12/23/dirty-money-role-moral-history-economic-judgments/

    The best evidence to date on the validity of experiments with online samples comes from a paper by Mullinix, Leeper, Druckman, and Freese (2015). Here are their key findings, as reported on page 122:

    “In sum, 29 (or 80.6%) of the 36 treatment effects in Figures 2 and 3 estimated from TESS are replicated by MTurk in the interpretation of the statistical significance and direction of treatment effects. Importantly, of the seven experiments for which there is a significant effect in one sample, but a null result in the other, only one (Experiment 20) actually produced a significantly different effect size estimate (Gelman and Stern 2006). Across all tests, in no instance did the two samples produce significantly distinguishable effects in substantively opposite directions.”

    All sorts of common techniques in experiments, such as dropping subjects who fail manipulation checks (Aronow, Baron, and Pinson, 2016) or including a non-randomized mediator (MacKinnon and Pirlott, 2015), turn true experiments into quasi-experiments by annulling random assignment, and thus can lead to biased findings. Even if the assumptions of random assignment hold, many vignette experiments (online or offline) are confounded because the manipulation affects the error term (Dafoe, Zhang, and Caughey, 2015).

  2. Justin says:

    One more follow-up. The attrition rates reported in the Zhou and Fishbach (2016) article (30-50%) seem incredible to me. Maybe they are limited to certain types of experiments. I’ve only done a few experiments with MTurk, but I have never seen an attrition rate over 3%. Here are the data from the experiments I’ve conducted (all but one are published), where the initial sample consists of those who proceeded past the introductory (or consent) page, and “finished” means answering the last demographic questions:

    Experiment A: 1,015 workers started and 1,004 (or 99%) finished.

    Experiment B: 629 workers started and 623 (99%) finished.

    Experiment C: 1,012 workers started and 1,001 (99%) finished.

    Experiment D: 506 workers started and 496 (98%) finished.

    These were all embedded in surveys that took 8+ minutes to complete.

    • elin says:

      And you didn’t use one of the programs that don’t tell you about dropouts?

      • Justin says:

        All surveys were hosted on SurveyMonkey and set up so that respondents had to answer a question to proceed past the intro/consent page. It is easy to see how many respondents moved forward to the survey content and experiment. I closely follow the MTurk literature and have never heard of high drop-off rates.

        • Elin says:

          Any open-ended questions? It seems like those are the ones the paper highlights. I really disbelieve the time estimates they gave people (5 minutes) if they were asking for open-ended responses.

  3. Brendan Nyhan says:

    Agree with Justin – I’ve never seen attrition rates like that myself. The authors might have bait-and-switched Turkers on length/topic/etc. or had complicated psych manipulations that caused dropout or something. The article is useful in bringing attention to the bad defaults in Qualtrics, though, and to the problem of post-treatment bias in experiments (for more on the latter, see my new paper w/Montgomery and Torres: http://www.dartmouth.edu/~nyhan/post-treatment-bias.pdf).

  4. Jonathan says:

    You should design models that include dimensions for attrition. That’s the difference between much drug testing and real-world use, for example. As my kid tells me, she’s written out requirements for participation in a protocol, and while one can understand trying to find an effect in “pure” or at least less confounded form, that also means you aren’t measuring the effect in the actual confounded world. That’s a form of attrition carried into the post-study world. But of course there are similar or identical forms of attrition within the study, as people drop out, develop issues, die, etc., and then you argue about why they died and whether that should be counted as having occurred outside the study.
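    One standard way to bring attrition into the analysis is to bound the treatment effect under worst-case assumptions about the dropouts (Manski-style nonresponse bounds). A minimal sketch with made-up numbers, assuming the outcome is known to lie in [0, 1]:

    ```python
    def manski_bounds(mean_t, keep_t, mean_c, keep_c, lo=0.0, hi=1.0):
        """Worst-case bounds on treatment minus control when outcomes are
        observed only for completers. keep_* are completion rates; the
        outcome is assumed to lie in [lo, hi]."""
        t_range = (keep_t * mean_t + (1 - keep_t) * lo,
                   keep_t * mean_t + (1 - keep_t) * hi)
        c_range = (keep_c * mean_c + (1 - keep_c) * lo,
                   keep_c * mean_c + (1 - keep_c) * hi)
        return t_range[0] - c_range[1], t_range[1] - c_range[0]

    # 40% attrition in the treatment arm vs. 3% in control: identical
    # observed means still leave a wide interval straddling zero.
    lo, hi = manski_bounds(mean_t=0.6, keep_t=0.6, mean_c=0.6, keep_c=0.97)
    print(f"effect is somewhere in [{lo:.3f}, {hi:.3f}]")
    ```

    When attrition is low in both arms the interval collapses toward the point estimate, which is one way to quantify Justin's observation above that a 1-2% dropout rate is mostly harmless.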
