50 shades of gray: A research story

This is a killer story (from Brian Nosek, Jeffrey Spies, and Matt Motyl).

Part 1:

Two of the present authors, Motyl and Nosek, share interests in political ideology. We were inspired by the fast-growing literature on embodiment that demonstrates surprising links between body and mind (Markman & Brendl, 2005; Proffitt, 2006) to investigate embodiment of political extremism. Participants from the political left, right, and center (N = 1,979) completed a perceptual judgment task in which words were presented in different shades of gray. Participants had to click along a gradient representing grays from near black to near white to select a shade that matched the shade of the word. We calculated accuracy: How close to the actual shade did participants get? The results were stunning. Moderates perceived the shades of gray more accurately than extremists on the left and right (p = .01). Our conclusion: political extremists perceive the world in black-and-white, figuratively and literally. Our design and follow-up analyses ruled out obvious alternative explanations such as time spent on task and a tendency to select extreme responses. Enthused about the result, we identified Psychological Science as our fallback journal after we toured the Science, Nature, and PNAS rejection mills. The ultimate publication, Motyl and Nosek (2012), served as one of Motyl’s signature publications as he finished graduate school and entered the job market.

Part 2:

The story is all true, except for the last sentence; we did not publish the finding. Before writing and submitting, we paused. Two recent papers highlighted the possibility that research practices spuriously inflate the presence of positive results in the published literature (John, Loewenstein, & Prelec, 2012; Simmons, Nelson, & Simonsohn, 2011). Surely ours was not a case to worry about. We had hypothesized it, and the effect was reliable. But we had been discussing reproducibility, and we had declared to our lab mates the importance of replication for increasing certainty of research results. We also had an unusual laboratory situation. For studies that could be run through a web browser, data collection was very easy (Nosek et al., 2007). We could not justify skipping replication on the grounds of feasibility or resource constraints. Finally, the procedure had been created by someone else for another purpose, and we had not laid out our analysis strategy in advance. We could have made analysis decisions that increased the likelihood of obtaining results aligned with our hypothesis. These reasons made it difficult to avoid doing a replication. We conducted a direct replication while we prepared the manuscript. We ran 1,300 participants, giving us .995 power to detect an effect of the original effect size at alpha = .05.
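A power figure like that can be sanity-checked with a normal-approximation sketch. The paper does not report the original effect size or the exact test here, so the standardized effect (d ≈ 0.25) and the equal two-group split below are illustrative assumptions chosen only to show how such a calculation works, not figures from the study:

```python
from scipy.stats import norm

def power_two_sample_z(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a
    standardized mean difference d with n_per_group in each group."""
    z_crit = norm.ppf(1 - alpha / 2)
    # Noncentrality: d divided by the SE of the difference, sqrt(2/n)
    delta = d * (n_per_group / 2) ** 0.5
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)

# Illustrative only: a standardized difference of about d = 0.25,
# with 1,300 participants split into two groups of 650, yields
# power close to the .995 the authors report.
print(round(power_two_sample_z(0.25, 650), 3))
```

The general point survives the hedging: with web-based data collection, samples large enough to make power a non-issue were cheap, which is part of why the authors could not justify skipping the replication.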

Part 3:

The effect vanished (p = .59).

Their paper is all about how to provide incentives for this sort of good behavior, in contrast to the ample incentives that researchers have to publish their tentative findings attached to grandiose claims.

P.S. I wrote this a couple months ago (as regular readers know, most of the posts on this blog are on a delay of two months or so) but now that it appears I realize it is very relevant to our discussion of statistical significance from the other day.

22 Comments

  1. Ashok Rao says:

Disclosure: I didn’t read the whole paper yet, but I grepped it for “bet” and didn’t find any hits. It looks like a great one though; I like the Tukey quote.

What if each paper was published with a bet that authors place on the positive association? Then haters could go and replicate to falsify the bet. There’s a whole mega-host of problems associated with this (http://ashokarao.com/2013/07/04/on-bets/) – but this seems like a pretty good way to discern papers where authors believe what they publish from, well, those where the “ample incentives” dominate.

    Simple practice: each co-author, on the title page, inserts his odds parenthetically.

I’m not saying there’s no value in results where the author doesn’t have confidence (though it’d be nice to know what exactly it is). This practice would encourage a focus on what might actually be important for such papers rather than the results themselves.

    • Andrew says:

      Ashok:

      It would be easy enough to compute these posterior probabilities. The key is to use a real prior. With a flat prior, even a completely non-stat-signif result that’s 1 se from zero will still give an 84% posterior probability that the effect is positive, while a routine 2 se result yields an implausibly optimistic 97.5% posterior probability. An appropriately zero-centered prior will pull these probabilities toward 50%.
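Andrew’s numbers can be reproduced in a few lines. With a flat prior, the posterior for the effect is just the normal likelihood, so P(effect > 0) = Φ(estimate/se); a zero-centered normal prior shrinks both the posterior mean and variance. The prior scale used below (prior_sd = 0.5) is an arbitrary illustrative choice, not a recommendation:

```python
from scipy.stats import norm

def prob_positive(estimate, se, prior_sd=None):
    """P(effect > 0) given a normal likelihood N(estimate, se^2).
    prior_sd=None means a flat prior; otherwise a zero-centered
    normal prior N(0, prior_sd^2) pulls the posterior toward zero."""
    if prior_sd is None:
        return norm.cdf(estimate / se)
    # Conjugate normal-normal update: shrinkage factor on the estimate
    shrink = prior_sd**2 / (prior_sd**2 + se**2)
    post_mean = shrink * estimate
    post_sd = (shrink * se**2) ** 0.5
    return norm.cdf(post_mean / post_sd)

print(prob_positive(1.0, 1.0))                # flat prior, 1 se from zero: ~0.84
print(prob_positive(2.0, 1.0))                # flat prior, 2 se from zero: ~0.975
print(prob_positive(2.0, 1.0, prior_sd=0.5))  # informative prior pulls it toward 50%
```

With the informative prior, the routine 2-se result drops from about 97.5% to roughly 81%, illustrating the pull toward 50% Andrew describes.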

      • Hi Andrew,

        can you give a more detailed specification of what you mean by “real prior”?

        • Ashok Rao says:

          Here’s AG from another post: “Here [when I say "real prior"] I’m not talking about a subjective prior that is meant to express a personal belief but rather a distribution that represents a summary of prior scientific knowledge. Such an expression can only be approximate (as, indeed, assumptions such as logistic regressions, additive treatment effects, and all the rest, are only approximations too), and I agree with Senn that it would be rash to let philosophical foundations be a justification for using Bayesian methods. Rather, my work on the philosophy of statistics is intended to demonstrate how Bayesian inference can fit into a falsificationist philosophy that I am comfortable with on general grounds.”

  2. [...] This one (from Brian Nosek, Jeffrey Spies, and Matt Motyl) is so great that all quantitative political scientists (and sociologists, and economists, and public health researchers, . . .) should read it too. Right now. [...]

  3. Anonymous says:

    I think the best part of the story is that Motyl could conceivably have a good publication out of it.

  4. Brian Nosek says:

    Thanks for the interest. Here is a link to the final, open-access version of the article: http://pps.sagepub.com/content/7/6/615.full. And, if you are interested in Utopia, part I, it is here (also OA): http://www.tandfonline.com/doi/abs/10.1080/1047840X.2012.692215#.UfU2rGTJG6c

    • Rahul says:

      Any insights on why the work was non-replicable? Why the large variation?

    • K? O'Rourke says:

      Neat!

      I was involved in something similar, but did not think about _spinning it_ into a methodology paper :-(
Some regression analysis had found a significant increase in hospital mortality in the wee hours of the morning (which folks had been worried might be the case), but the senior clinician was not convinced and suggested we wait six months, when we would get another year of data. We did, and it went away. The junior clinician was furious, pointing out he could have obtained two publications out of this (the false positive and then the truer negative), and he never worked with us again.

      An interesting point about the Sterling 1959 paper they quote, the topic was suggested by RA Fisher.

      And I like their points “Transparency can improve our practices even if no one actually looks, simply because we know that someone could look” and “Openness is not needed because we are untrustworthy; it is needed because we are human”. I tried to get those ideas across here http://andrewgelman.com/2012/02/12/meta-analysis-game-theory-and-incentives-to-do-replicable-research/

  5. I like your title. But I think this story is what they call “statistician porn”.

  6. Russ Roberts had a conversation with Brian Nosek about this a while back. It’s worth re-listening to: http://www.econtalk.org/archives/2012/09/nosek_on_truth.html

  7. [...] Andrew Gelman links to this nice paper by Nosek, Spies and Motyl, about an exciting “result” in psychological research: instead of rushing to publish, they scrupulously rushed to replicate, and the result disappeared. The fairy tale ending is that they got a nice publication from using this experience to tell us what we already know – that “significant” results obtained from small, ad hoc experimental samples are pretty much worthless. [...]

  8. +1 More stories like this one!

  9. [...] Andrew Gelman. See also Frederick Guy’s commentary. Posted in [...]

  10. [...] Disclaimer: This is an incredibly “crude” study of these phenomena. Besides all of the possible research design flaws, I am also working with a rather small number of cases (which doesn’t even come close to 100!). Therefore, I do not expect all (or any) of my results to hold in a more rigorous analysis. For a great example of what I mean, check out Andrew Gelman’s post. [...]

  11. [...] a lot about reproducibility and researcher degrees of freedom lately. But Andrew Gelman passes on the best story yet about these issues. Two psychologists interested in political ideology did an online experiment in which people of [...]

  12. [...] Andrew Gelman tells an interesting story about a scientific discovery that lasted just as long as it took to try to replicate the results. [...]

  13. Ahmed Fasih says:

    Word-fail in “The disinterest in replication is striking given its centrality to science” is especially sad given the importance of the sentiment expressed. A “disinterested” party has no conflict of interest—what the writers meant is “lack of interest”.

    Everything in the above paragraph is true except the first word: the word-fail has been fixed in the final version of the paper (link provided in comments by Brian Nosek): “The lack of interest in replication is striking given its centrality to science”. Good work!

  14. [...] 50 shades of gray: A research story Survey of Earliest Human Settlements Undermines Claim That War Has Deep Evolutionary Roots Goodbye to All That (Water): Field notes from a drying west. Aphid attacks should be reported through the fungusphone New analyses undermine perception of DNA infallibility [...]

  15. [...] On reproducibility and incentives [...]

  16. […] How not to get published. (Andrew Gelman/Brian Nosek, Jeffrey Spies, and Matt […]