Shea Levy writes:
You ended a post from last month [i.e., Feb.] with the injunction not to take the fact of a paper’s publication or its citation status as meaning anything, and instead to “read each paper on its own.” Unfortunately, while I can usually follow, e.g., the criticisms of a paper you might post, I’m not confident in my ability to independently assess arbitrary new papers I find. Assuming, say, a semester of a biological-sciences-focused undergrad stats course and a general willingness and ability to pick up any additional stats theory or practice, what should someone in the relevant fields do to get to the point where they can meaningfully evaluate each paper they come across?
My reply: That’s a tough one. My own view of research papers has become much more skeptical over the years. For example, I devoted several posts to the Dennis-the-Dentist paper without expressing any skepticism at all—and then Uri Simonsohn comes along and shoots it down. So it’s hard to know what to say. I mean, even as of 2007, I think I had a pretty good understanding of statistics and social science. And look at all the savvy people who got sucked into that Bem ESP thing—not that they thought Bem had proved ESP, but many people didn’t realize how bad that paper was, just on statistical grounds.
So what to do to independently assess new papers?
I think you have to go Bayesian. And by that I don’t mean you should be assessing your prior probability that the null hypothesis is true. I mean that you have to think about effect sizes, on one side, and about measurement, on the other.
It’s not always easy. For example, I found the claimed effect sizes in the Dennis/Dentist paper to be reasonable (indeed, I posted specifically on the topic). For that paper, the problem was in the measurement, or one might say the likelihood: the mapping from the underlying quantity of interest to the data.
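On the effect-size side, you can make the check quantitative. Here’s a minimal sketch in Python of the kind of design analysis John Carlin and I have recommended (the particular numbers are hypothetical): posit a plausible true effect, take the study’s standard error, and simulate how statistically significant estimates would behave.

```python
import numpy as np

rng = np.random.default_rng(0)

def retrodesign_sim(true_effect, se, n_sims=100_000):
    """Design analysis by simulation: given a plausible true effect and a
    study's standard error, summarize the statistically significant results."""
    z_crit = 1.96                              # two-sided p < .05 threshold
    est = rng.normal(true_effect, se, n_sims)  # sampling distribution of the estimate
    signif = np.abs(est) > z_crit * se         # replications reaching significance
    power = signif.mean()
    type_s = (est[signif] < 0).mean()          # significant but with the wrong sign
    exaggeration = np.abs(est[signif]).mean() / true_effect  # Type M error
    return power, type_s, exaggeration

# Hypothetical numbers: a plausible true effect of 0.1 sd,
# measured with a standard error of 0.25 sd (a small, noisy study).
power, type_s, exag = retrodesign_sim(0.1, 0.25)
print(f"power = {power:.2f}, Type S = {type_s:.2f}, exaggeration = {exag:.1f}x")
```

With numbers like these, power is under 10%, and any estimate that does reach significance exaggerates the true effect several-fold. If a paper’s headline number only makes sense as one of those lucky draws, that’s a reason for skepticism.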
Other times we get external information, such as the failed replications of ovulation-and-clothing, power pose, and embodied cognition. But we should be able to do better: all these papers had major problems that were apparent even before the failed replications.
One cue we’ve discussed a lot: if a paper’s claims rely on p-values and the analysis involves lots of forking paths, you might just have to set the whole paper aside.
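To see what forking paths do to those p-values, here’s a toy simulation (again Python; the setup is invented for illustration). The data are pure noise, but the analyst gets to report whichever of three defensible-looking comparisons comes out best:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def forked_false_positive_rate(n=50, n_sims=5_000, alpha=0.05):
    """Pure-noise data with three 'reasonable' analysis paths: compare
    treatment groups in the full sample, or within either subgroup,
    and report the best p-value. Every null hypothesis is true."""
    hits = 0
    for _ in range(n_sims):
        y = rng.normal(size=n)              # outcome: pure noise
        group = rng.integers(0, 2, size=n)  # treatment assignment
        sex = rng.integers(0, 2, size=n)    # a subgrouping variable
        pvals = []
        for mask in (np.ones(n, dtype=bool), sex == 0, sex == 1):
            a = y[mask & (group == 0)]
            b = y[mask & (group == 1)]
            if len(a) > 1 and len(b) > 1:
                pvals.append(stats.ttest_ind(a, b).pvalue)
        if pvals and min(pvals) < alpha:
            hits += 1
    return hits / n_sims

print(f"false-positive rate with three forking paths: {forked_false_positive_rate():.2f}")
```

Even with only three forks, the nominal 5% error rate more than doubles; a real paper’s garden of forking paths (subgroups, covariates, outcome choices, exclusion rules) has far more.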
Medical research: I’ve heard there’s lots of cheating, lots of excluding patients who are doing well under the control condition, lots of ways to get people out of the study, lots of playing around with endpoints.
The trouble is, this is all just a guide to skepticism. But I’m not skeptical about everything.
And the solution can’t be to ask Gelman. There’s only one of me to go around! (Or two, if you count my sister.) And I make mistakes too!
So I’m not sure. I’ll throw the question to the commentariat. What do you say?