Lee Sechrest writes:
Here is a remarkable paper, not well known, by Paul Meehl. My research group is about to undertake a fresh discussion of it, which we do about every five or ten years. The paper is now more than a quarter of a century old but it is, I think, dramatically pertinent to the “soft psychology” problems of today. If you have not read it, I think you will find it enlightening, and if you have read it, your blog readers might want to be referred to it at some time.
The paper is in a somewhat obscure journal with not much of a reputation as “peer reviewed.” (The journal’s practices should remind us that peer review is not a binary (yes-no) process.) I reviewed a few papers for them, including two or three of Meehl’s. I asked Paul once why he published in such a journal. He replied that he was late in his career, and he had neither the time nor the patience to deal with picky reviewers who were often poorly informed. He called my attention to the works of several other well-known, even eminent, psychologists who felt the same way and who published in the journal. So the obscurity of the publication should not deter us. The paper has been cited a few hundred times, but, alas, it has had little impact.
I agree. Whenever I read Meehl, I’m reminded of that famous passage from The Catcher in the Rye:
What really knocks me out is a book that, when you’re all done reading it, you wish the author that wrote it was a terrific friend of yours and you could call him up on the phone whenever you felt like it. That doesn’t happen much, though.
Meehl’s article is from 1985 and it begins:
Null hypothesis testing of correlational predictions from weak substantive theories in soft psychology is subject to the influence of ten obfuscating factors whose effects are usually (1) sizeable, (2) opposed, (3) variable, and (4) unknown. The net epistemic effect of these ten obfuscating influences is that the usual research literature review is well nigh uninterpretable. Major changes in graduate education, conduct of research, and editorial policy are proposed.
Meehl writes a lot about things that we’ve been rediscovering, and talking a lot about, recently, including, for example, the distinction between scientific hypotheses and statistical hypotheses. I think that, as a good Popperian, Meehl would agree with me completely that null hypothesis significance testing wears the cloak of falsificationism without actually being falsificationist.
And it makes me wonder how it is that we (statistically-minded social scientists, or social-science-minded statisticians) have been ignoring these ideas for so many years.
Even if you were to step back only ten years, for example, you’d find me being a much more credulous consumer of quantitative research claims than I am now. I used to start with some basic level of belief and then have to struggle to find skeptical arguments. For me, I guess it started with the Kanazawa papers, but then I started to see a general pattern. But it’s taken a while. Even as late as 2011, when that Bem paper came out, I at first subscribed to the general view that his ESP work was solid science and he just had the bad luck to be working in a field where the true effects were small. A couple years later, under the influence of E. J. Wagenmakers and others, it was in retrospect obvious that Bem’s paper was full of serious, serious problems, all in plain view for anyone to see.
And who of a certain age can forget that Statistical Science in 1994 published a paper purporting to demonstrate statistical evidence in favor of the so-called Bible Code? It took a couple of years for the message to get out, based on the careful efforts of computer scientist Brendan McKay and others, that the published analysis was wrong. In retrospect, though, it was a joke—if I (or, for that matter, a resurrection of Paul Meehl) were to see an analysis today that was comparable to that Bible Code paper, I think I’d see right away how ridiculous it is, just as I could right away see through the ovulation-and-voting paper and all the other “power = .06” studies we’ve been discussing here recently.
So here’s the puzzle. It’s been obvious to me for the past three or so years, obvious to E. J. Wagenmakers and Uri Simonsohn for a bit longer than that—but there was Paul Meehl, well-respected then and still well-remembered now, saying all this thirty and forty years ago, yet we forgot. (“We” = not just me, not just Daniel Kahneman and various editors of Psychological Science, but quantitative social scientists more generally.)
It’s not that quants haven’t been critical. We’ve been talking forever about correlation != causation, and selection bias, and specification searches. But these all seemed like little problems, things to warn people about. And, sure, there’s been a steady drumbeat (as the journalists say) of criticism of null hypothesis significance testing. But, but . . . the idea that the garden of forking paths and the statistical significance filter are central to the interpretation of statistical studies, that’s new to us (though not to Meehl).
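To make the statistical significance filter concrete, here is a minimal simulation sketch. The function name, the true effect of 2, and the standard error of 8.1 are all hypothetical values chosen for illustration (they are not taken from Meehl’s paper or from any particular study); they are picked so that the power comes out near .06. The idea is to simulate many replications of a noisy study, keep only the estimates that reach p < .05, and see what those surviving estimates look like.

```python
import random
import statistics


def significance_filter(true_effect, se, n_sims=100_000, seed=0):
    """Simulate normally distributed estimates and keep the 'significant' ones.

    Returns (power, exaggeration, wrong_sign), where exaggeration is the mean
    absolute significant estimate divided by the true effect, and wrong_sign is
    the share of significant estimates whose sign is opposite the true effect.
    All inputs are hypothetical; this is an illustration, not a real analysis.
    """
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% threshold for a normal test statistic
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) > z_crit * se:
            significant.append(estimate)

    power = len(significant) / n_sims
    exaggeration = statistics.mean(abs(e) for e in significant) / abs(true_effect)
    wrong_sign = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    return power, exaggeration, wrong_sign


if __name__ == "__main__":
    power, exaggeration, wrong_sign = significance_filter(true_effect=2.0, se=8.1)
    print(f"power: {power:.3f}")
    print(f"exaggeration ratio among significant results: {exaggeration:.1f}")
    print(f"share of significant results with the wrong sign: {wrong_sign:.2f}")
```

Under these assumed numbers, any estimate that clears the significance threshold has to be many times larger than the true effect, and a nontrivial share of the significant estimates point in the wrong direction. That is the sense in which a literature filtered on significance can be close to uninterpretable.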
I really don’t know what to say about our forgetfulness. I wish I could ask Meehl his opinion of what happened.
Maybe one reason we can feel more comfortable criticizing the classical approach is that now we have a serious alternative: falsificationist Bayes. As they say in politics, you can’t beat something with nothing. And now that we have a something (albeit in different flavors; E.J.’s falsificationist Bayes is not quite the same as mine), this might help us move forward.