There’s nothing wrong with Meehl. He’s great. The puzzle of Paul Meehl is that everything we’re saying now, all this stuff about the problems with Psychological Science and PPNAS and Ted talks and all that, Paul Meehl was saying 50 years ago. And it was no secret. So how is it that all this was happening, in plain sight, and now here we are?
An intellectual history is needed.
I’ll start with this quote, provided by a commenter on our recent thread on the “power pose,” a business fad hyped by NPR, NYT, Ted, etc., based on some “p less than .05” results that were published in a psychology journal. Part of our discussion turned on the thrashing attempts of power-pose defenders to salvage their hypothesis in the face of a failed replication, by postulating an effect that works under some conditions but not others, an effect that shines brightly when studied by motivated researchers but slips away in replications.
And now here’s Meehl, from 1967:
It is not unusual that (e) this ad hoc challenging of auxiliary hypotheses is repeated in the course of a series of related experiments, in which the auxiliary hypothesis involved in Experiment 1 (and challenged ad hoc in order to avoid the latter’s modus tollens impact on the theory) becomes the focus of interest in Experiment 2, which in turn utilizes further plausible but easily challenged auxiliary hypotheses, and so forth. In this fashion a zealous and clever investigator can slowly wend his way through a tenuous nomological network, performing a long series of related experiments which appear to the uncritical reader as a fine example of “an integrated research program,” without ever once refuting or corroborating so much as a single strand of the network. Some of the more horrible examples of this process would require the combined analytic and reconstructive efforts of Carnap, Hempel, and Popper to unscramble the logical relationships of theories and hypotheses to evidence. Meanwhile our eager-beaver researcher, undismayed by logic-of-science considerations and relying blissfully on the “exactitude” of modern statistical hypothesis-testing, has produced a long publication list and been promoted to a full professorship. In terms of his contribution to the enduring body of psychological knowledge, he has done hardly anything. His true position is that of a potent-but-sterile intellectual rake, who leaves in his merry path a long train of ravished maidens but no viable scientific offspring.
Exactly! Meehl got it all down in 1967. And Meehl was respected, people knew about him. Meehl wasn’t in that classic 1982 book edited by Kahneman, Slovic, and Tversky, but he could’ve been.
But somehow, even though Meehl was saying this over and over again, we weren’t listening. We (that is, the fields of statistics and psychometrics) were working on the edges, worrying about relatively trivial issues such as the “file drawer effect” (1979 paper by Rosenthal cited 3500 times) and missing the big picture, the problem discussed by Meehl, that researchers are working within a system that can sustain theories indefinitely even when the true effects are null.
It’s a little bit like the vacuum energy in quantum physics. Remember that? The idea that the null state is not zero, that even in a vacuum there is energy, there are particles appearing and disappearing? It’s kinda like that in statistical studies: there’s variation, there’s noise, and if you shake it up you will be able to find statistical significance. Meehl has a sociological model of how the vacuum energy and the statistical significance operator can sustain a theory indefinitely even when true effects are zero.
But nobody was listening. Or we were listening, but it went in one ear and out the other. Whatever. It took us nearly half a century to realize the importance of p-hacking and the garden of forking paths, to realize that these are not just ways to shoot down joke pseudo-research such as Bem’s ESP study (published in JPSP in 2011) and the notorious Bible Code paper (published in Statistical Science—how embarrassing!—in 1994), but that they are a key part of how the scientific process works. P-hacking and the garden of forking paths grease the wheels of normal science in psychology and medicine. Without these mechanisms (which extract statistical significance from the vacuum energy), the whole system would dry up, we’d have to start publishing everything where p is less than .25 or something.
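To see how easily significance is extracted from the vacuum, here’s a toy simulation (mine, not Meehl’s): two groups with a true effect of exactly zero, but the researcher gets to look at five outcomes and report whichever one comes up “significant.” All the names and parameters here are made up for illustration; the p-value uses a simple normal approximation rather than a proper t-test.

```python
import math
import random

def two_sample_p(xs, ys):
    """Approximate two-sided p-value for a difference in means (normal approximation)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    # Two-sided tail probability from the normal CDF: 2 * (1 - Phi(|z|))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def forking_paths_study(n=50, n_outcomes=5, seed=None):
    """One null 'study': the true group difference is zero on every outcome.
    Returns True if any of the outcomes reaches p < .05 -- i.e., the
    researcher finds some forking path to 'significance'."""
    rng = random.Random(seed)
    for _ in range(n_outcomes):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if two_sample_p(xs, ys) < 0.05:
            return True
    return False

n_studies = 1000
hits = sum(forking_paths_study(seed=i) for i in range(n_studies))
print(f"{hits / n_studies:.0%} of pure-noise studies found 'significance'")
```

With five independent looks at pure noise, roughly 1 − 0.95⁵ ≈ 23% of null studies produce a publishable p-value, and real forking paths (subgroups, covariates, exclusion rules) offer far more than five looks.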
So . . . whassup? What happened? Why did it take us nearly 50 years to grasp what Meehl was saying all along? This is what I want the intellectual history to help me understand.
P.S. Josh Miller points to these video lectures from Meehl’s Philosophical Psychology class.