David Lockhart writes:
I found these two papers in, of all places, the presentation that Emil Kirkegaard and John Fuerst are giving in London this weekend, which they claim is preventing them from responding to the can of worms they have opened by publishing a large, non-anonymized database of OKCupid dating profiles. This seems like it may become an important case in research ethics and data privacy. You may want to look into it. I recommend starting with this post by Oliver Keyes, but Vox, Vice, and Thomas Lumley have all picked up the story.
At any rate, the culprits cite these two papers that look quite good, and the second is the lead-in to a whole special issue on cumulative science from Psychological Methods in 2009.
Frank Schmidt (1996) “Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers”
(Ha! Hosted by Mayo – I hope she’s already pointed you to it)
Patrick Curran (2009) “The seemingly quixotic pursuit of a cumulative psychological science”
In addition, I’d point you to Allen Newell’s work: his 1973 paper “You can’t play 20 questions against nature and win” and his 1990 book _Unified Theories of Cognition_, where he revisits and reiterates the idea. Here’s a link to the 1973 paper:
The 20 questions paper is a summing-up of a colloquium and heavily references the papers presented there, with little re-presentation of them, but I think the larger point stands on its own. Also, you may be familiar with the Chase & Simon paper being discussed: it studied chunking and other phenomena in the memory of chess experts, taking deGroot’s work further and showing that using larger chunks of related pieces accounts for much of the memory advantage that expert players have for board positions.
A comment on the enterprise of surveying this intellectual history. We can find papers like Schmidt and Curran that approvingly cite Meehl and note that the criticisms still stand. We can find others like Newell making similar points but not citing Meehl (or being cited by Schmidt or Curran). We might be able to find papers pushing back against Meehl and also look at how much they get cited. But it seems likely that we can’t find the many psychologists who are unaware of Meehl’s criticism or do not think it is relevant to their own work and so don’t bother commenting on it. And I suspect that’s where the real problem lies.
One conjecture I have is that the root of the problem is that people spend more time working on, thinking about, and reading about exploratory analysis than confirmatory data analysis, for example:
I actually think the graph in that last link is misleading, as it is my impression that the expression “confirmatory data analysis” is a backformation that exists only in reference to “exploratory data analysis.” So it makes sense that people mostly won’t be talking about confirmatory data analysis, even when they’re doing it. They’ll just talk about hypothesis testing or whatever. For more on this, see my 2010 discussion of exploratory and confirmatory data analysis. And here’s my recent discussion of Meehl.
This combines with a change in the standards of what constitutes a publishable unit, which now allows publication of what were previously regarded as partial, preliminary results. (I’ve seen this assertion made but don’t know where, and finding a source has been hard.) It’s not necessarily a bad thing to let some scientists specialize in EDA, or to let them publish an EDA of potential importance and then decide they would rather pursue something else themselves. But it becomes a problem when the rewards, both extrinsic and intrinsic, are tilted so far towards the EDA, and doing just the EDA becomes so common, that everyone seems to forget that the pre-registered confirmation is a crucial piece of the knowledge-discovery process, not just something you ignore until you get all the way to evaluating real-world interventions.
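To make that workflow concrete, here is a minimal sketch of the exploration/confirmation split, using simulated data in Python. Nothing here comes from the papers above; the data, variable names, and the screen-then-test steps are purely illustrative of the idea that the confirmatory test should be specified before it touches the held-out data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2016)

# Simulated data: 200 subjects, one outcome, 10 candidate predictors,
# of which only predictor 2 actually matters.
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 0.3 * X[:, 2] + rng.normal(size=n)

# Split once, up front: one half for exploration, the other held out
# untouched for confirmation.
half = n // 2
explore, confirm = slice(0, half), slice(half, n)

# Exploratory stage: screen every predictor, but only on the exploration half.
pvals = [stats.pearsonr(X[explore, j], y[explore])[1] for j in range(p)]
best = int(np.argmin(pvals))
print(f"exploration suggests predictor {best} (p = {pvals[best]:.3f})")

# Confirmatory stage: a single pre-specified test of that one hypothesis,
# run on the confirmation half that played no role in choosing it.
r, pval = stats.pearsonr(X[confirm, best], y[confirm])
print(f"confirmation: r = {r:.2f}, p = {pval:.3f}")
```

The point of the split is that whatever the screening stage turns up, only the test on the confirmation half has a p-value that hasn’t been contaminated by the search.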
You might also check out this article in which Angela Duckworth complains that her work on “grit” as a predictor of success is being unfairly stuck with a “failure to replicate” label, based on supposed replications by people who don’t seem to understand her theory:
And finally, if you aren’t aware of it, check out Norm Breslow’s 2003 paper _Are statistical contributions to medicine undervalued?_
Amongst other things, on page 4 Breslow discusses the variety of findings submitted by different teams on a class project to analyse an open question with real epidemiologic data. (I was one of the students in that class and am working on writing up something about that experience.) I know that Nosek has since done something similar, probably with more senior researchers.
I don’t really have anything to say on this, but I thought all the above links might interest some of you.