Under the heading, “Results too good to be true,” Lee Sechrest points me to this discussion by “Neuroskeptic” of a discussion by psychology researcher Greg Francis of a published (and publicized) claim by biologists Brian Dias and Kerry Ressler that “Parental olfactory experience [in mice] influences behavior and neural structure in subsequent generations.” That’s a pretty big and surprising claim, and Dias and Ressler support it with some data: p=0.043, p=0.003, p=0.020, p=0.005, etc.
Francis’s key grounds for suspicion is that Dias and Ressler in their paper present 10 successful (statistically significant) results in a row, and, given the effect sizes they estimated, it would be unlikely to see such an unbroken string of successes.
Dias and Ressler replied that they did actually report negative results:
While we wish that all our behavioral, neuroanatomical, and epigenetic data were successful and statistically significant, one only need look at the Supporting Information in the article to see that data generated for all four figures in the Supporting Information did not yield significant results. We do not believe that nonsignificant data support our theoretical claims as suggested.
Francis followed up:
The non-significant effects reported by Dias & Ressler were not characterised by them as being “unsuccessful” but were either integrated into their theoretical ideas or were deemed irrelevant (some were controls that helped them make other arguments). Of course scientists have to change theories to match data, but if the data are noisy then this practice means the theory chases noise (and the findings show excess success relative to the theory).
I would also like to say that it’s probably not a good idea for Dias and Ressler to wish that all their data are “successful and statistically significant.” With small samples and small effects, this just isn’t gonna happen—indeed, it shouldn’t happen. Variation implies that not every small experiment will be statistically significant (or even in the desired direction), and I think it’s a mistake to define “success” in this way.
Do a large preregistered replication
In any case, the solution here seems pretty clear to me. Do a large preregistered replication. This is obvious but it’s not clear that it’s really being done. For example, in a news article from 2013, Virginia Hughes describes the research in question as “tantalizing” and that “other researchers seem convinced . . . neuroscientists, too, are enthusiastic about what these results might mean for understanding the brain,” and she talks about further research (“A good next step in resolving these pesky mechanistic questions would be to use chromatography to see whether odorant molecules like acetophenone actually get into the animals’ bloodstream . . . First, though, Dias and Ressler are working on another behavioral experiment. . . . Scientists, I have to assume, will be furiously working on what that something is for many decades to come . . .”) but I see no mention of any plan for a preregistered replication.
I’d like to see a clean, pure, large, preregistered replication such as Nosek, Spies, and Motyl did in their “50 shades of gray” paper. I recognize that this costs time, effort, and money. Still, replication in a biological study of mice seems so much easier than replication in political science or economics, and it would resolve a lot of statistical issues.