During the past year or so, we’ve been discussing a bunch of “Psychological Science”-style papers in which dramatic claims are made based on somewhat open-ended analysis of small samples with noisy measurements. One thing that comes up in some of these discussions is that the people performing the studies say that they did not fish for statistical significance, and then outsiders such as myself argue that the data processing and analysis are contingent on the data, and that this can occur even if the existing data were analyzed in only one way. This is the garden of forking paths, of which Eric Loken and I give several examples in our paper (and it’s easy enough to find lots more).
One interesting aspect of this discussion is that researchers with multiple comparisons issues will argue that the comparisons they performed on their data were in essence pre-chosen (even though not actually pre-registered) because they are derived from a pre-existing scientific theory. At the same time, they have to make the case that their research is new and exciting.
The result is what I call the “scientific surprise” two-step: (1) When defending the plausibility of your results, you emphasize that they are just as expected from a well-established scientific theory with a rich literature; (2) When publicizing your results (including doing what it takes to get publication in a top journal, ideally a tabloid such as Science or Nature) you emphasize the novelty and surprise value.
It’s a bit like applying for an NSF research grant. In the proposal, you pretty much need to argue that (a) you can definitely do the work, indeed you pretty much know what the steps will be already, and (b) this is path-breaking research, not just the cookbook working-out of existing ideas.
Don’t get me wrong: It’s possible that a result can be (1) predicted from an existing theory, and (2) newsworthy. For example, the theory itself might be on the fringe, and so it’s noteworthy that it’s confirmed. Or the result might be noteworthy as a confirmation of something that was already generally believed.
But if the result is genuinely a surprise, even to the researchers who did it, this should suggest that the finding is more exploratory than confirmatory.
A common narrative is that something inspires a hypothesis, researchers conduct a study to test that hypothesis, and then, more than merely finding a result that supports their hypothesis, the researchers were shocked by how big the effect turned out to be. . . .
If the only way somebody could have gotten a publishable result is by finding an effect about as big as what they found, doesn’t it seem fishy to claim to be shocked by it? After all, doing a study takes a lot of work. Why would anybody do it if the only reason they ended up with publishable results is that luckily the effect turned out to be much more massive than what they’d anticipated?
Wouldn’t you hope that if somebody goes to all the trouble of testing a hypothesis, they’d do a study that could be published as a positive finding so long as their results were merely consistent with what they were expecting?
Let’s consider this for fields, like lots of sociology, in which a quasi-necessary condition of publishing a quantitative finding as positive support for a hypothesis is being able to say that it’s statistically significant–most often at the .05 level. Now say a study is published for which the p-value is only a little less than .05. Here it is obviously dodgy for researchers to claim surprise. They went ahead and did their study, but had the estimated effect been much smaller than their “surprise” result, they wouldn’t have been able to publish it.
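To make that last point concrete, here is a rough simulation sketch of the significance filter. The numbers are my own hypothetical choices, not taken from any study discussed here: I assume a small true effect (0.1) measured with a large standard error (0.5), and a two-sided test at the .05 level. The function name `significance_filter` is likewise just for illustration.

```python
import random
import statistics


def significance_filter(true_effect, se, n_sims=100_000, seed=0):
    """Simulate many replications of one noisy study.

    Each estimate is drawn from Normal(true_effect, se). An estimate counts
    as "publishable" if it is statistically significant at the .05 level
    (two-sided), i.e. if |estimate| > 1.96 * se.

    Returns the share of significant estimates and the mean absolute size
    of those significant estimates.
    """
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided .05 threshold for a normal test statistic
    estimates = [rng.gauss(true_effect, se) for _ in range(n_sims)]
    significant = [est for est in estimates if abs(est) > z_crit * se]
    share_significant = len(significant) / n_sims
    mean_abs_significant = statistics.fmean(abs(est) for est in significant)
    return share_significant, mean_abs_significant


# Hypothetical inputs: small true effect, noisy measurement.
TRUE_EFFECT = 0.1
SE = 0.5

share, mean_abs = significance_filter(TRUE_EFFECT, SE)
print(f"share of simulated studies reaching p < .05: {share:.3f}")
print(f"mean |estimate| among those studies: {mean_abs:.2f}")
print(f"exaggeration relative to the true effect: {mean_abs / TRUE_EFFECT:.1f}x")
```

Under these assumptions, any estimate that clears the threshold has to be at least 1.96 standard errors from zero, so every publishable estimate is necessarily far larger than the true effect. In that setting a "surprisingly big" effect is just the only kind of result that could have been published as a positive finding.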
Then Freese flips it around:
Of course, a different problem is that researchers may begin an empirical project without any actual notion of what effect size they are imagining to be implied by their hypothesis. . . . I [Freese] have taken to calling such projects Columbian Inquiry. Like brave sailors, researchers simply just point their ships at the horizon with a vague hypothesis that there’s eventually land, and perhaps they’ll have the rations and luck to get there, or perhaps not. Of course, after a long time at sea with no land in sight, sailors start to get desperate, but there’s nothing they can do. Researchers, on the other hand, have a lot of more longitude—I mean, latitude—to terraform new land—I mean, publishable results—out of data that might to a less motivated researcher seem far more ambiguous in terms of how it speaks to the animating hypothesis.
I agree completely. It’s the garden of forking paths, in the ocean!