In the discussion following my talk yesterday, someone asked about preregistration and I gave an answer that I really liked, something I’d never thought of before.
I started with my usual story that preregistration is great in two settings: (a) replicating your own exploratory work (as in the 50 shades of gray paper), and (b) replicating the work of others (as was famously done by Ranehill et al.). But replication isn’t so easy in econ or poli sci, because we can’t just send the economy into recession or start a new war just to get more data points. So in my own career I’ve actually only once done a preregistered replication.
The one place that preregistration is really needed, I said, is if you want clean p-values. A p-value is very explicitly a statement about how you would’ve analyzed the data, had they come out differently. Sometimes when I’ve criticized published p-values on the grounds of forking paths, the original authors have fought back angrily, saying how unfair it is for me to first make an assumption about what they would’ve done under different conditions, and then make conclusions based on these assumptions. But they’re getting things backward: By stating a p-value at all, they’re the ones who are making a very strong assumption about their hypothetical behavior—an assumption that, in general, I have no reason to believe.
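To make the forking-paths point concrete, here is a minimal simulation sketch (mine, not from the original discussion; all names and numbers are illustrative). It compares a preregistered analyst, who tests one prespecified outcome, with a forking-paths analyst who looks at five noise outcomes and reports whichever gives the smallest p-value. Both use a nominal 5% threshold, and there is no true effect anywhere:

```python
import random
import math

def p_value_two_sided(xs):
    """Normal-approximation two-sided p-value for H0: mean = 0, known sd = 1."""
    n = len(xs)
    z = (sum(xs) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

random.seed(0)
n_sims, n, n_outcomes = 2000, 50, 5
strict = forked = 0
for _ in range(n_sims):
    # No true effect: every outcome is pure noise.
    datasets = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n_outcomes)]
    pvals = [p_value_two_sided(d) for d in datasets]
    strict += pvals[0] < 0.05    # preregistered: test only the first outcome
    forked += min(pvals) < 0.05  # forking paths: report the best of five

print(f"preregistered rejection rate: {strict / n_sims:.3f}")
print(f"forked-paths rejection rate:  {forked / n_sims:.3f}")
```

The preregistered analyst rejects at roughly the nominal 5% rate; the forking-paths analyst rejects far more often, even though each individual p-value is computed correctly. The nominal interpretation of the p-value depends entirely on which of these analysts you were.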
Preregistration is in fact the only way to ensure that p-values can be taken at their nominal values. In that way, preregistration is like random sampling which, strictly speaking, is the only way that sampling probabilities, estimates, standard errors, etc., can be taken at their nominal values; and like randomized treatment assignment which, strictly speaking, is the only way that the usual causal estimates are valid.
Yes, you can do surveys and get estimates and standard errors without ever taking a random sample—I do this all the time, despite not having the permission of Buggy Whip President Michael W. Link—but to do this we need to make assumptions.
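The same logic can be sketched in code (my illustration, with made-up population numbers): a non-random sample gives a biased raw estimate, and an adjusted estimate recovers the truth only under an assumption. Here the sample over-represents one group, and a poststratified estimate works only if we assume that, within each group, the sampled units look like the population:

```python
import random
random.seed(1)

# Hypothetical population: 30% group A (mean outcome 1.0), 70% group B (mean 0.0).
pop_share = {"A": 0.3, "B": 0.7}
true_mean = 0.3 * 1.0 + 0.7 * 0.0  # = 0.3

# Non-random sample (think opt-in panel): group A is far over-represented.
sample = ([("A", random.gauss(1.0, 1)) for _ in range(700)] +
          [("B", random.gauss(0.0, 1)) for _ in range(300)])

raw_mean = sum(y for _, y in sample) / len(sample)  # biased toward group A

# Poststratified estimate: reweight each group to its known population share.
# Valid only under the untestable assumption that within-group samples are
# representative of the within-group population.
totals = {g: 0.0 for g in pop_share}
counts = {g: 0 for g in pop_share}
for g, y in sample:
    totals[g] += y
    counts[g] += 1
adjusted = sum(pop_share[g] * totals[g] / counts[g] for g in pop_share)

print(f"true: {true_mean:.2f}  raw: {raw_mean:.2f}  adjusted: {adjusted:.2f}")
```

The raw mean lands near 0.7 while the adjusted estimate lands near the true 0.3; the adjustment is doing real work, and that work rests on an assumption, not on the design.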
And, yes, you can do causal inference from observational studies—indeed, in many settings this is absolutely necessary—but, again, assumptions are needed.
And, yes, you can compute p-values without preregistering your rules for data collection, data cleaning, data exclusion, data coding, and data analysis—people do it all the time—but such p-values are necessarily based on assumptions.
Just as a serious social science journal—or even Psychological Science or PPNAS—would never accept a paper on sampling without some discussion of the representativeness of the sample, and just as they would never accept a causal inference based on a simple regression with no identification strategy and no discussion of imbalance between treatment and control groups, so should they not take seriously a p-value without a careful assessment of the assumptions underlying it.
Or you can have random sampling, or you can have a randomized experiment, or you can have preregistration. These are methods of replacing assumptions with design.
Random sampling, random assignment, and preregistration can be a pain in the ass; they can be expensive; even when you do them, there are often breakdowns such as nonresponse, noncompliance, or unexpected features in the data that require new, unanticipated decisions; and sometimes you can’t do them at all.
Random sampling, random assignment, and preregistration are not universal solutions, nor are they, in general, comprehensive solutions. What they are is a way to make certain inferences more robust and less reliant on untestable assumptions. Whether it makes sense to go through with these design steps depends on context.
So, yes, when someone tells me that preregistration is silly because it ties a researcher’s hands behind his or her back, I agree. Preregistration is costly. So is random sampling; so is randomized treatment assignment. These are costly and often not worth the trouble. And if you don’t want to do them, fine. But then show your assumptions. You make the call.
To my mind, the analogy between random sampling, random assignment, and preregistration is excellent. These are three parallel ideas, and to me it seems like just an accident of history that the first two of these ideas are in every statistics textbook and are considered the default approach, whereas the third idea is only recently gaining popularity. Perhaps this has to do with analyses becoming more open-ended. Perhaps fifty years ago there were fewer choices in data collection, processing, and analysis—fewer “researcher degrees of freedom”—so the implicit assumption underlying naively computed p-values was closer to actual practice.