Preregistration of Studies and Mock Reports

The traditional system of scientific and scholarly publishing is breaking down in two different directions.

On one hand, we are moving away from relying on a small set of journals as gatekeepers: the number of papers and research projects is increasing, the number of publication outlets is increasing, and important manuscripts are being posted on SSRN, arXiv, and other non-refereed sites.

At the same time, many researchers are worried about the profusion of published claims that turn out not to replicate or, in plain language, to be false. This concern is not new (some prominent discussions include Rosenthal 1979, Ioannidis 2005, and Vul et al. 2009), but there is a growing sense that the scientific signal is being swamped by noise.

I recently had the opportunity to comment in the journal Political Analysis on two papers on the preregistration of studies and mock reports: one by Humphreys, Sanchez de la Sierra, and van der Windt, and one by Monogan. Here’s the issue of the journal.

Given the high cost of collecting data compared with the relatively low cost of writing a mock report, I recommend the “mock report” strategy be done more often, especially for researchers planning a new and expensive study. The mock report is a form of pilot study and has similar virtues.

In the long term, I believe we as social scientists need to move beyond the paradigm in which a single study can establish a definitive result. In addition to the procedural innovations suggested in the papers at hand, I think we have to consider more seriously the integration of new studies with the existing literature, going beyond the simple (and wrong) dichotomy in which statistically significant findings are treated as true and nonsignificant results are taken to be zero. But registration of studies seems like a useful step in any case.
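
To make that last point concrete, here is a minimal sketch, with invented numbers, of what moving beyond the dichotomy can look like: instead of asking which individual studies cross the 5% threshold, pool the estimates (here with a simple inverse-variance weighted average). None of this comes from the papers under discussion; it is only an illustration.

```python
# Minimal illustration (invented data): "significant = true, nonsignificant = zero"
# versus simple inverse-variance pooling across studies.
import math

# Hypothetical estimates and standard errors from five small studies of one effect.
estimates  = [0.30, 0.12, 0.25, 0.05, 0.18]
std_errors = [0.16, 0.14, 0.15, 0.13, 0.12]

# The dichotomous reading: keep a study only if |z| > 1.96; otherwise call it zero.
for est, se in zip(estimates, std_errors):
    z = est / se
    verdict = "significant" if abs(z) > 1.96 else "treated as zero"
    print(f"estimate {est:+.2f} (se {se:.2f}), z = {z:+.2f}: {verdict}")

# Integration instead: a fixed-effect (inverse-variance weighted) pooled estimate.
weights   = [1 / se**2 for se in std_errors]
pooled    = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled estimate {pooled:+.2f} (se {pooled_se:.2f}), z = {pooled / pooled_se:+.2f}")
```

In this made-up example no single study clears the 1.96 threshold, so the dichotomy would report five zeroes, yet the pooled estimate is clearly positive; that is exactly the information the dichotomy throws away. (A hierarchical model with partial pooling would be better still, but the point survives even in this crude version.)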

14 thoughts on “Preregistration of Studies and Mock Reports”

  1. A fascinating exchange! We proposed and implemented something similar (http://pps.sagepub.com/content/7/6/632.full) in a recent (open access) special issue on replicability in psychological science. The neuroimaging journal Cortex has also recently launched a novel article format, ‘registered reports’, along these lines (http://neurochambers.blogspot.co.uk/2012/10/changing-culture-of-scientific.html?m=1), so it seems such initiatives are taking flight.

  2. 1. Suppose registering a protocol reduces your chances of finding significance (or overtly exposes your ex post analyses as desperate).
    2. Suppose getting published depends on significance.
    3. Suppose you plan to go on the job market, and having publications is helpful.

    Don’t register, fish.

    • Anon:

      What a horribly cynical attitude. But I do agree with you on point 1, that pre-specified analyses are just a start and should not be used to restrict what is done once the data come in. I actually make that point in my published discussion. As to point 2 (and, implicitly, point 3), I think it’s a mistake for publication to depend on statistical significance, for various reasons we’ve discussed many times on this blog (most recently in the context of the notorious “genetic diversity” study published in the American Economic Review).

      • Agreed, it is a horribly cynical attitude, but assume (1) is true; (2) is a bad idea but also a stylized fact (just do a literature search); and (3) is likely. There are morality and professional norms, so the conclusion may not follow, but your reaction implies a double standard. When political scientists analyze politicians they assign them all kinds of venal motivations, yet when looking closer to home one cannot even raise the hypothesis without being labelled a cynic. (I can see an argument about self-selection, and how that makes the populations of political scientists and politicians differ, but also an alternative in which human nature is human nature.)

  3. My $0.02: Specifying in advance what models you’re going to run gives me a huge boost of confidence in your work — and it probably also sharpens experimental designs before they are implemented in the first place (using block randomization if you’re really interested in that treatment*covariate interaction, for example). But the idea that you can’t “fish” for new insights once the data come in is silly: just draw a line under Part I: What We Thought We’d Find, and start Part II: Things We Didn’t Expect, But Are Interesting Hypotheses To Test Next Time!

  4. What about requiring authors to commit to running the published model on new data when it becomes available? Not only might we get two publications from each effort, but it would also expose us to some genuinely out-of-sample checks. I have a paper published in 1998 that used time series cross-section data through 1996, and a paper published in 2008 that used time series cross-section data through 2006. I suspect that their results have held up quite well, but I don’t really know that for sure, and I have zero incentive to go back, grab the new data, and see how well the models fit it. No one is going to publish an “it still fits 6 (or 16) years later” paper even if I found out the model still works, and if the model doesn’t work my incentive to put that out there is just about nil. But if every journal had a little “does it still fit” section at the end …

    Which reminds me of one of my favorite stories from when I was at GAO. I was asked to look at published research on a subject (subject and author to remain anonymous in order to protect the guilty), contacted the author of the only peer-reviewed paper on the subject, and found that, 6 years later, he didn’t have the data. Tried to replicate the results, and couldn’t. Went to the published stats and found a key independent variable was the subject of frequent revision, so the inability to replicate may have been caused by not knowing whether he was using the 1997 published value for 1996 or the 1999 published value for 1996. The author was no help in resolving this. On further investigation, we found that the agency that published the statistics (and made the revisions) collected actual data sporadically and rarely, and usually “attributed” the value of this independent variable with a procedure that relied primarily on, you guessed it, the author’s dependent variable.

  5. Pingback: Preregistration of Studies and Mock Reports | Musings on Using and Misusing Statistics

  6. Andrew, we just read this symposium at our weekly methods meeting here at Rice, and the main thing I was struck by was that the pro-registration folks seem to be operating on the premise that one can learn a lot from a sufficiently pre-structured empirical study. But I don’t think that’s right: even under ideal conditions, hypothesis tests in isolation don’t update our priors much:

    http://politicalmethodology.wordpress.com/2013/01/18/how-much-can-we-learn-from-an-empirical-result-a-bayesian-approach-to-power-analysis-and-the-implications-for-pre-registration/

    …and so the price we pay in lost creativity and unanticipated discoveries seems too high. What do you think?

    • Justin:

      1. Registration does not rule out exploration: it simply distinguishes analyses that were pre-planned from those that were not.

      2. As with most things in life, using pre-registration involves a tradeoff. Yet I’m willing to bet you that for every pound of “lost creativity and unanticipated discoveries” we lose 10 pounds of fishy results. And the full general-equilibrium implications are even more stark: 10 pounds of fishy results can set off the whole literature in wrong directions, waste creativity, and delay useful discoveries. (E.g., read Gary Taubes on “Why we get fat?” and you’ll see what I mean.)

      • 1. In principle, maybe. But how long will it be before we hear reviewers telling us that “as an unregistered analysis, we cannot distinguish these results from mere chance?”

        2. The first bit, the degree of trade-off involved, is an empirical question. I’m not aware of a study that would or even could assess the innovation/failure tradeoff in any field. But I think the “general equilibrium” implications you describe hold only if, contra the argument I made before, each study is individually taken seriously as a demonstrative result and is allowed to shape the course of research in an area without proper replication (and I don’t mean repeating the authors’ analysis of the authors’ data set).

        • 1. Maybe you are right, but I prefer that as a default to “make your data confess or else don’t get published”.

          2. Indeed, it’s an empirical question. Here is a test: gather data on randomized controlled trials (RCTs) with and without pre-registration. Considering only the planned analyses for the pre-registered RCTs, code the number of significant findings against the number of tests reported. Regress on a pre-registration dummy. Is it statistically and substantively significant? You could also do a within-study comparison among the pre-registered studies (e.g., how many pre-planned analyses are *** vs. how many unplanned ones are ***). A rough version of that regression is sketched below.
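
A rough sketch of the test proposed just above, with entirely invented data. The binomial GLM (via statsmodels) is an illustrative choice for handling the varying number of tests per trial; a plain regression of the share of significant results on the dummy would serve the same rough purpose.

```python
# Sketch of the proposed check (all data invented): does pre-registration
# predict a lower share of significant results among the tests a trial reports?
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

n_trials       = 200
preregistered  = rng.integers(0, 2, size=n_trials)   # 0/1 pre-registration dummy
tests_reported = rng.integers(3, 15, size=n_trials)  # number of tests per trial
# Hypothetical effect: pre-registered trials hit significance less often.
p_sig          = np.where(preregistered == 1, 0.15, 0.35)
n_significant  = rng.binomial(tests_reported, p_sig)

# Binomial GLM: (significant, nonsignificant) counts as a function of the dummy.
X = sm.add_constant(preregistered)
endog = np.column_stack([n_significant, tests_reported - n_significant])
fit = sm.GLM(endog, X, family=sm.families.Binomial()).fit()
print(fit.summary())  # the coefficient on the pre-registration dummy is the quantity of interest
```

With real data one would also worry about selection into pre-registration and about whether the number of tests reported is itself affected by registration, so this is a back-of-the-envelope check rather than a research design.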
