Having trouble planning a replication? Here’s how the scientific publishing process gets in the way.

So, I decided to do a preregistered replication. Of one of my own projects. We made a four-step plan: (1) do a duplication, digging up our old code and our old data and checking that we could reproduce our published graphs; (2) clean our analysis in various ways and check that our results don’t change much; (3) fit our model on other data and compare to our earlier results; and (4) re-fit our model on an entirely new, pristine dataset and see what we get.

Each of these steps requires some effort in cleaning data, cleaning code, and understanding the results. And, for the preregistered replication, we also need to be clear on what we’ll be doing with the data and how we plan to interpret what comes out.

So far we’ve done steps 1-3 and prepared the data for step 4. We want to preregister this final replication before doing it. Our first plan was to time-stamp the preregistration plan and publish it on Arxiv. We’d write this up as a paper, also giving the details on steps 1-3. But then we thought it would make sense to publish the whole thing in a journal. Replications should be published, right? So we contacted the journal where our earlier paper was published. The editor responded that replication would be a great idea but unfortunately the journal was not set up to publish such a paper. So we submitted our paper to a methods journal. It’s now in the reviewing process.

So here’s the funny thing. We can’t do the replication yet! Why? Two reasons. First, I think the publication decision should be based on the replication plan, not on the results. So we can’t do the new analysis until the paper has been accepted for publication, otherwise we’re violating this separation principle. Second, the reviewers for the journal might well ask for changes. What if they ask for changes in our preregistration plan? If the plan has already been time-stamped and we’ve already done the replication, it’s too late.

So our preregistration plan is just sitting there, waiting on the journal review process.

P.S. Since I wrote the above post, the paper got rejected by the journal. Fair enough: they could have more important papers they’d like to publish, and I have no complaints with their reviewing. Their journal, their call. We made some changes and submitted the paper to a different journal and we’re waiting to hear from them. Still haven’t done the replication, as, again, we’re waiting to finalize our paper and time-stamp the final version before doing so.

Again, I have no complaints with the process; it’s just a funny story.

P.P.S. There’s some discussion in the comments about why publish in journals at all, so let me point you to two old posts on the topic here and here.

35 thoughts on “Having trouble planning a replication? Here’s how the scientific publishing process gets in the way.”

  1. “The editor responded that replication would be a great idea but unfortunately the journal was not set up to publish such a paper.”

    I’m curious what this means—any idea?

      • I guess they could argue that “replications are outside the scope of the journal’s mission” or something similar. However (if you believe replication is a crucial aspect of the scientific method), wouldn’t that amount to an admission that it is not meant to be a scientific journal? That could be fine; there are other ways of “knowing/learning” beyond science that may merit a journal.

        Thinking about it more, it comes down to whether replication is considered crucial to science or just “a great idea”. The answer is self-evident to me… but this is getting into philosophy so I am sure there can be much disagreement. The best thing is for journals to address this in a mission statement so that everyone is aware of their position. Then researchers who do not consider replication crucial can publish in journals sharing their philosophy, while those that do consider it crucial can publish elsewhere. Everyone is happy.

        • What about a short (maybe paragraph-length) note saying that such-and-such paper previously published in this journal has been replicated, or failed to replicate, by xxx?

        • Isn’t the point to see how the two results differ in order to figure out why, which requires a detailed presentation?

          Also, from the psychology replication project we learned researchers like to fiddle with the design and still call it a replication. So essentially all the same information found in the original paper needs to be present for it to be interpretable.

        • You could let the reviewers & editors see the details in their full glory & decide if it was a replication or a failure to replicate or just a crappy study.

          As a reader it’s rather more important to me to know whether a previously published result was replicable or not, and to get a gist of why. The details, not so much, so long as I trust the people doing the vetting.

          OTOH, if you think the reviewers are not competent to do a good job in the first place, well, why publish in a journal at all? There’s always Arxiv etc.

        • I renounce journal publication entirely. Of course, I don’t have a career where promotion etc is entirely built around counting how many times you can jump the journal hurdle. That being said, I’ve published stuff in journals, but if you offer me a comprehensive open alternative, I’ll move in a heartbeat.

        • Daniel:

          I know what you’re saying, and I have no more promotions coming in this life, and I have a publishing platform right here that is read more widely than any statistics journal . . . but still I publish in journals, sometimes when I’m asked to, sometimes for the benefit of colleagues, but often just because it seems like the right way to reach some important group of readers. This is not to defend journals, just to explain where I’m coming from.

          Also see here and here.

        • In most cases this means reaching readers behind a paywall. The institutionalized population as it were….

        • I don’t think so. Arxiv is still curated to some extent (at least by topic), and it is in specific fields. So you don’t publish, say, Econ in Arxiv (or you didn’t last I looked). Engineering is kind of marginal, etc. If you only work deep within one of the well-supported fields of Arxiv then yeah, it probably works fine. But what if you’re trying to publish something about the engineering, economic, and ecological tradeoffs of different water resources management strategies, for example? Yeah, not an obvious Arxiv topic.

    • Anon, Bob, Rahul:

      I agree, but one thing I’m also trying to do here is communicate the importance of this topic to the people who publish in journals, so I think it’s worth putting in some effort here.

      It’s similar to the age-adjusted-death-rate thing, where I published analyses on the blog but also published something in PPNAS.

      • I never find anything through a journal. I used to back in the 80s and 90s when I got print copies of journals mailed to me or they showed up on tables. Everything I read now is either referred to me in person, on a blog, or in a citation to another paper I’m reading. Or sometimes because I look it up on Google or Google Scholar. Other than the last two options, that’s been true since grad school, when we shared photocopies of unpublished papers and tech reports where the real cutting-edge work was going on.

  2. “First, I think the publication decision should be based on the replication plan, not on the results.”

    Reviewers could adhere to this principle even if you submit the plan to a journal with results (and I get the sense that more and more reviewers are amenable to doing so).

  3. It seems to me that you’ve got two competing ideas here:

    1) Doing science

    2) Getting “academic credit”

    If you think that you’ve got a well designed analysis plan, then the “doing science” just requires that you let people know what you’re planning to do, and then do it.

    If you’re digging for “academic credit” then that requires that you convince some gatekeepers that you’re sexy enough, which has relatively little to do with “doing science”.

    So, my advice is to put the data and the analysis plan here on the blog with a time-stamp, then carry out the analysis and put it here on the blog…. Voilà, you’re “doing science”!

    • +1.

      For bonus points, get a cryptographic timestamp on the plan. Heck, get more than one. I actually think there’s a way to do this with Bitcoin now. So you could DO SCIENCE WITH BITCOIN! This would improve coverage of your paper and potentially start a Bitcoin replication bubble. Because everyone knows that anything is better with Bitcoin.
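
      A minimal sketch of what such a cryptographic timestamp could amount to (the file name and the choice of SHA-256 are assumptions, not anything from the thread): publish the digest of the frozen plan somewhere public, or anchor it in a Bitcoin transaction via a service such as OpenTimestamps, and reveal the matching document later.

      ```python
      # Sketch: hash the frozen preregistration plan. Publishing the digest
      # now and the full plan later shows the plan was not changed in between
      # (assuming the file itself is kept intact).
      import hashlib

      def sha256_digest(path: str) -> str:
          """Hex SHA-256 digest of a file, read in chunks."""
          h = hashlib.sha256()
          with open(path, "rb") as f:
              for chunk in iter(lambda: f.read(1 << 16), b""):
                  h.update(chunk)
          return h.hexdigest()

      # "prereg_plan_v1.pdf" is a made-up file name.
      print(sha256_digest("prereg_plan_v1.pdf"))
      ```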

    • I agree with the idea that it would be useful to set up a kind of decentralized scientific publication-web, along the lines of gnutella or one of those peer-to-peer filesharing protocols, where you share not only the crypto-hash but also the contents, and it gets distributed to others and becomes difficult to muck with, and easy to access.

      It’d be useful to have a PGP/GPG web of trust associated with the whole thing.

      the bitcoin thing is kinda hackish, and no good for actually disseminating the data.

      • A user-interface/GUI would let you do things like browse the database of researchers, select people to follow, configure the use of your own personal storage for content, as well as storage like S3 or google drive etc, and let you do things like bandwidth limit uploads/downloads.

        It’d probably make sense to incorporate citation information into this whole thing, by having GUIDs of other publications that you cite associated with your uploads…

        hey, wait, what was the name of that nonprofit that Stan is now associated with?

      • Do you really need the crypto-hash etc.? Doesn’t a good, central dissemination system (e.g. Arxiv) solve 99% of the problem?

        Is it credible that an adversary will hack into arxiv & change the protocol in a pre-posted paper?

        • I think on the open internet, it’s credible that people will try to attack the system, if for no other reason than denial of service. I think the crypto hashes probably would help with the dissemination as well, allowing you to validate integrity of transmissions and keep the distributed database eventually consistent, so it serves more purposes than just preventing fraud.

        • Another attack you’d need to prevent is the creation of fake IDs and accidental or intentional identity collisions. People will try to pose as other people, etc. In a non-centralized system, there’s no “google” to store your password and provide account-recovery services, etc. Cryptography is inevitably required on the open internet, I think. Look at the mess that is email spam due to a crypto-ID-free system.

        • Take your email example: a couple of years ago I read about an excellent solution to the spam problem.

          Essentially it was a proof-of-work system that increases the burden on spammers to the point where spamming is not profitable (a minimal sketch follows this comment). Thunderbird even had a plugin for it.

          Alternatively, take the PGP privacy plugins for TB. All great ideas and eminently workable. But I see no adoption.

          Ergo, I’ve concluded that users don’t really care, either about spam emails or about privacy. To the average user these are solutions in search of a problem.
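
          A minimal sketch of the proof-of-work idea mentioned above (the message format, difficulty, and use of SHA-256 are assumptions; classic Hashcash used SHA-1): the sender burns CPU to mint a stamp, while the recipient verifies it with a single hash.

          ```python
          # Hashcash-style proof of work: the sender searches for a nonce whose
          # hash clears a difficulty target; the recipient verifies with one hash.
          import hashlib
          from itertools import count

          def mint(message: str, bits: int = 16) -> str:
              """Find 'message:nonce' whose SHA-256, read as an integer, is below
              2**(256 - bits), i.e. has `bits` leading zero bits. Expected cost:
              about 2**bits hash evaluations."""
              target = 1 << (256 - bits)
              for nonce in count():
                  stamp = f"{message}:{nonce}"
                  if int.from_bytes(hashlib.sha256(stamp.encode()).digest(), "big") < target:
                      return stamp

          def verify(stamp: str, bits: int = 16) -> bool:
              """Cheap recipient-side check: a single hash comparison."""
              digest = hashlib.sha256(stamp.encode()).digest()
              return int.from_bytes(digest, "big") < (1 << (256 - bits))

          # Hypothetical header; real Hashcash stamps also carry a date and resource.
          s = mint("mail-from:alice@example.org")
          print(s, verify(s))
          ```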

        • But these are non-standard hacks to a broken protocol (SMTP/POP/IMAP). If you’re going to design a new protocol specifically for validated distribution of scientific publications, you should design it from the beginning with the modern internet including attackers in mind.

          If you look at Monotone or Git you see how crypto-hashes were adopted from the start, and they work just fine because they’re built in. No one is demanding a git-without-the-crypto. Of course git and monotone both hard-coded SHA-1; you shouldn’t hard-code a particular algorithm.
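
          As a small illustration of that built-in content addressing (a sketch, not anything from the thread): Git names a file’s contents by hashing a typed header plus the raw bytes, so any change to the content changes its ID.

          ```python
          # How Git names a file's contents (a "blob"): SHA-1 over the header
          # 'blob <size>\0' followed by the raw bytes -- the same ID that
          # `git hash-object` reports for that content.
          import hashlib

          def git_blob_id(data: bytes) -> str:
              header = b"blob %d\x00" % len(data)
              return hashlib.sha1(header + data).hexdigest()

          print(git_blob_id(b"hello\n"))
          ```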

  4. I’m lost on why pre-acceptance of 1-3 should change the plan for doing 4. Just do it. If you want to show that you had a pre-planned hypothesis test, publish that on the Net or Arxiv. I think you are getting too cute.

    Really, what you are trying to get the journals to do seems to be to agree to publish planned experiments. This seems like a different concept than replication.

    • Nony:

      I’m not trying to be cute. I’m just trying to avoid having multiple versions of the paper floating around. Cleanest is to have the published preregistration be the actual preregistration. If I post the preregistration now and then get comments from the journal, there will inevitably be two versions of the paper: the officially preregistered version and the officially published version. I’d like to avoid that. And, yes, I could do it all on my webpage and avoid journals entirely, but I think that, even now, having something in a journal gets it taken more seriously.
