Replication backlash

Raghuveer Parthasarathy pointed me to an article in Nature by Mina Bissell, who writes, “The push to replicate findings could shelve promising research and unfairly damage the reputations of careful, meticulous scientists.”

I can see where she’s coming from: if you work hard day after day in the lab, it’s gotta be a bit frustrating to find all your work questioned, for the frauds of the Anil Pottis and Diederik Stapels of the world to be treated as a reason for everyone else’s work to be considered guilty until proven innocent.

That said, I pretty much disagree with Bissell’s article, and really the best thing I can say about it is that I think it’s a good sign that the push for replication is so strong that now there’s a backlash against it. Traditionally, leading scientists have been able to simply ignore the push for replication. If they are feeling that the replication movement is strong enough that they need to fight it, that to me is good news.

I’ll explain a bit in the context of Bissell’s article. She writes:

Articles in both the scientific and popular press have addressed how frequently biologists are unable to repeat each other’s experiments, even when using the same materials and methods. But I am concerned about the latest drive by some in biology to have results replicated by an independent, self-appointed entity that will charge for the service. The US National Institutes of Health is considering making validation routine for certain types of experiments, including the basic science that leads to clinical trials.

But, as she points out, such replications will be costly. As she puts it:

Isn’t reproducibility the bedrock of the scientific process? Yes, up to a point. But it is sometimes much easier not to replicate than to replicate studies, because the techniques and reagents are sophisticated, time-consuming and difficult to master. In the past ten years, every paper published on which I have been senior author has taken between four and six years to complete, and at times much longer. People in my lab often need months — if not a year — to replicate some of the experiments we have done . . .

So, yes, if we require everything to be replicated, it will reduce the resources that are available to do new research.

Bissell continues:

Replication is always a concern when dealing with systems as complex as the three-dimensional cell cultures routinely used in my lab. But with time and careful consideration of experimental conditions, they [Bissell’s students and postdocs], and others, have always managed to replicate our previous data.

If all science were like Bissell’s, I guess we’d be in great shape. In fact, given her track record, perhaps we could give the work in her lab some sort of lifetime seal of approval and agree in the future to trust all her data without need for replication.

The problem is that there appear to be labs without 100% successful replication rates. Not just fraud (although, yes, that does exist); and not just people cutting corners, for example, improperly excluding cases in a clinical trial (although, yes, that does exist); and not just selection bias and measurement error (although, yes, these do exist too); but just the usual story of results that don’t hold up under replication, perhaps because the published results just happened to stand out in an initial dataset (as Vul et al. pointed out in the context of imaging studies in neuroscience) or because certain effects are variable and appear in some settings and not in others. Lots of reasons. In any case, replications do fail, even with time and careful consideration of experimental conditions. In that sense, Bissell indeed has to pay for the sins of others, but I think that’s inevitable: in any system that is less than 100% perfect, some effort ends up being spent on checking things that, retrospectively, turned out to be ok.

Later on, Bissell writes:

The right thing to do as a replicator of someone else’s findings is to consult the original authors thoughtfully. If e-mails and phone calls don’t solve the problems in replication, ask either to go to the original lab to reproduce the data together, or invite someone from their lab to come to yours. Of course replicators must pay for all this, but it is a small price in relation to the time one will save, or the suffering one might otherwise cause by declaring a finding irreproducible.

Hmmmm . . . maybe . . . but maybe a simpler approach would be for the authors of the article to describe their methods clearly in the public record (with videos, for example, if that is necessary to demonstrate details of lab procedure).

After all, a central purpose of scientific publication is to communicate with other scientists. If your published material is not clear—if a paper can’t be replicated without emails, phone calls, and a lab visit—this seems like a problem to me! If outsiders can’t replicate the exact study you’ve reported, they could well have trouble using your results in future research. To put it another way, if certain findings are hard to get, requiring lots of lab technique that is nowhere published—and I accept that this is just the way things can be in modern biology—then these findings won’t necessarily apply in future work, and this seems like a serious concern.

To me, the solution is not to require e-mails, phone calls, and lab visits—which, really, would be needed not just for potential replicators but for anyone doing further research in the field—but rather to expand the idea of “publication” to go beyond the current standard telegraphic description of methods and results, and beyond the current standard supplementary material (which is not typically a set of information allowing you to replicate the study; rather, it’s extra analyses needed to placate the journal referees), to include a full description of methods and data, including videos and as much raw data as possible (with some scrambling if human-subjects confidentiality is an issue). No limits—whatever it takes! This isn’t about replication or about pesky reporting requirements, it’s about science. If you publish a result, you should want others to be able to use it.

Of course, I think replicators should act in good faith. If certain aspects of a study are standard practice and have been published elsewhere, maybe they don’t need to be described in detail in the paper or the supplementary material; a reference to the literature could be enough. Indeed, to the extent that full descriptions of research methods become the norm, it will be easier for people to describe their setups in future papers simply by citing earlier ones.

Bissell points out that describing research methods isn’t always easy:

Twenty years ago . . . Biologists were using relatively simple tools and materials, such as pre-made media and embryonic fibroblasts from chickens and mice. The techniques available were inexpensive and easy to learn, thus most experiments would have been fairly easy to double-check. But today, biologists use large data sets, engineered animals and complex culture models . . . Many scientists use epithelial cell lines that are exquisitely sensitive. The slightest shift in their microenvironment can alter the results — something a newcomer might not spot. It is common for even a seasoned scientist to struggle with cell lines and culture conditions, and unknowingly introduce changes that will make it seem that a study cannot be reproduced. . . .

If the microenvironment is important, record as much of it as you can for the publication! Again, if it really takes a year for a study to be reproduced, if your finding is that fragile, this is something that researchers should know about right away from reading the article.

Bissell gives an example of “a non-malignant human breast cell line that is now used by many for three-dimensional experiments”:

A collaborator noticed that her group could not reproduce its own data convincingly when using cells from a cell bank. She had obtained the original cells from another investigator. And they had been cultured under conditions in which they had drifted. Rather than despairing, the group analysed the reasons behind the differences and identified crucial changes in cell-cycle regulation in the drifted cells. This finding led to an exciting, new interpretation of the data that were subsequently published.

That’s great! And that’s why it’s good to publish all the information necessary so that a study can be replicated. That way, this sort of exciting research could be done all the time.

Costs and benefits

The other issue that Bissell is (implicitly) raising is a cost-benefit calculation. When she writes of the suffering caused by declaring a finding irreproducible, I assume that ultimately she’s talking about a patient who will get sick or even die because some potential treatment never gets developed, or never becomes available, after some promising bit of research got dinged. On the other hand, when research that is published in a top journal does not hold up, it can waste thousands of hours of researchers’ time, consuming resources that otherwise could have been used on productive research.

Indeed, even when we talk about reporting requirements, we are really talking about tradeoffs. Clearly writing up one’s experimental protocol (and maybe including a YouTube video) and setting up data in archival form takes work; it represents time and effort that could otherwise be spent on research (or even on internal replication). On the other hand, when methods and data are not clearly set out in the public record, this can result in wasted effort by lots of other labs, following false leads as they try to figure out exactly how the experiment was done.

I can’t be sure, but my guess is that, for important, high-profile research, on balance it’s a benefit to put all the details in the public record. Sure, that takes some effort by the originating lab, but it might save lots more effort for each of dozens of other labs that are trying to move forward from the published finding.

Here’s an example. Bissell writes:

When researchers at Amgen, a pharmaceutical company in Thousand Oaks, California, failed to replicate many important studies in preclinical cancer research, they tried to contact the authors and exchange materials. They could confirm only 11% of the papers. I think that if more biotech companies had the patience to send someone to the original labs, perhaps the percentage of reproducibility would be much higher.

I worry about this. If people can’t replicate a published result, what are we supposed to make of it? If the result is so fragile that it only works under some conditions that have never been written down, what is the scientific community supposed to do with it?

And there’s this:

It is true that, in some cases, no matter how meticulous one is, some papers do not hold up. But if the steps above are taken and the research still cannot be reproduced, then these non-valid findings will eventually be weeded out naturally when other careful scientists repeatedly fail to reproduce them. But sooner or later, the paper should be withdrawn from the literature by its authors.

Yeah, right. Tell it to Daryl Bem.

What happened?

I think that where Bissell went wrong is in thinking of replication in a defensive way, with the result being to “damage the reputations of careful, meticulous scientists.” Instead, I recommend she take a forward-looking view and think of replicability as a way of moving science forward faster. If other researchers can’t replicate what you did, they might well have problems extending your results. The easier you make it for others to replicate your work, and the more replications of it people have done, the more able, and motivated, they will be to carry on the torch.

Nothing magic about publication

Bissell seems to be saying that if a biology paper is published, it should be treated as correct, even if outsiders can’t replicate it, all the way until the non-replicators “consult the original authors thoughtfully,” send emails and phone calls, and “either to go to the original lab to reproduce the data together, or invite someone from their lab to come to yours.” After all of this, if the results still don’t hold up, they can be “weeded out naturally from the literature”—but, even then, only after other scientists “repeatedly fail to reproduce them.”

This seems pretty clear: you need multiple failed replications, each involving thoughtful conversation, email, phone, and a physical lab visit. Until then, you treat the published claim as true.

OK, fine. Suppose we accept this principle. How, then, do we treat an unpublished paper? Suppose someone with a Ph.D. in biology posts a paper on Arxiv (or whatever is the biology equivalent), and it can’t be replicated. Is it ok to question the original paper, to treat it as only provisional, to label it as unreplicated? That’s ok, right? I mean, you can’t just post something on the web and automatically get the benefit of the doubt that you didn’t make any mistakes. Ph.D.’s make errors all the time (just like everyone else).

Now we can engage in some salami slicing. According to Bissell (as I interpret her here), if you publish an article in Cell or some top journal like that, you get the benefit of the doubt and your claims get treated as correct until there are multiple costly, failed replications. But if you post a paper on your website, all you’ve done is make a claim. Now suppose you publish in a middling journal, say, the Journal of Theoretical Biology. Does that give you the benefit of the doubt? What about Nature Neuroscience? PNAS? Plos-One? I think you get my point. A publication in Cell is nothing more than an Arxiv paper that happened to hit the right referees at the right time. Sure, approval by 3 referees or 6 referees or whatever is something, but all they did was read some words and look at some pictures.

It’s a strange view of science in which a few referee reports are enough to put something into a default-believe-it mode, but a failed replication doesn’t count for anything. Bissell is criticizing replicators for not having long talks and visits with the original researchers, but the referees don’t do any emails, phone calls, or lab visits at all! If their judgments, based simply on reading the article, carry weight, then it seems odd to me to discount failed replications that are also based on the published record.

My view that we should focus on the published record (including references, as appropriate) is not legalistic or nitpicking. I’m not trying to say: Hey, you didn’t include that in the paper, gotcha! I’m just saying that, if somebody reads your paper and can’t figure out what you did, and can only do that through lengthy emails, phone conversations, and lab visits, then this is going to limit the contribution your paper can make.

As C. Glenn Begley wrote in a comment:

A result that is not sufficiently robust that it can be independently reproduced will not provide the basis for an effective therapy in an outbred human population. A result that is not able to be independently reproduced, that cannot be translated to another lab using what most would regard as standard laboratory procedures (blinding, controls, validated reagents etc) is not a result. It is simply a ‘scientific allegation’.

To which I would add: Everyone would agree that the above paragraph applies to an unpublished article. I’m with Begley that it also applies to published articles, even those published in top journals.

A solution that should make everyone happy

Or, to put it another way, maybe Bissell is right that if someone can’t replicate your paper, it’s no big deal. But it’s information I’d like to have. So maybe we can all be happy: all failed replications can be listed on the website of the original paper (then grumps and skeptics like me will be satisfied), but Bissell and others can continue to believe published results on the grounds that the replications weren’t careful enough. And, yes, published replications should be held to the same high standard. If you fail to replicate a result and you want your failed replication to be published, it should contain full details of your lab setup, with videos as necessary.