
Difficulties in publishing non-replications of implausible findings

Eric Tassone points me to this news article by Christopher Shea on the challenges of debunking ESP. Shea writes:

Earlier this year, a major psychology journal published a paper suggesting that there was some evidence for “pre-cognition,” a form of ESP. Stuart Ritchie, a doctoral student at the University of Edinburgh, is part of a team that tried, but failed, to replicate those results. Here, he tells the Chronicle of Higher Education’s Tom Bartlett about the difficulties he’s had getting the results published.

Several journals told the team they wouldn’t publish a study that did no more than disprove a previous study. . . . An editor at another journal said he’d “only accept our paper if we ran a fourth experiment where we got a believer [in ESP] to run all the participants, to control for . . . experimenter effects.”

My reaction is, this isn’t as easy a question as it might seem. At first, one might share Ritchie’s frustration that a shoddy paper by Bem got published while Ritchie’s careful replication got dinged. But, as I wrote when the issue came up on the sister blog:

Setting aside the whole “psychic powers” thing, it makes sense to me not to run the new experiment. After all, it’s hardly news that ESP doesn’t work. If “ESP doesn’t work” were publishable, you could fill up a journal many times over with such findings. And what would be the point of that? Better to start a new journal with some catchy title such as Replications of Well-Known Findings. In the physics division, you could have articles demonstrating that objects fall down, not up. In the chemistry division, you could publish demonstrations that H2 + O2 yields H2O plus energy. The biology section could have a paper demonstrating that cats and dogs can’t produce offspring. And so on.

So I don’t know the answer here. On one hand, we can hardly require or even expect that journals fill their pages with dog-bites-man nonreplications. (And, even in a computerized era where there are no page limits, there are still constraints on the time of editors and reviewers.) On the other hand, this leads to an asymmetry where crap gets on the front page and the refutation doesn’t even get published on page B16.


  1. idiot says:

    Is the idea of establishing a journal called “Replications of Well-Known Findings” a satiric suggestion? Because I actually think that might be a good solution: create a new journal which will accept replications of previous peer-reviewed results.

    Replications (and non-replications) are key in science, and if this ESP paper is important enough to get into a peer-reviewed journal, so is its refutation.

    • Erin Jonaitis says:

      This. Replication is especially key in psychology, because effects there are often tenuous — and psychology is a discipline where replication basically doesn’t happen.

    • Andrew says:

      Yes, I’d be happy with the Replication journal. Ideally, publication in such a journal would count as “service” credit, just as, by analogy, I assume I get some credit for service as an academic statistician and political scientist by publishing in journals such as PS, American Statistician, or Chance. Or maybe an even better analogy is the implicit service credit I get for writing an R package.

      Regarding the ESP paper: maybe it’s important enough to get into a peer-reviewed journal, but I don’t think it’s important enough to get into a top peer-reviewed journal. That was what got it all the press. If it had been published in an appropriately obscure place, it would’ve been there for specialists to read but it wouldn’t have received all this attention.

      • There’s a new online journal for brief reports of replications in psychology: Psych File Drawer.

        I’d rather see journals take responsibility for replications of their own studies (as in my comment below), but unless/until that happens, Psych File Drawer seems like a promising start.

      • K? O'Rourke says:

        Andrew: I believe you are bringing out an important point that needs to be better emphasized – universities have almost completely delegated the evaluation of their faculty to the journals and their editorial staff.

        Someone who has pulled off getting a ridiculous paper published in a top peer-reviewed journal may not deserve promotion and tenure, but the universities’ role cannot be neglected in redressing that error: _validating_ those doing the non-replication studies as making a contribution to their discipline.

        Partly it’s a question of whether you believe it’s less important to be able to process evidence in published _results_ than to ensure that mostly only those with the right stuff remain and prosper in academe.

        Maybe not, and of course I believe it’s more important to process evidence in published _results_, but I do this by assuming they are almost all misleading.

        • Andrew says:

          My experience is that in an environment of trust, faculty can effectively discuss and evaluate a candidate’s work. But in an environment of distrust this doesn’t work; all that seems possible is to discuss external indicators such as publications, awards, grants, and letters.

          • K? O'Rourke says:

            Thanks, I was exaggerating with “completely delegated the evaluation” and “completely delegated the surrogate or last resort evaluation,” but then I thought a bit about why some of my experiences were different.

            Perhaps it was because many or all of the evaluators were not from the statistics discipline and had to rely more on surrogate evaluation.

    • C Ryan King says:

      I think that PLoS ONE, since it technically is supposed to publish only on technical merits, is a fine venue for such papers. (I don’t mean that negatively about P1; people I respect put decent papers in there. Their model just relies on search rather than reading the whole thing.)

  2. C Ryan King says:

    Personally, I would find it immensely useful if claims which don’t pan out had visible refutations in their citation tree rather than either a) disappearing or b) appearing only as self-citations. Nature and Science do technical comments for such cases. In fact, Tibshirani has a comment submitted to Science regarding that adaptive-histogram-based Mutual-Information-Correlation paper you blogged about, which tones down its claims.

    For wildly implausible claims like ESP, the refutation belongs in the same journal, perhaps very abbreviated with the bulk online. If that journal refuses, another should go ahead.

  3. Joseph says:

    “On the other hand, this leads to an asymmetry where crap gets on the front page and the refutation doesn’t even get published on page B16.”

    I think the key here is that the decision to publish the original (shocking) finding is also a decision to publish replications, and it really needs to be. Otherwise, the literature will one day have a meta-analysis in which the authors state “in the past 30 years, 12 articles have been published on ESP, of which 11 have found positive results.”

    That suggests two things. First, we should have a high bar for really unexpected findings, though we do need to let some through because you never know — they might be true. Second, a study that makes a strong and unexpected claim creates the market for “dog bites man” follow-up replication studies.

    With ESP this seems like a minor point. But imagine a study that shows heart medication treats paranoid schizophrenia; clearly it would be a bad thing if this result were never replicated and stood as the authority on the subject. Heck, we saw the harm that “Statins prevent cancer” papers could do — the replications based on RCTs were completely unsurprising but really important to prevent mistreatment of a lot of patients.

  4. Matt Weber says:

    I think I missed something. Doesn’t the fact that someone documented ESP phenomena invite attempts at replication, however anodyne we might find the failures of those attempts? The reason that no one’s publishing papers on failed attempts at cat-and-dog crossbreeding is that no one’s claimed to be able to crossbreed cats and dogs. Surely the man-bites-dog context changes the level of interest in dog-bites-man results. (Tooooo many dogs in that paragraph.)

    I guess I understand the logic that ESP is a bridge too far, and just not worth paying attention to no matter who published it or where. But I’m a psychologist, and I guess I feel like, as a field, we have a bit of an obligation to set this right, even if it means devoting a few pages to some extremely unsurprising non-results. (We also had an obligation not to pull this kind of crap in the first place, obviously, but that ship has sailed.) Anyway, I don’t see how specifically addressing the wildly implausible claims of an unusually high-profile paper opens the door to a never-ending flood of papers demonstrating things everyone already knows. Are the scientists you know actually all that interested in writing those papers?

  5. Jon M says:

    I’m not sure that’s a fair assessment of the situation. Normally it would not be news that ESP does not work, but the previous publication of a paper claiming it does surely changes the assumptions of the research record (although not of individual scientists). I don’t think it is publishing dog-bites-man to publish non-replications of man-bites-dog studies that get published in influential places.

    If Nature published a new study showing that H2 + O2 does not yield H2O plus energy, I would certainly hope that a non-replication of this result would be swiftly published in order not to leave the impression that the controversial finding was a relatively uncontested result.

    I think this is particularly important for less outlandish claims. Suppose a journal publishes a finding saying there is no link between income and voting, presumably using a less-than-ideal methodology. I would hope that journals would be willing to publish non-replications of this finding, if only to demonstrate to people from outside the field that it is not widely accepted.

  6. The difference between the examples you give (dogs and cats, etc.) and Ritchie et al.’s paper is that Ritchie and his colleagues were responding to something that got published recently, and they submitted their nonreplication to the same journal that published Bem. So no, our leading journals shouldn’t be publishing “dogs and cats cannot produce offspring” — unless they have recently published a paper saying that they can.

    It’s an issue of journals being accountable for what they publish. That’s why I think journals should have a policy to publish exact (or very close) replications of studies that the journal originally published. It could be in an online supplement, it could be a brief report format, and it wouldn’t even need a full panel of reviewers (a single editor could review it). But in the same way that rigorous corrections policies at newspapers make reporters more careful, it might change the incentives for journal publishers and editors if they knew that they’d be responsible for publishing nonreplications of papers they’d accepted.

  7. revo11 says:

    I don’t think there’s a conflict here. The best scenario would have been to require Bem to provide more convincing evidence for his extraordinary claims, which should have led to them not being published. However, given the current circumstance that the article has been published, the issue is now elevated to “hypothesis under active scientific debate” (if there were such a thing as negative knowledge, the ESP paper would probably qualify) and a failure to replicate is a valid contribution to that debate.

    The ideal would be for a functioning publication process to screen these sorts of papers in the first place to keep fields from going backwards. This isn’t always possible, although in this case it should have been. There will probably be subtler instances where it’s not so obvious. To allow the field to self-correct you have to make these publication criteria symmetric, even if it seems ridiculous in this instance.

  8. Paul says:

    Here’s a link to a New Yorker article I found simultaneously obvious and enlightening. It’s related to publishing non-replications and publication bias in general, and to how that can lead to dangerous consensuses. There’s also a good deal of evidence in it to support partial pooling of effects, though that’s not the stated purpose of the article.

  9. John Vokey says:

    Some journals that attempt to balance the situation (some with humour):

  10. Colin Mills says:

    My feeling is that there is a bit of a difference in the size of the flake community in different disciplines. In the natural sciences people who believe that water molecules have memories or that homeopathy works tend to be sidelined as (mostly) harmless cranks and don’t darken the pages of the serious journals. In the social sciences there is a bigger constituency for the equivalent of that sort of thing and in some disciplines this is true to the extent that the flakes actually edit the journals (I’m not saying this is so in psychology – I don’t know one way or the other as I’m not a psychologist). So the ratio of signal to noise is much smaller in (some) social science journals and that, to my mind, is a good reason for carrying a higher proportion of “dog bites man” type articles.

    • Thom says:

      I’m not convinced – flaky science does get published in The Lancet, Science, Nature etc. (e.g. the memory of water stuff). The only journal(s) I know that appear to be edited by flakes are in ‘Physics’ (though I am sure there are some in other disciplines). All evidence suggests that the Bem piece got published because it passed muster at peer-review – not because the editors were flakes. (This is not to say it is a good paper – all peer review processes necessarily have false positives and peer review varies in quality).

  11. John Mashey says:

    I sympathize, especially since I’ve been a long-time reader of, and occasional author for, Skeptical Inquirer.

    Slightly tongue-in-cheek:
    There’s always the Journal of Scientific Exploration, from which insight can be gained by a quick perusal of past articles. Most are freely available, although I am eagerly awaiting “A Brief History of Abduction Research” becoming free.

    As dogs and cats were mentioned earlier, JSE is my favorite journal for dog astrology, i.e., “An Empirical Study of Some Astrological Factors in Relation to Dog Behaviour Differences by Statistical Analysis and Compared with Human Characteristics.” Statistical analysis may be of interest to those who follow this blog.

    Every once in a while, a reasonable debunk somehow slips in, such as Balls of Light: The Questionable Science of Crop Circles by Italian skeptics.

    More seriously, if serious researchers notice that a journal prints junk, they can say so and send manuscripts elsewhere.

  12. Jess Riedel says:

    I believe that there are several journals, such as PLoS ONE, which do not base acceptance decisions on the perceived importance of the results, only on the scientific rigor of the work. In addition, I know that PLoS ONE places essentially no restrictions on subject area.

    Even more amazing is that PLoS ONE has an impact factor of 4.411.

    Problem solved?

  13. Matthias says:

    I know it’s nitpicking, but the other O isn’t really lost in the H2 + O2 reaction. Or is it?

  14. PSC says:

    Academic publications have two purposes:

    1) being interesting for other people to read, and
    2) providing career brownie points for the author

    Doing careful debunking work is a valuable part of the scientific enterprise – it should be worth career brownie points, even if no-one wants to read the details.

  15. Morgan Price says:

    “even in a computerized era where there are no page limits, there are still constraints on the time of editors and reviewers” — I think the journal that published the original paper should have an obligation to consider the non-replication. But this is not the existing norm.
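
Joseph’s file-drawer worry (comment 3 above) can be made concrete with a small simulation. This is a sketch with purely illustrative numbers, not anything measured in the post: if journals publish every spurious positive but shelve most null results, a meta-analysis of the published record will look like strong evidence for an effect that does not exist.

```python
import random

random.seed(1)  # reproducible illustration

def run_study(alpha=0.05):
    """Simulate a significance test of a true null effect (e.g. ESP).

    Under the null hypothesis, p-values are uniform, so a spurious
    'positive' result occurs with probability alpha."""
    return random.random() < alpha

# 1000 labs each test the same null hypothesis.
studies = [run_study() for _ in range(1000)]

# Journals publish every positive result, but only about 1% of null
# results ever make it out of the file drawer.
published = [s for s in studies if s or random.random() < 0.01]

positives = sum(published)  # True counts as 1
print(f"{positives} of {len(published)} published studies are positive")
```

Under these assumptions the published literature is overwhelmingly positive even though every single positive is a false alarm: exactly the “12 articles, 11 positive” meta-analysis Joseph warns about.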