
Frustration with published results that can’t be reproduced, and journals that don’t seem to care


Thomas Heister writes:

Your recent post about Per Pettersson-Lidbom’s frustrations in reproducing study results reminded me of our own recent experience replicating a paper in PLOSone. We found numerous substantial errors but eventually gave up because, frustratingly, our time and effort didn’t seem to change anything, and the journal’s editors quite obviously regarded our concerns as a mere annoyance.

We initially stumbled across this study by Collignon et al. (2015), which explains antibiotic resistance rates by country-level corruption, because it raised red flags for omitted variable bias (it’s at least not immediately intuitive to us how corruption causes resistance in bacteria). It wasn’t exactly a high-impact study that a whole lot of people will read or cite, but we thought we would look at it anyway as it seemed relevant to our field. As the authors provided their data, we tried to reproduce their findings and found a whole lot of simple but substantial errors in their statistical analysis and data coding that led to false findings. We wrote a detailed analysis of the errors and informed the editorial office, as PLOSone only has an online comment tool and doesn’t accept letters. The apparent neglect of the concerns raised (see email correspondence below) led us to finally publish our letter as an online comment at PLOSone. The authors’ responses are quite lengthy but in essence touch on only some of the things we criticize and entirely neglect some of our most important points. Frustratingly, we finally got an answer from PLOSone (see below) that the editors were happy with the authors’ reply and didn’t consider further action necessary. This is remarkable considering that the main explanatory variable is completely useless, as can be seen very easily in our re-analysis of the dataset (see table 1).

Maybe our experience is just an example of the issues with Open-Access journals, maybe of the problem of journals generally not accepting letters, or maybe just that a lot of journals still see replications and criticism of published studies as an attack on the journal’s scientific standing. Sure, this paper will probably not have a huge impact, but false findings like these might easily slip into the “what has been shown on this topic” citation loop in the introduction parts.

I would be very interested to hear your opinion on this topic with respect to PLOS journals, their “we’re not looking at the contribution of a paper, only whether it’s methodologically sound” policy, and open access.

My reply: We have to think of the responsibility as being the authors’, not the journals’. Journals just don’t have the resources to adjudicate this sort of dispute.


  1. Even when someone mathematically proves that a published result is a fraud, some prestigious journals refuse to acknowledge it. Check this:

  2. Larry Raffalovich says:

    Journals can send comments out for peer review, require authors to respond, and publish both. Major sociology journals do this.

    • numeric says:

      My reply: We have to think of the responsibility as being the authors’, not the journals’. Journals just don’t have the resources to adjudicate this sort of dispute.

      So do some statistical journals (American Statistician, for example).

  3. A.P. Salverda says:

    My experience is that publication of a critique which is intended to help improve the quality of the scientific literature can, paradoxically, result in publication of a reply in which a journal not only allows authors to deny any wrongdoing–on the basis of arguments that are clearly fallacious–but also to introduce additional errors. In my opinion, this course of action does a disservice to the field.

    I wrote a commentary on a paper that appeared in Journal of Experimental Psychology: Learning, Memory, and Cognition (an APA journal). The paper contains a number of serious statistical errors, as a result of which (I argued) it does not contain any substantive evidence for the main conclusion. One of the errors is the infamous mistake of interpreting a difference in statistical significance as a statistically significant difference: Experiment 1 yielded an effect that was statistically significant, and Experiment 2 didn’t. Therefore (the authors concluded), there was a difference in effect between the two experiments. (The journal asked me to perform the required analysis on the authors’ data, which I did, and which is included in my commentary. The analysis failed to provide evidence for an interaction between experiment and effect.)

    My commentary received positive reviews and was accepted for publication at the beginning of this year. The authors were invited to write a reply, which will be published along with my criticism. In their reply, the authors deny that they made any of the statistical errors (e.g., they provide what they believe is a legitimate “rationale” for drawing conclusions on the basis of a difference in statistical significance), introduce new errors, and present a “reanalysis” of their original data in which they move the goal posts far enough to obtain a p-value just below .05. They view this result as providing “additional evidence” for their original claim. I was invited to review the first submission of their reply, but the authors chose to ignore the vast majority of my lengthy and detailed comments.

    To my surprise, the authors’ reply was accepted for publication. I contacted the journal and expressed my concern that they accepted for publication a reply that contains basic statistical errors, invalid motivations for those errors, and additional errors. I pointed out that such errors contribute to the replication crisis in psychological science. The editor replied and said that it was up to the readers of the journal to “dispassionately weight the arguments” presented in my criticism and the authors’ reply. I wrote back and emphasized that violation of the rules of null-hypothesis significance testing is not a matter of personal opinion. Soon after, the editor effectively ended our conversation.

    For those of you who are interested: my criticism and the reply are scheduled to appear in the December issue of the journal.

    • Andrew says:


      Yes, this is an example of the “incumbency advantage” by which previously published papers get the benefit of the doubt.

    • Florian Wickelmaier says:


      Quite a sad story, but I think, worth the effort; someone might learn from it. Reminded me of my own experience.

      Lieberman et al.’s reply is both hair-raising and amusing. To me, the worst part (out of many highlights) is this:

      “Salverda states that the native signers and late learners had the same pattern of results considering the speed with which they initiated gaze shifts to the target picture. In the phonological and unrelated conditions, the difference for native signers in saccade latency to the target was 40 ms (884 ms vs. 844 ms, respectively); for the late learners, the difference was only 13 ms (876 ms vs. 863 ms, respectively). While the range is narrow in both groups, for the late learners this narrow range is smaller than the standard error of the mean (SE = 22.2 ms for native signers and SE = 22.6 ms for late learners), which suggests that late learners were not qualitatively sensitive to the conditions.” (p. 2004)

      As if they were mocking Gelman and Stern (2006):

      “Consider two independent studies with effect estimates and standard errors of 25 +/- 10 and 10 +/- 10. The first study is statistically significant at the 1% level, and the second is not at all statistically significant, being only one standard error away from 0. Thus, it would be tempting to conclude that there is a large difference between the two studies. In fact, however, the difference is not even close to being statistically significant: the estimated difference is 15, with a standard error of sqrt(10^2 + 10^2) = 14.” (p. 328)
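The arithmetic in the Gelman and Stern passage can be checked with a few lines of Python. This is only a minimal sketch: it assumes a normal approximation and two-sided p-values, and the helper names `z_and_p` and `difference` are made up for illustration rather than taken from any of the papers discussed here.

```python
import math


def z_and_p(estimate, se):
    """Return the z-score and two-sided normal p-value for an estimate."""
    z = estimate / se
    # Two-sided tail area under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def difference(est_a, se_a, est_b, se_b):
    """Difference of two independent estimates and its standard error."""
    # Variances of independent estimates add, so the SEs combine in quadrature.
    return est_a - est_b, math.hypot(se_a, se_b)


# Numbers from the quoted example: 25 +/- 10 and 10 +/- 10.
z1, p1 = z_and_p(25, 10)
z2, p2 = z_and_p(10, 10)
diff, se_diff = difference(25, 10, 10, 10)
z_diff, p_diff = z_and_p(diff, se_diff)

print(f"study 1:    z = {z1:.2f}, p = {p1:.3f}")
print(f"study 2:    z = {z2:.2f}, p = {p2:.3f}")
print(f"difference: {diff} +/- {se_diff:.1f}, z = {z_diff:.2f}, p = {p_diff:.3f}")
```

The relevant test is the one on the difference itself: one estimate clears the conventional threshold and the other does not, yet the difference between them is only about one standard error from zero.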

      In reply to my comment (above), a reviewer wrote:

      “The comment has limited applicability for the Costa et al. paper, mainly because the argument does not hold for most of the studies. For example, in the second Asian disease study there was a significant framing effect for NL but not for FL. It is perfectly valid to conclude that the framing effect was reduced in FL (if not that it disappeared).”

      Some people (authors, editors, reviewers) just don’t get it. On the positive side, they are contributing great teaching examples, and my students enjoy them!

      • Andrew says:


        I followed the links and saw this in that letter from the journal editor:

        You have convinced me that there’s a serious problem with Costa et al.’s analysis. But I also remain convinced by his subsequent analyses that he has a real effect. I think your suggestion to have him write an erratum (or a corrigendum) was an excellent one. I’ve been in touch with him and, after some back and forth, I’ve allowed him 10 days to produce a short corrigendum. I’ve asked him to acknowledge your input in his text. If he does not come through by Fri, May 8, my plan is to give you the opportunity to write a very short note reminding everybody of the statistical issue, perhaps mentioning Costa et al. as well as any other recent perpetrators of the error, without attempting to undermine the findings of their paper.

        Fri, May 8, has come and gone. What happened with that? Did the author publish a correction?

        • Florian Wickelmaier says:

          Yes, they did (see the link in the original post). But I never got the chance to comment on their correction. It introduces a meta-analysis (of their own five experiments), and pooling over all data, they squeeze out significance. In a way, similar to what AP reports.

  4. Tom Passin says:

    Many years ago I took a wonderful class in nuclear physics from Robley D. Evans. He told us about a pre-war (i.e., 1930s) situation in the field of mass spectroscopy. It seems a researcher published a paper in which he claimed that a particular isotope of lead didn’t exist (his lab had its own mass spectroscope). Soon after, another group published test results that definitively established the existence and mass of that very isotope.

    The first experimenter then published a paper confirming the isotope’s existence and giving its mass with ten times better precision. Hmm…

    Prof. Evans would never tell us the man’s name, but he said that all the researchers in the field knew about him and paid no attention to his papers.

  5. jrc says:

    I agree that arguing in good faith is the responsibility of the author and not the journal. But that both lets the journals off a bit easily, and (given the reality of academic politics and promotions) puts an incredible burden on the original author. Let me start with a totally uncontroversial statement that somehow still feels like playing devil’s advocate:

    I think there are times when the original journal should not be compelled to publish a replication paper or comment, even one that should be published elsewhere. Not all mistakes are created equal; not all corrections need to appear with the original article. So when should a particular journal publish a replication of an original paper? What about this as an idea:

    1. Mistakes. These can be either statistical mistakes (coding errors, data problems) or mathematical/logical mistakes (lost an inverse somewhere; committed a fallacy). If a paper’s main result is entirely due to a coding error, data problem or logical fallacy, that paper should be retracted without animus. The original author and replicator then publish, in place of the retracted paper, a paper with the corrected information/text/explanation of the problem. This is not a new publication, but now sits in place of the original in the journal’s history and now includes the replicator/commenter’s name, who gets full (co-)authorship credit.

    2. Replications and Re-Analyses. These can be experimental replications where an intervention is re-run on new subjects; or a statistical reanalysis of the original underlying data; or a comment on the generalizability or robustness of some result. I don’t think journals should have to publish every comment or replication or re-analysis of work they’ve published in the past, not even every “valid” one. The optimal (for science) level of comment-publishing (even only among the “valid” ones) is definitely above 0 and above where we are, but it is also surely below “all of them published at top journals.” My gut here is that the best way to get good re-analysis published is for us to both demand good re-analysis and simultaneously increase the costs to journals for refusing to correct the record*. We can cite any journal that wants to publish good re-analyses, whether it published the original paper or not. And, you know, maybe we get on the internet and make a little fun of journals that consistently refuse to correct their past mistakes.

  6. Jake says:

    At least the authors submitted their data, and the comments are publicly viewable. I think this is progress!

  7. Jack Wilkinson says:

    My experience with PLOSone so far is that they talk the talk but can’t walk the walk. They talk about their guiding principle being to value any science where the results and conclusions follow from the methods used. The problem is, the people making these decisions don’t have the methodological competence to determine when this is and is not the case.

  8. Thomas says:

    I agree with Jake that the openness of both the data and the comments is an improvement, and I agree with Andrew that it’s the responsibility of the authors to request that their paper be retracted if the errors are serious enough. The journal can’t be expected to adjudicate every dispute in the comments section to a paper, but it’s to the discredit of the authors if they can’t recognize a fundamental flaw in their analysis and acknowledge it. (I haven’t looked at this particular case, so I don’t know how serious it is here.)

    Sometimes, however, journals actually suppress critical comments on their online platforms. While that may of course sometimes be as necessary as retracting a paper, it’s something that a journal should only do after careful deliberation.

  9. Anonymous says:

    Total nonsense! If a journal publishes a paper, then it, its editors and the publisher take credit, even more so when they play the impact factor game, so they too share responsibility for a published paper. Authors have responsibility, but so too have editors (and thus the journal and publisher). Enough of this hands-off approach by editors and journals. If they don’t have the resources to deal with this modern phenomenon, then they have no right publishing a journal!

  10. David C. Norris says:

    This erratum is an example of what must surely be a resource-limited journal (*Health Affairs*, published by ProjectHOPE) going the extra mile to secure 3 fresh reviewers to adjudicate a dispute over statistical method.

  11. Jordan Anaya says:

    In terms of getting these reanalyses published, I think we should just take journals out of the equation and post the analyses as preprints.
