
Reasons for an optimistic take on science: there are not “growing problems with research and publication practices.” Rather, there have been, and continue to be, huge problems with research and publication practices, but we’ve made progress in recognizing these problems.

Javier Benitez points us to an article by Daniele Fanelli, “Is science really facing a reproducibility crisis, and do we need it to?”, published in the Proceedings of the National Academy of Sciences, which begins:

Efforts to improve the reproducibility and integrity of science are typically justified by a narrative of crisis, according to which most published results are unreliable due to growing problems with research and publication practices. This article provides an overview of recent evidence suggesting that this narrative is mistaken, and argues that a narrative of epochal changes and empowerment of scientists would be more accurate, inspiring, and compelling.

My reaction:

Kind of amusing that this was published in the same journal that published the papers on himmicanes, air rage (see also here), and ages ending in 9 (see also here).

But, sure, I agree that there may not be “growing problems with research and publication practices.” There were huge problems with research and publication practices; these problems remain, but there may be some improvement (I hope there is!). What’s happened in recent years is that there’s been a growing recognition of these huge problems.

So, yeah, I’m ok with an optimistic take. Recent ideas in statistical understanding have represented epochal changes in how we think about quantitative science, and blogging and post-publication review represent a new empowerment of scientists. And PNAS itself now admits fallibility in a way that it didn’t before.

To put it another way: It’s not that we’re in the midst of a new epidemic. Rather, there’s been an epidemic raging for a long time, and we’re in the midst of an exciting period where the epidemic has been recognized for what it was, and there are some potential solutions.

The solutions aren’t easy—they don’t just involve new statistics, they primarily involve more careful data collection and a closer connection between data and theory, and both these steps are hard work—but they can lead us out of this mess.

P.S. I disagree with the above-linked article on one point, in that I do think that science is undergoing a reproducibility crisis, and I do think this is a pervasive problem. But I agree that it’s probably not a growing problem. What’s growing is our awareness of the problem, and that’s a key part of the solution, to recognize that we do have a problem and to beware of complacency.

P.P.S. Since posting this I came across a recent article by Nelson, Simmons, and Simonsohn (2018), “Psychology’s Renaissance,” that makes many of the above points. Communication is difficult, though, because nobody cites anybody else. Fanelli doesn’t cite Nelson et al.; Nelson et al. don’t cite my own papers on forking paths, type M errors, and “the winds have changed” (which covers much of the ground of their paper); and I hadn’t been aware of Nelson et al.’s paper until just now, when I happened to run across it in an unrelated search. One advantage of the blog is that we can add relevant references as we hear of them, or in comments.
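To make the type M (magnitude) error idea a bit more concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not the method from any of the papers mentioned: the true effect of 0.1, the standard error of 0.5, the normal sampling distribution, and the function name `significance_filter` are all hypothetical choices made for this example. It asks what happens when only the estimates that reach p < 0.05 get reported.

```python
import random
from statistics import NormalDist, fmean


def significance_filter(true_effect, se, n_sims=100_000, alpha=0.05, seed=2):
    """Hypothetical helper: simulate noisy estimates of a small effect and
    keep only the 'statistically significant' ones.

    Assumes each study's estimate is Normal(true_effect, se).
    Returns (power, share of significant estimates with the wrong sign,
    exaggeration ratio = mean |significant estimate| / |true effect|).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    estimates = [rng.gauss(true_effect, se) for _ in range(n_sims)]
    # The significance filter: only estimates more than z_crit standard errors from zero survive.
    significant = [est for est in estimates if abs(est) / se > z_crit]
    if not significant:
        raise ValueError("no simulated estimate reached significance")
    power = len(significant) / n_sims
    # Type S: significant estimates pointing the wrong way.
    wrong_sign = sum(1 for est in significant if est * true_effect < 0) / len(significant)
    # Type M: how much the surviving estimates overstate the true magnitude.
    exaggeration = fmean([abs(est) for est in significant]) / abs(true_effect)
    return power, wrong_sign, exaggeration


power, wrong_sign, exaggeration = significance_filter(true_effect=0.1, se=0.5)
print(f"power={power:.3f} wrong_sign={wrong_sign:.2f} exaggeration={exaggeration:.1f}")
```

With these assumed inputs only about 5% of simulated studies reach significance, and because any significant estimate must exceed 1.96 × 0.5 ≈ 0.98 in absolute value, the survivors overstate the true effect of 0.1 by roughly an order of magnitude, with a sizable share pointing in the wrong direction.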


  1. Actually Daniele Fanelli has reinforced the game-changing value of the crisis in one of his YouTube videos. So I am not sure why that doesn’t come across. He broadens the query in a way that may not be sufficiently explanatory.

  2. Nicholas BROWN says:

    That article by Nelson et al. makes a couple of quite strong claims.

    The first is that there never was a file-drawer effect; the studies that should have gone into the file drawer were instead mostly p-hacked to significance and published. This makes a lot of sense, although I’m not sure how it can be fully tested.

    The second, near the end, is that psychology’s renaissance is currently underway. I find this to be rather premature. Judging by the (undoubtedly biased) selection of articles that I get asked to review, it doesn’t seem to me that what I will call for convenience “open science practices” are being adopted by anything other than a small minority of researchers. And in this struggle, the hard work is yet to come, because we have to persuade a lot of people that publishing fewer positive results is not only in the collective interest, it’s in their interest too. There are shades of the Prisoner’s Dilemma in this situation.

    • Both claims are garbage. Of course there are file drawers and some researchers have openly admitted dropping studies that didn’t work.

      And the improvement that we are seeing is selective (only social psych) and small.

    • Anonymous says:

      1) “The first is that there never was a file-drawer effect; the studies that should have gone into the file drawer were instead mostly p-hacked to significance and published”

      There never was a file-drawer effect, lol :) I was very disappointed with that Nelson et al. piece.

      When I was a student we had to design and execute a study in small groups under the supervision of an “established” researcher. We ran the study with around 100+ participants (typical for that time) and found non-significant results, which were (of course) file-drawered. I am sure we would have published it if the results had been “significant.” So even during my “education” I took part in, and witnessed, the file-drawering of studies. In fact, all this happened with the very first study I had ever been part of. So, yeah, there’s that…

      2) “And in this struggle, the hard work is yet to come, because we have to persuade a lot of people that publishing fewer positive results is not only in the collective interest, it’s in their interest too. There are shades of the Prisoner’s Dilemma in this situation.”

      Yes! If I were a researcher I would try to find others who would also want to do open science/best practices/etc. and form small groups of collaborators (using StudySwap, for instance).

      I reason that once you pre-register, have higher power, want to publish possible null results, etc., it actually starts to matter which prior studies/evidence/reasoning/etc. you base your hypotheses/new studies on.

      I reason that small groups of collaborators working on the same theory/phenomenon/etc. can help each other via an idea I posted here:

    • Daniel Ozer says:

      My file drawer contains a few studies that DID work, but were not published (for a whole host of reasons, including Reviewer 2).

    • Austin Fournier says:

      Hold off a sec – he didn’t say there never was a file drawer, he said that it was considerably smaller than the number of successful p-hacked studies.

      As for how to estimate this, maybe a rough idea could be gleaned by estimating how many studies one can practically run in a year, given that it’s kind of a pain to get participants (or such is my impression – my research experience thus far is negligible so I could be wrong). But then again that would probably require an estimate of the number of false positives.

      I myself was encouraged to hear that the number of people making pre-registrations was in the quadruple digits; I kind of felt like it was less than that. But yeah, there are pockets where people don’t even seem to have heard of the replication crisis, let alone adopted “renaissance” techniques like pre-registration. I know because I live in one of them. God willing, that will be remedied pretty soon, but for now I find it hard to say the field as a whole has been transformed.

  3. Wonks Anoymous says:

    Your link to the Nelson article is broken.

  4. David Landy says:

    Just a quick rhetorical point: it’s not a crisis if it’s not suddenly getting worse. The usual notion of a crisis is a problem that is newly crucial, newly worse, that is coming to a head. Here, we have many scientists who are newly discovering that the methods they were trained on aren’t great and could be better; these scientists are experiencing personal/professional crises. If, though, as you say, we have a situation where the epidemic has been raging for a long time, and we are rapidly fixing it, well, that’s just not a crisis.

    It’s like saying the AIDS crisis really started when we successfully implemented anti-retroviral therapy…

    So, let’s all call it the ‘replicability issue’?

  5. Jordan Anaya says:

    I think you could actually make an argument that things are getting worse.

    If p-hackers are more successful at getting tenure than non-p-hackers, then we currently have more p-hacking tenured professors than before. And these p-hacking professors teach their students how to p-hack; cf. Wansink.

    This paper goes into much more detail:

    • Anonymous says:

      “I think you could actually make an argument that things are getting worse.

      If p-hackers are more successful at getting tenure than non-p-hackers, then we currently have more p-hacking tenured professors than before”

      I reasoned/wondered something similar along those lines here:

      “It has been suggested that a negative incentive-structure has possibly played a role in Psychological Science getting in this situation (e.g. Nosek, Spies, & Motyl, 2012). Researchers may have furthered their academic careers by publishing lots of new studies, perhaps irrespective of their truth-value or quality (cf. Higginson & Munafò, 2016). If a negative incentive-structure has been operating in Psychological Science, one could wonder what that could imply concerning those who have chosen to participate, and even flourished, in this environment. Would it be reasonable to assume that currently tenured Psychology Professors may have gotten there by:

      1) Publishing lots of low-quality papers (cf. Bakker, van Dijk & Wicherts, 2012; Higginson & Munafò, 2016)

      2) Engaging in what can be considered to be questionable research practices (cf. Agnoli, Wicherts, Veldkamp, Albiero & Cubelli, 2017; Fanelli, 2009; John, Loewenstein & Prelec, 2012)

      3) Not publishing their “failed” studies (cf. Fanelli, 2010; Fanelli, 2012)

      4) Rarely performing and/or publishing replications of their own or others’ work (cf. Makel, Plucker & Hegarty, 2012)

      5) Sabotaging others’ ability to use and check one’s work (cf. Anderson, Ronning, de Vries, & Martinson, 2007; Wicherts, Bakker & Molenaar, 2011)

      6) Inappropriately adding co-authors, and references, to their papers (cf. Fong & Wilhite, 2017)

      7) Unfairly receiving and/or demanding co-authorship (cf. Macfarlane, 2017)

      8) Wasting the general public’s money (cf. Ioannidis, 2012), abusing their trust, and throwing aside scientific values, principles, and responsibilities for personal gain: hereby providing a possibly more scientifically accurate description of what some have called a “research parasite” (cf. Longo & Drazen, 2016)

  6. Karim Naguib says:

    I think replication crises attest to the maturity of the disciplines now confronting them. Considering my own field (economics) I feel pessimistic that any such reflection on the validity of currently accepted practices is going to happen anytime soon. The mentality is still very much don’t replicate anything and just give me the p-values.

  7. Karim Naguib says:

    Thank you for sharing these, very helpful. However, I see little change in terms of p-hacking under the guise of doing a more thorough investigation of different aspects of a model or mechanism under study. For many, the idea of presenting a posterior probability interval as opposed to a binary significant/not-significant decision is still unacceptable. Once a p-value passes the magical 5% significance threshold, point estimates become the truth, uncertainty is forgotten, and storytelling begins.

    I don’t actually think it is a Bayesian vs Frequentist problem. It’s a problem with requiring simplicity to ensure robust inference. For example, this means ignoring the hierarchical design of studies during estimation but then torturing the standard errors to recover from clustering or testing multiple hypotheses (I’m talking about the most typical applied work). That means I usually end up with bad estimates and bad standard errors. We keep it absurdly simple at one end and make it incredibly complex at the other end; it’s very hard to get standard errors right and not many people understand what is going on. And if you got them right, you probably have no power to reject any null hypotheses in any reasonably sized study. It’s not surprising that Alwyn Young’s working paper “Channelling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results” finds so many problems. I don’t think the majority of us are very well trained in thinking statistically. For starters, we need to get comfortable discussing our research with probabilities and not simple yes/no answers.

  8. Anonymous says:

    From Fanelli’s paper:

    # “(…) but they echo beliefs expressed by a rapidly growing scientific literature, which uncritically endorses a new “crisis narrative” about science (an illustrative sample of this literature is shown in Fig. 1 and listed in Dataset S1)”

    # “Fig. 1.

    Number of Web of Science records that in the title, abstract, or keywords contain one of the following phrases: “reproducibility crisis,” “scientific crisis,” “science in crisis,” “crisis in science,” “replication crisis,” “replicability crisis.” Records were classified by the author according to whether, based on title and abstracts, they implicitly or explicitly endorsed the crisis narrative described in the text (red), or alternatively questioned the existence of such a crisis (blue), or discussed “scientific crises” of other kinds or could not be classified due to insufficient information (gray)”

    So I am a bit confused by this.

    He seems to have only read the titles and abstracts of papers with words like “crisis” in the title/abstract, if I am understanding things correctly, but somehow knows/infers/guesses (???) that these papers “uncritically” endorse a new “crisis narrative”?

  9. oliver says:

    Hi Andrew,
    Isn’t that just a variation of Dara Ó Briain’s point,
    “the threat from zombies is going down, but the fear of zombies is on an all-time high”?
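A minimal sketch of the mechanism behind the Nelson et al. claim that Nicholas Brown summarizes above: studies that "should" have gone into the file drawer could instead be pushed to significance. The simulation assumes there is no true effect, that a researcher measures several independent outcomes (normally distributed with known unit variance, tested with a simple two-sample z-test), and that whichever outcome first reaches p < 0.05 gets reported. The function names are hypothetical, and this is only one of many forking paths, chosen because its false-positive rate has a simple closed form, 1 − 0.95^k for k independent outcomes.

```python
import random
from statistics import NormalDist, fmean

STANDARD_NORMAL = NormalDist()


def z_test_p_value(x, y):
    """Two-sided p-value for a difference in means, assuming known unit variance."""
    diff = fmean(x) - fmean(y)
    se = (1 / len(x) + 1 / len(y)) ** 0.5
    z = diff / se
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def forking_paths_false_positive_rate(n_outcomes, n_per_group=20, n_sims=5000,
                                      alpha=0.05, seed=1):
    """Hypothetical helper: share of null studies that yield at least one
    'significant' outcome when the researcher may choose among n_outcomes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        found = False
        for _ in range(n_outcomes):
            # No true effect: treatment and control are both pure noise.
            treat = [rng.gauss(0, 1) for _ in range(n_per_group)]
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if z_test_p_value(treat, control) < alpha:
                found = True  # report this outcome and stop looking
                break
        hits += found
    return hits / n_sims


for k in (1, 3, 5):
    print(k, forking_paths_false_positive_rate(k), round(1 - 0.95 ** k, 3))
```

With a single pre-specified outcome the simulated rate should sit near the nominal 5%; with five outcomes to choose from it should land near 1 − 0.95^5 ≈ 0.23, so a lab running the same number of null studies would need to shelve far fewer of them.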
