Here’s a theoretical research project for you

We were having a listserv discussion on the replication project in psychology and someone asked about the rate of replication failures of stunning claims at top journals, compared to run-of-the-mill claims at lower-impact journals.

E. J. wrote:

Boring research is more likely to replicate. I have no data to back this up, so let’s just say it’s my prior belief.

My response was that his statement could be formalized. The basic idea is “surprising” = “low prior probability” (that’s what “surprise” means, right?), hence less likely to replicate. But it’s not trivial, because Pr(replicate) also depends on the power of the study. But, again, surprising results are likely to come from low-power studies. It could be worth trying to set up a simple model of this. No big deal but this could maybe have some impact!
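
To make that concrete, here is one minimal version of such a model (a toy sketch only, with made-up parameter values): each effect is real with prior probability `prior`, real effects reach significance with probability `power`, null effects with probability `alpha`; a study is published only if significant, and a published study “replicates” if an independent repeat is also significant.

```python
# Toy simulation: "surprising" = low prior probability that the effect is real,
# and typically lower power as well. Parameter values are illustrative only.
import random

def replication_rate(prior, power, alpha=0.05, n_sims=200_000, seed=1):
    """Estimate Pr(significant replication | original study was significant)."""
    rng = random.Random(seed)
    published = replicated = 0
    for _ in range(n_sims):
        real = rng.random() < prior        # is there a true effect?
        p_sig = power if real else alpha   # chance a single study is significant
        if rng.random() < p_sig:           # original study significant -> published
            published += 1
            if rng.random() < p_sig:       # independent replication also significant
                replicated += 1
    return replicated / published

# "Boring": high prior, decent power.  "Surprising": low prior, low power.
print(replication_rate(prior=0.5, power=0.8))   # roughly 0.75
print(replication_rate(prior=0.1, power=0.5))   # roughly 0.3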

8 thoughts on “Here’s a theoretical research project for you”

  1. I’m not so sure. There could also be a correlation between boringness and sloppiness. Lots of possible confounders.

    IMO the claim that top journals do worse at replication is not true. Waiting for someone to do an empirical test, but I’m sure that would be hard.

  2. Wait, didn’t the replication project look at how the replicability of results varied as a function of the journal in which they were published, and as a function of some measure of the “surprisingness” of the results? Or is my memory going? Or is the issue that the replication project only looked at studies in top journals and didn’t consider any studies with really unsurprising results?

    • I don’t know what the issue is, but yes, at the article level, surprisingness predicted low replicability in the RPP. All journals were top journals in psychology, but from different subdisciplines, so you can’t really compare them well. The N-pact paper looked at more journals and found a negative relation between impact factor and average sample size per journal _within_ a subdiscipline.

  3. To me, science is the business of reducing entropy in conceptual space [where ‘entropy’ might be taken to mean unconnected and disordered facts, data, observations, and phenomena]. Consider the ratio of how much conceptual entropy Newton’s laws generated to how much entropy they reduced: they’ve been called the single greatest abstraction of the human mind, and they explain an overwhelming amount of variation in the motion of the planets. What makes something ‘boring’ or ‘not boring’ is usually a matter of press releases and hype. Which raises the question: were Newton’s laws ‘surprising’? Wouldn’t they have seemed obvious to experts in the field who weren’t just being conservative for tradition’s or authority’s sake?

  4. Apologies in advance for the notation:

    Suppose researchers decide which studies to carry out by randomly sampling from a population of studies. Reduce each such study to a classical hypothesis test. The null hypothesis is true in a random study with probability n. Assume every study has power p and size a. A study is published iff the null hypothesis is rejected. A published study is replicated iff the null hypothesis is rejected in an independent sample. The replication rate, r, is the probability of a study being replicated, given that it has been published.

    r = P[replicated | published]
      = P[H0 rejected in second study | H0 rejected in first study]
      = P[H0 rejected in second study, H0 true | H0 rejected in first study] + P[H0 rejected in second study, H1 true | H0 rejected in first study]
      = P[H0 rejected in second study | H0 true, H0 rejected in first study] * P[H0 true | H0 rejected in first study] + P[H0 rejected in second study | H1 true, H0 rejected in first study] * P[H1 true | H0 rejected in first study]
      = P[H0 rejected in second study | H0 true] * P[H0 true | H0 rejected in first study] + P[H0 rejected in second study | H1 true] * P[H1 true | H0 rejected in first study]
      = a * P[H0 true | H0 rejected in first study] + p * P[H1 true | H0 rejected in first study]
      = a * (P[H0 rejected in first study | H0 true] * P[H0 true] / P[H0 rejected in first study]) + p * (P[H0 rejected in first study | H1 true] * P[H1 true] / P[H0 rejected in first study])
      = a * (a * n / (a * n + p * (1 – n))) + p * (p * (1 – n) / (a * n + p * (1 – n)))
      = (a^2 * n + p^2 * (1 – n)) / (a * n + p * (1 – n))

    Suppose further that there are two populations of studies: boring studies, and surprising studies. Studies in both populations employ the same p and a, but n is higher in surprising studies (n1) than in boring studies (n0). The relative risk of replication for published boring studies versus published surprising studies is:

    r0 / r1 = [(a^2 * n0 + p^2 * (1 – n0)) * (a * n1 + p * (1 – n1))] / [(a^2 * n1 + p^2 * (1 – n1)) * (a * n0 + p * (1 – n0))]

    As a special case, let a = 0.05 and p = 0.80. Then (approximately):

    r0 / r1 ~= [(1 – n0) * (0.51 – 0.48n1)] / [(1 – n1) * (0.51 – 0.48n0)]

    which is greater than 1 everywhere (since n1 > n0), and is increasing in n1 for fixed n0.
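
    For a quick numerical check of this algebra, here is a small script in the same notation (n = P(H0 true), a = size, p = power); the n0/n1 values below are just illustrative.

    ```python
    # Sketch checking the closed form above; the example n0/n1 pairs are arbitrary.
    def r(n, a=0.05, p=0.80):
        """Pr(replicated | published) under the model in this comment."""
        return (a**2 * n + p**2 * (1 - n)) / (a * n + p * (1 - n))

    def rel_risk(n0, n1, a=0.05, p=0.80):
        """r0 / r1: boring (n0) vs. surprising (n1) studies."""
        return r(n0, a, p) / r(n1, a, p)

    # The ratio stays above 1 whenever n1 > n0 and grows as n1 approaches 1.
    for n0, n1 in [(0.3, 0.7), (0.5, 0.9), (0.5, 0.99)]:
        print(n0, n1, round(r(n0), 2), round(r(n1), 2), round(rel_risk(n0, n1), 2))
    ```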

  5. That was my impression many years ago when reviewing a large number of Cochrane meta-analyses, with “boring” here being the more run-of-the-mill clinical research questions that would be unlikely to have major impacts on clinical practice, and “replication” being similarity of effect estimates across trials within a meta-analysis.

    The idea being that in low-key areas folks were working more comfortably and carefully.
