Type M errors studied in the wild

Brendan Nyhan points to this article, “Very large treatment effects in randomised trials as an empirical marker to indicate whether subsequent trials are necessary: meta-epidemiological assessment,” by Myura Nagendran, Tiago Pereira, Grace Kiew, Douglas Altman, Mahiben Maruthappu, John Ioannidis, and Peter McCulloch.

From the abstract:

Objective To examine whether a very large effect (VLE; defined as a relative risk of ≤0.2 or ≥5) in a randomised trial could be an empirical marker that subsequent trials are unnecessary. . . .

Data sources Cochrane Database of Systematic Reviews (2010, issue 7) with data on subsequent large trials updated to 2015, issue 12. . . .

Conclusions . . . Caution should be taken when interpreting small studies with very large treatment effects.

I’ve not read the paper and so can’t evaluate these claims but they are in general consistent with our understanding of type M and type S errors. So, just speaking generally, I think it’s good to see this sort of study.
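To make the type-M-error connection concrete, here's a minimal simulation sketch. This is not the paper's analysis; the true relative risk, the control-group event risk, and the per-arm sample size below are made-up numbers for illustration. The point is that with a modest true effect and small trials, the occasional trial that does show a "very large effect" (relative risk ≤0.2 or ≥5) is, almost by construction, an exaggerated estimate, which is why you'd expect it to shrink in subsequent larger trials.

```python
import numpy as np

# Sketch, not the paper's analysis: the true relative risk (0.7), the 10%
# control-group event risk, and the 50-per-arm sample size are assumptions
# chosen to illustrate a type M (magnitude) error.
rng = np.random.default_rng(1)

true_rr = 0.7            # modest true treatment effect
p_control = 0.10         # control-arm event risk
p_treat = true_rr * p_control
n_per_arm = 50           # a small trial
n_sims = 100_000

events_c = rng.binomial(n_per_arm, p_control, n_sims)
events_t = rng.binomial(n_per_arm, p_treat, n_sims)

# Observed relative risk, with +0.5 added to each count as a simple
# continuity correction so zero-event arms don't blow up the ratio.
obs_rr = (events_t + 0.5) / (events_c + 0.5)

# "Very large effect" by the paper's definition: RR <= 0.2 or RR >= 5.
vle = (obs_rr <= 0.2) | (obs_rr >= 5)
benefit_vle = obs_rr <= 0.2   # VLEs in the same direction as the true effect

print(f"Trials showing a very large effect: {vle.mean():.1%}")
print(f"Mean observed RR among RR<=0.2 trials: {obs_rr[benefit_vle].mean():.2f} "
      f"(true RR = {true_rr})")
```

Under these assumptions only a few percent of trials show a "very large effect," and among those that do, the estimated relative risk is several times stronger than the truth: selecting on an extreme observed effect selects for noise.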

Along similar lines, Jonathan Falk pointed me to this paper, “On the Reproducibility of Psychological Science,” by Val Johnson, Richard Payne, Tianying Wang, Alex Asher, and Soutrik Mandal. I think their model (in which effects are exactly zero or else are spaced away from zero) is goofy and I don’t like the whole false-positive, false-negative thing or the idea that hypothesis tests in psychology experiments correspond to “scientific discoveries,” but I’m guessing they’re basically correct in their substantive conclusions, as this seems similar to what I’ve heard from other sources.

4 thoughts on “Type M errors studied in the wild”

  1. Perhaps I can provide some historical background to this.

    When I had a research fellowship with Douglas Altman’s group in Oxford (2001-2003), I managed to write an S-Plus program that could extract the raw data from the Cochrane Database of Systematic Reviews. When I discussed possible uses for this, some suggested we keep quiet about it until we had published as many papers using the resource as we could. Fortunately my REB in Ottawa had convinced me this sort of thing was inappropriate, and we agreed to look at ways to share the resource.

    Unfortunately the agreement with the software provider at the time forbade the extraction and sharing of any material. So all we could do was bring others’ attention to how they could use an S-Plus program to extract things themselves. But there was a bigger problem: the perhaps unsurprising fact that the data quality was just too low to be usable at scale.

    Not that there were inaccurate entries (though there surely were some of those), but reviewers had used various tricks to get material they wanted into the wrong fields. This “fooled” my program about what the data was, and dealing with that would have required a manual check of pretty much all entries. So it did not go anywhere after that, as far as I was aware.

    It’s nice to see that it has gone much further (given ~15 years). I believe the software provider is now different, I trust the data quality is better, and I really hope pretty much anyone can get access to the extracted data?

    • …really hope pretty much anyone can get access to the extracted data?

      Yes, up to a point. The end-of-paper notes on data sharing say: “Raw data and analysis available on request from the authors.” The authors do seem to be engaging with the BMJ’s rapid responses, so it might be worth asking there for them to put the data and code on a website?

  2. Objective: To examine whether a very large effect (VLE; defined as a relative risk of ≤0.2 or ≥5) in a randomised trial could be an empirical marker that subsequent trials are unnecessary

    No, a large effect size does not mean you do not need to do replications… in fact it is totally irrelevant to whether you understand the conditions/methods well enough to communicate them effectively, or whether the phenomenon is stable under unanticipated differences at other times/locations. I see they cite this paper as a motivation:

    We suggest that a sufficiently extreme difference between the outcome ranges for treated and untreated patients might be defined by two rules: (a) that the conventionally calculated probability of the two groups of observations coming from the same population should be less than 0.01 and (b) that the estimate of the treatment effect (rate ratio) should be large.

    http://www.bmj.com/content/334/7589/349?ijkey=b8716f4a83d475d6570738c0f01dbe4c6874babe&keytype2=tf_ipsecsha##

    Here we see the usual wrong definition of a p-value, missing the point of replication studies, and advocating for the statistical significance filter. I really think that most 8th graders have a better understanding of the scientific method than people who get trained in healthcare-related fields; the training leaves you unable to think clearly or understand the purpose of your own actions without exceptional effort and luck.

  3. Somewhat of a side point but +1 for consistently saying things like

    > I think their model (in which effects are exactly zero or else are spaced away from zero) is goofy and I don’t like the whole false-positive, false-negative thing
