Jimmy points me to this article, “Why most discovered true associations are inflated,” by J. P. Ioannidis. As Jimmy pointed out, this is exactly what we call type M (for magnitude) errors. I completely agree with Ioannidis’s point, which he seems to be making more systematically than David Weakliem and I did in our recent article on the topic.

My only suggestion beyond what Ioannidis wrote has to do with potential solutions to the problem. His ideas include: “being cautious about newly discovered effect sizes, considering some rational down-adjustment, using analytical methods that correct for the anticipated inflation, ignoring the magnitude of the effect (if not necessary), conducting large studies in the discovery phase, using strict protocols for analyses, pursuing complete and transparent reporting of all results, placing emphasis on replication, and being fair with interpretation of results.”

These are all good ideas. Here are two more suggestions:

1. Retrospective power calculations. See page 312 of our article for the classical version or page 313 for the Bayesian version. I think these can be considered as implementations of Ioannidis's ideas of caution, adjustment, and correction.

2. Hierarchical modeling, which partially pools estimated effects, reducing Type M errors and also handling many multiple-comparisons issues. Fuller discussion here (or see here for the soon-to-go-viral video version).
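To make the Type M idea concrete, here is a small simulation sketch (not from the article; the true effect and standard error are made-up values for illustration). It keeps only "statistically significant" estimates and shows that, conditional on significance, the average estimate overstates the true effect by a large factor when power is low:

```python
import random

# Hypothetical illustration of a Type M (magnitude) error:
# assume a true effect that is small relative to the standard error,
# and keep only estimates with |z| > 1.96. Conditional on significance,
# the average absolute estimate greatly overstates the truth.

random.seed(1)
true_effect = 0.5   # assumed true effect (arbitrary units)
se = 1.0            # assumed standard error of the estimate
n_sims = 100_000

significant = []
for _ in range(n_sims):
    est = random.gauss(true_effect, se)
    if abs(est) / se > 1.96:          # "statistically significant"
        significant.append(abs(est))

power = len(significant) / n_sims
exaggeration = (sum(significant) / len(significant)) / true_effect
print(f"power ~ {power:.2f}, exaggeration ratio ~ {exaggeration:.1f}")
```

With these assumed numbers the study has power well under 20%, and the significant estimates average several times the true effect, which is exactly the inflation Ioannidis describes.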
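And here is a minimal sketch of what partial pooling does, under an assumed normal hierarchical model (the group estimates, standard errors, and between-group standard deviation below are invented for illustration, not taken from any dataset). Each raw estimate is shrunk toward the overall mean, with noisier estimates shrunk more:

```python
# Hypothetical sketch of partial pooling in a normal hierarchical model:
# each group's raw estimate y_j (standard error sigma_j) is pulled toward
# the precision-weighted grand mean; the amount of shrinkage depends on
# how noisy the estimate is relative to the assumed between-group sd tau.

y = [2.8, 0.8, -0.3, 1.5]      # raw group estimates (hypothetical)
sigma = [1.2, 0.4, 0.9, 0.6]   # their standard errors (hypothetical)
tau = 0.5                      # assumed between-group standard deviation

# Precision-weighted grand mean
weights = [1.0 / (s ** 2 + tau ** 2) for s in sigma]
mu = sum(w * yj for w, yj in zip(weights, y)) / sum(weights)

# Partial pooling: weight on the raw estimate is tau^2 / (tau^2 + sigma_j^2)
pooled = []
for yj, s in zip(y, sigma):
    shrink = tau ** 2 / (tau ** 2 + s ** 2)
    pooled.append(mu + shrink * (yj - mu))

for yj, pj in zip(y, pooled):
    print(f"raw {yj:+.2f} -> pooled {pj:+.2f}")
```

The point for Type M errors: the most extreme raw estimate (here the noisy 2.8) is pulled hardest toward the mean, which is precisely the down-adjustment of inflated discovered effects that Ioannidis calls for.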

P.S. Here’s the first mention of Type M errors that I know of. The problem is important enough, though, that I suspect there are articles on the topic going back to the 1950s or earlier in the psychometric literature.

http://www.stat.columbia.edu/~cook/movabletype/ar…

I'd like to point back to an older blog entry about power, because what I believe Andrew means by retrospective power calculations is not what many people perceive those to be. (Please correct me if I am wrong!) See the comments.

In Ioannidis's work, selection effects are usually the strongest: selective publication (preferentially publishing statistically significant studies), selection of promising features within the publication, and dramatic results getting published earlier.

In your paper with David Weakliem you seemed to back off from these problems, almost suggesting journals should publish only statistically significant studies???

Jimmy – thanks for the back post. I think Radford nailed it re: retrospective power calculations – I have been looking for years for a published article I can quote that makes the same argument.

Keith