Weisburd’s paradox in criminology: it can be explained using type M errors

This one (with Torbjørn Skardhamar and Mikko Aaltonen) goes out to all the criminologists out there. . . .

Here’s the story:

Simple calculations seem to show that larger studies should have higher statistical power, but empirical meta-analyses of published work in criminology have found zero or weak correlations between sample size and estimated statistical power. This is “Weisburd’s paradox,” which Weisburd, Petrosino, and Mason (1993) attributed to a difficulty in maintaining quality control as studies get larger, and which Nelson, Wooditch, and Dario (2014) attributed to a negative correlation between sample sizes and the underlying sizes of the effects being measured. We argue that neither explanation is necessary: the apparent Weisburd paradox may simply be an artifact of the systematic overestimation inherent in post-hoc power calculations, a bias that is especially large when N is small. More generally, we recommend abandoning statistical power as a measure of the strength of a study, because implicit in the definition of power is the bad idea of statistical significance as a research goal.
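Here is a minimal simulation sketch of that mechanism (not from the paper; the z-test setup, the true effect of 0.2, and the sample sizes below are illustrative assumptions). Post-hoc power plugs the noisy estimated effect back into the power formula, and once you keep only statistically significant results the computed power sits near or above 0.5 at every sample size, which is enough to wash out the expected correlation between N and estimated power.

```python
# Minimal sketch, not the paper's analysis: a two-sided z-test on a sample
# mean with known sd = 1 and a fixed true effect (all numbers illustrative).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.2                        # assumed small true effect (illustrative)
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)

def power(effect, n):
    """Exact power of a two-sided z-test for a mean with known sd = 1."""
    z = effect * np.sqrt(n)
    return stats.norm.cdf(z - z_crit) + stats.norm.cdf(-z - z_crit)

for n in [20, 50, 200, 1000]:
    # 5000 simulated studies of size n: each estimates the effect with noise,
    # then "post-hoc power" plugs that noisy estimate into the power formula.
    est = rng.normal(true_effect, 1 / np.sqrt(n), size=5000)
    posthoc = power(np.abs(est), n)
    significant = np.abs(est) * np.sqrt(n) > z_crit   # the significance filter
    print(f"n={n:4d}  true power={power(true_effect, n):.2f}  "
          f"mean post-hoc power={posthoc.mean():.2f}  "
          f"post-hoc power among significant={posthoc[significant].mean():.2f}")
```

Under these assumptions the true power rises from roughly 0.15 at n = 20 to nearly 1 at n = 1000, but the post-hoc numbers conditioned on significance are bounded below by about 0.5 (a just-significant estimate always implies computed power of at least roughly 0.5), so the small studies look far stronger than they are.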

8 thoughts on “Weisburd’s paradox in criminology: it can be explained using type M errors”

  1. While I don’t dispute some of the general scientific malpractice contributing here, surely this has to be partly related to larger sample sizes more often being surveys, while small samples are typically experiments, right? There are good reasons to expect effect sizes to be larger in experimental demonstrations of the same underlying concept in a lot of situations. Multicollinearity, uneven distributions of the most/least affected, etc., could all contribute.

    • No, that’s not the case (that the larger sample sizes are observational). Almost none of that work collects data from people — and of course a survey can be used to collect experimental data anyway.

  2. It seems like measuring statistical power was useful in bringing this stylized fact to light; the problem was just the initial interpretation of that finding.

    • Yes, I think there was an interesting observation made more than 25 years ago, just not really an analysis of the underlying possibility that it was basically an artifact of the process (and no real claim to have done such an analysis either). And the observation itself is true and something applied researchers need to be aware of. But this was all a bit before the issues of type M errors and false discovery really reached the level of consciousness that exists now. Even asking for a research design to include a power analysis was a new thing then; I don’t think the dangers of post-hoc power analysis were necessarily considered either. I just think that in the mid 1990s people were thinking about these things in a way that is different from now. Remember, the initial version of R did not appear until 1995 and the first stable version not until 2000, so it was just a very different world. I can remember having a floppy disk with a program that did calculations based on Cohen’s work.

  3. > selection in what studies are published and in what results are presented within any published study—will make it extremely difficult to study such patterns using the published record.

    Agree, modeling selection based on the published record has been and likely will continue to be hopeless.

  4. I really liked this paper. I think it’s one of your best so far at explaining type S and M errors. I also liked that it focuses on a very particular phenomenon in one field but could obviously apply to most of the science literature.
