A study with low statistical power has a reduced chance of detecting a true effect, but it is less well appreciated that low power also reduces the likelihood that a statistically significant result reflects a true effect. Here, we show that the average statistical power of studies in the neurosciences is very low. The consequences of this include overestimates of effect size and low reproducibility of results. There are also ethical dimensions to this problem, as unreliable research is inefficient and wasteful. Improving reproducibility in neuroscience is a key priority and requires attention to well-established but often ignored methodological principles.
I agree completely. In my terminology, with small sample sizes, the classical approach of looking for statistical significance leads to a high rate of Type S errors. Indeed, this is a theme of my paper with Weakliem (along with much earlier literature on psychology research methods). I'd love this stuff even more if they stopped using the word "power," which unfortunately is strongly tied to the not-so-useful notion of statistical significance. Also, I didn't notice whether they mentioned the statistical significance filter: the problem that statistically significant results tend to have high Type M errors. In any case, it's good to see this stuff getting further attention. I also think it would be useful for them to go further and provide guidance on how to better analyze data from small samples. Saying not to design low-power studies is fine, but once you have the data, there's no point in ignoring what you have.
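To make the Type S / Type M point concrete, here is a minimal simulation sketch in Python. The true effect size, standard error, and significance threshold below are assumed values chosen to produce a low-powered design, not numbers from the paper: conditional on reaching p < 0.05, the estimates have a nontrivial chance of having the wrong sign (Type S error) and exaggerate the true effect several-fold (Type M error).

```python
# Minimal sketch (assumed illustrative values, not figures from the paper):
# what happens to estimates that pass the statistical significance filter
# when the study is underpowered.
import numpy as np

rng = np.random.default_rng(42)

true_effect = 0.2   # small true effect (assumed)
se = 0.5            # large standard error, i.e., a low-powered design (assumed)
z_crit = 1.96       # two-sided z threshold for p < 0.05
n_sims = 100_000

# Unbiased estimates of the true effect across many replications.
estimates = rng.normal(true_effect, se, n_sims)

# The statistical significance filter: keep only estimates with |z| > 1.96.
significant = estimates[np.abs(estimates / se) > z_crit]

power = len(significant) / n_sims
type_s = np.mean(significant < 0)                    # wrong sign, given significance
type_m = np.mean(np.abs(significant)) / true_effect  # exaggeration ratio, given significance

print(f"power:                 {power:.3f}")
print(f"Type S error rate:     {type_s:.3f}")
print(f"Type M (exaggeration): {type_m:.1f}x")
```

With these assumed values, the significant results are wrong in sign roughly one time in eight and overstate the true effect by a factor of five or so, which is the significance-filter problem in a nutshell.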
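Relatedly, the abstract's claim that low power reduces the chance a significant result reflects a true effect can be seen with a quick positive-predictive-value calculation in the same spirit; the prior probability of a real effect and the power levels below are assumptions for illustration only.

```python
# Hedged sketch: positive predictive value (PPV) of a significant result.
# The prior and power values are illustrative assumptions, not figures
# from the paper.
def ppv(power, alpha=0.05, prior=0.1):
    """P(effect is real | p < alpha), given a prior probability of a real effect."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

for power in (0.8, 0.5, 0.2):
    print(f"power={power:.1f} -> PPV={ppv(power):.2f}")
```

Under these assumptions, dropping power from 0.8 to 0.2 cuts the PPV from about 0.64 to about 0.31, so most significant findings in the low-power regime would not reflect true effects.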