Jeremy Fox points me to this article, “Underappreciated problems of low replication in ecological field studies,” by Nathan Lemoine, Ava Hoffman, Andrew Felton, Lauren Baur, Francis Chaves, Jesse Gray, Qiang Yu, and Melinda Smith, who write:

The cost and difficulty of manipulative field studies makes low statistical power a pervasive issue throughout most ecological subdisciplines. . . . In this article, we address a relatively unknown problem with low power: underpowered studies must overestimate small effect sizes in order to achieve statistical significance. First, we describe how low replication coupled with weak effect sizes leads to Type M errors, or exaggerated effect sizes. We then conduct a meta-analysis to determine the average statistical power and Type M error rate for manipulative field experiments that address important questions related to global change: global warming, biodiversity loss, and drought. Finally, we provide recommendations for avoiding Type M errors and constraining estimates of effect size from underpowered studies.

As with the articles discussed in the previous post, I haven’t read this article in detail, but of course I’m supportive of the general point, and I have every reason to believe that type M errors are a big problem in a field such as ecology where measurement is difficult and variation is high.

**P.S.** Steven Johnson sent in the above picture of a cat who is not in the wild, but would like to be.

A cat in the wild is called a lion.

Or a BobCat, Bob?

Leopards make up the bulk of big cats I think.

Nope. Lions are the bulked up big cats.

http://mylionclipart.com/design/clipart-of-a-cartoon-body-builder-lion-weightlifting-by-toonaday-346

Isn’t a cat in the wild just a Wildcat?

https://youtu.be/Bj1DZKOeZhI

How is Miscellaneous Statistics as a category different from Statistics?

A wildcat is its own species in Europe (Felis sylvestris): https://en.wikipedia.org/wiki/Wildcat

So the cat in the picture is just a wannabewild cat.

Oh nice. We do a lot of microbial ecology studies that have low power due to sampling constraints. I think this will be helpful for framing our experimental designs.

For the ones who want to read the whole article:

https://www.researchgate.net/profile/Nathan_Lemoine/publication/305035987_Underappreciated_problems_of_low_replication_in_ecological_feld_studies/links/57d3316d08ae6399a38da1e2/Underappreciated-problems-of-low-replication-in-ecological-feld-studies.pdf

This paper brought up a question for me. I’ve been switching our research group over to direct modeling using either rstanarm or pystan. Should we still be reporting type M and S errors when using Bayesian models? Should we go back to old papers that used p-values and null hypothesis significance testing and see what the type M and S errors look like?

I’ve been trying to just report marginal kernel density estimates for parameters, or occasionally some pair plots or whatever is appropriate. Mostly this is in work related to biology where I am consulting, and so it can be hard to explain to biologists who are mostly familiar with “p less than 0.05 so the effect exists”. The main thing of interest in Bayesian analyses is: “what are the credible values of the parameters” and a KDE shows that in a much more complete form than some pointrange or similar plots.
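A minimal sketch of the kind of summary described above. The draws here are simulated stand-ins; in real work they would come from a fitted rstanarm or pystan model, and the parameter values are hypothetical:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Simulated stand-in for 4000 posterior draws of one parameter;
# in practice these would be extracted from a fitted Bayesian model.
rng = np.random.default_rng(1)
draws = rng.normal(loc=0.3, scale=0.15, size=4000)

# Marginal kernel density estimate over the range of credible values
kde = gaussian_kde(draws)
grid = np.linspace(draws.min(), draws.max(), 200)
density = kde(grid)

# A central 95% interval is a coarser summary of the same draws,
# closer to the pointrange plots mentioned above
lo, hi = np.percentile(draws, [2.5, 97.5])
```

The KDE answers "what are the credible values of the parameter" directly, rather than collapsing the posterior to a point and an interval.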

Thanks Daniel. That makes sense. I think I goofed up reporting the marginal kernel density estimates in our draft paper. I’ll go back and take a look. Most of my group is familiar with p less than 0.05 but they are open to Bayesian models since it shows them more.

I really like the concept of Type S and Type M error rates.

However, what strikes me is that in all applications (this paper, Andrew’s retrodesign function, …) the estimated standard errors are treated as the true standard errors. But as I understand it, the standard errors are themselves subject to over- or underestimation. Or am I missing something?

Paul:

Good point. It would be better to do a Bayesian analysis and average over the posterior distribution of the effect size and the standard error.
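For concreteness, the design calculation being discussed can be sketched as follows, in the spirit of Gelman and Carlin’s retrodesign function. It takes a hypothesized true effect and a standard error that is treated as known, which is exactly the assumption Paul questions; the example numbers are hypothetical:

```python
import numpy as np
from scipy import stats

def retrodesign(true_effect, se, alpha=0.05, n_sims=100_000, seed=0):
    """Power, Type S error rate, and exaggeration ratio (Type M) for a
    study with a hypothesized true effect and an assumed-known standard
    error, in the spirit of Gelman and Carlin's retrodesign function."""
    z = stats.norm.ppf(1 - alpha / 2)
    lam = true_effect / se
    # Probability of a statistically significant result
    power = 1 - stats.norm.cdf(z - lam) + stats.norm.cdf(-z - lam)
    # Type S: probability a significant estimate has the wrong sign
    type_s = stats.norm.cdf(-z - lam) / power
    # Type M: average size of significant estimates relative to the truth
    rng = np.random.default_rng(seed)
    est = rng.normal(true_effect, se, n_sims)
    sig = np.abs(est) > z * se
    exaggeration = np.abs(est[sig]).mean() / true_effect
    return power, type_s, exaggeration

# A weak effect measured noisily: low power, large exaggeration ratio
power, type_s, exaggeration = retrodesign(true_effect=0.5, se=1.0)
```

A fully Bayesian version, as suggested above, would average these quantities over a posterior for both the effect size and the standard error rather than plugging in point estimates.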