Inti Pedroso writes:

Today during the group meeting at my new job, we were reviewing a paper whose main conclusions were supported by an ANOVA.

One of the first observations was that the experiment had a small sample size. Interestingly (or maybe not so interestingly), some of the reported effects (most of them interactions) were quite large. One of the experienced group members said that “there is a common wisdom that one should not believe effects from small sample sizes, but [he thinks] if they [the effects] are large enough to be picked up in a small study, they must be real large effects.” I argued that with a small sample size one could commit a Type M error, in which the magnitude of the effect is over-estimated, and that if larger samples were evaluated, the estimated magnitude could shrink along with the confidence intervals. The concept of a Type M error was completely new to all the other members of the group (which I joined only two weeks ago), and I was given the job of finding a suitable reference to explain it. They acknowledged that the CI would narrow with a larger sample, but not necessarily that the mean effect itself could be wrongly estimated. The group is formed by biologists, some of whom have good statistical knowledge, but there are also several undergrads who are just starting. I was wondering if you know of an article describing Type M errors and how large they can be with small sample sizes?

My reply:

In increasing order of mathematical sophistication, see this blog post, this semi-popular article, and this scholarly article.

I think there’s room for more research on the topic.

The references are well worth reading, but it’s the sort of strange thing that is both

Subtle:

My old clinical boss, without telling me why, asked me to attend rounds given by his senior competitor, who had recently acquired a Harvard-PhD-trained biostatistician. Their group had just found a statistically significant effect in a really small study.

Their lead claim was

“[if the effect was] large enough to be picked up in [such a] small study, [it] must be [a] really large [huge] effect”

My response was neither subtle nor politically astute, but I was young.

Trivial:

Draw repeated small samples from a Normal(1,3) distribution.

Break the samples into statistically significant versus not-significant groups.

Look at what happens! (and then vary sample size, effect size, variances and perhaps do some plots)
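If it helps to make this concrete, here is one way the exercise might be sketched in Python (the sample size, number of simulations, and t cutoff below are illustrative choices of mine, not anything prescribed above):

```python
import math
import random
import statistics

random.seed(42)

def one_sample_t(xs):
    """Mean and t-statistic for a one-sample t-test of H0: mean = 0."""
    n = len(xs)
    m = statistics.mean(xs)
    se = statistics.stdev(xs) / math.sqrt(n)
    return m, m / se

n, n_sims = 10, 5000
all_means, sig_means = [], []
for _ in range(n_sims):
    xs = [random.gauss(1, 3) for _ in range(n)]  # true mean 1, sd 3
    m, t = one_sample_t(xs)
    all_means.append(m)
    if abs(t) > 2.262:  # two-sided 5% critical value for df = 9
        sig_means.append(m)

print("average estimate, all simulations:    %.2f" % statistics.mean(all_means))
print("average estimate, 'significant' only: %.2f" % statistics.mean(sig_means))
```

The average over all simulations sits near the true mean of 1, while the average over the statistically significant subset is considerably larger; varying n, the true mean, and the standard deviation, as suggested, shows how that gap grows as power falls.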

Easier:

Draw repeated samples from any regression setting you like. Implement your favorite regression on each of them, and plot the estimates of beta against the corresponding estimates of the standard error. Shade the area in which p is less than 0.05 in one color, and the area in which p is greater than 0.05 in another. Observe the difference in the typical behavior of the estimate, comparing the two areas – or, if you prefer, comparing one shaded region to the whole plot.

Repeat with different levels of power – use the same axes, shading, etc – and observe that the problem is negligible when you have lots of power, and terrible when you have next to no power.
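A rough Python sketch of this exercise, skipping the plotting and just comparing averages (the sample sizes, true slope, and noise level are illustrative assumptions of mine):

```python
import math
import random
import statistics

random.seed(1)

def fit_slope(x, y):
    """OLS slope and its standard error for simple linear regression."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    resid = [yi - (my + beta * (xi - mx)) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    return beta, math.sqrt(s2 / sxx)

true_beta = 0.5
results = {}
for n in (15, 200):  # low power vs. high power
    everything, sig = [], []
    for _ in range(2000):
        x = [random.gauss(0, 1) for _ in range(n)]
        y = [true_beta * xi + random.gauss(0, 2) for xi in x]
        b, se = fit_slope(x, y)
        everything.append(b)
        if abs(b / se) > 2:  # roughly p < 0.05
            sig.append(b)
    overall = statistics.mean(abs(b) for b in everything)
    among_sig = statistics.mean(abs(b) for b in sig)
    results[n] = (overall, among_sig)
    print("n = %3d: mean |beta| overall %.2f, among 'significant' %.2f"
          % (n, overall, among_sig))
```

At the small sample size the “significant” estimates badly overshoot the true slope; at the large one the two averages are nearly identical, matching the point about power above.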

@Inti: whether there is a serious bias problem (“regression toward the mean,” a.k.a. the “winner’s curse”) does depend on the power. Ask your biologists if they think the paper was lucky to get p<alpha for those interactions, or not. If they did an identical experiment, would they expect p<alpha again?
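One way to put a number on that intuition: treat the estimate as approximately Normal(true effect, SE) and ask, for a given ratio of true effect to standard error, how often the result comes out significant and by how much the significant estimates overshoot on average. A sketch (the effect/SE ratios below are arbitrary examples of mine):

```python
import math
import random
import statistics

random.seed(0)

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_and_exaggeration(true_effect, se, n_sims=20000):
    """Power of a two-sided z-test at alpha = 0.05, plus the average
    factor by which statistically significant estimates overshoot."""
    z = true_effect / se
    power = normal_cdf(-1.96 - z) + 1 - normal_cdf(1.96 - z)
    sig = [est for est in (random.gauss(true_effect, se) for _ in range(n_sims))
           if abs(est / se) > 1.96]
    exaggeration = statistics.mean(abs(e) for e in sig) / true_effect
    return power, exaggeration

results = {}
for ratio in (0.5, 1.0, 3.0):  # true effect as a multiple of its standard error
    p, ex = power_and_exaggeration(ratio, 1.0)
    results[ratio] = (p, ex)
    print("effect/SE = %.1f: power = %.2f, exaggeration = %.1fx" % (ratio, p, ex))
```

When power is low, the rare significant results exaggerate the true effect severalfold; when power is high, the exaggeration is minor. So the answer to “would they expect p<alpha again?” is also an answer to “how much should we trust the size of this effect?”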

Some text eaten due to my careless use of “less than” signs:

“Shade the area in which p is less than 0.05 in one color. Shade the area in which p is greater than 0.05 another color.”

How general is this? I have done a few meta-analyses in ecology but haven’t seen that the effect size decreases with increasing sample size. Are there any papers showing this with “real data”?

John Ioannidis has published several real-data examples in medicine in the last few years. One is here:

http://jama.ama-assn.org/content/294/2/218.full

The effect has been understood a lot longer, though; it is known as regression to the mean, or winner’s curse, or the sophomore slump, or the file drawer problem; search by any of those terms and you should find examples.

Two other discussions that have appeared on this blog are worth juxtaposing with this one:

- the bias-variance tradeoff post from a few days ago: in that post, Andrew argued that complex models give effect estimates for smaller subgroups, and these estimates have high variance and low bias. By contrast, here we are also talking about small samples, but with both high variance and high bias.

- the zero-effect problem (can’t find the link): when trying to find near-zero effects, it is very likely that any effect declared statistically significant will be systematically over-estimated. By contrast, in this setting, their effect was not tiny but extremely large, but we also suspect over-estimation.

Why/how exactly have you come to believe “their effect was not tiny but extremely large”???

Are you saying the suspected over-estimation could not account for how large it appeared?

I don’t believe blog comments need to be rigorously thought through, but…

The poster said that the sample size is small and yet the effect is large, which causes some people to conclude the effect is significant. OK, I take back the word extreme. But if I’m reading the question right, the effect is large and that is the problem.

Increasing order of mathematical sophistication is a great norm for the presentation of ideas to a disparate readership.