The difference between “statistically significant” and “not statistically significant” is not in itself necessarily statistically significant

The difference between “statistically significant” and “not statistically significant” is not in itself necessarily statistically significant.

By this, I mean more than the obvious point about arbitrary divisions, that there is essentially no difference between something significant at the 0.049 level or the 0.051 level. I have a bigger point to make.

It is common in applied research (in the last couple of weeks, I have seen this mistake made in a talk by a leading political scientist and a paper by a psychologist) to compare two effects, from two different analyses, one of which is statistically significant and one of which is not, and then to try to interpret or explain the difference, without any recognition that the difference itself was not statistically significant.

Let me explain. Consider two experiments, one giving an estimated effect of 25 (with a standard error of 10) and the other with an estimate of 10 (with a standard error of 10). The first is highly statistically significant (with a p-value of 1.2%) and the second is clearly not statistically significant (with an estimate that is no bigger than its s.e.).

What about the difference? The difference is 15 (with a s.e. of sqrt(10^2+10^2)=14.1), which is clearly not statistically significant! (The z-score is only 1.1.)
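
To check the arithmetic, here is a minimal Python sketch. It assumes the two estimates are independent and approximately normally distributed, and it uses two-sided p-values; the function names are hypothetical, chosen only for illustration.

```python
import math

def z_and_p(estimate, se):
    """Return the z-score and two-sided normal p-value for an estimate."""
    z = estimate / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

def difference(est_a, se_a, est_b, se_b):
    """Difference of two estimates and its s.e., assuming independence."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return diff, se_diff

# The two experiments from the text.
z1, p1 = z_and_p(25, 10)
z2, p2 = z_and_p(10, 10)

# The comparison that actually matters: the difference between them.
diff, se_diff = difference(25, 10, 10, 10)
zd, pd = z_and_p(diff, se_diff)

print(f"experiment 1: z = {z1:.2f}, p = {p1:.3f}")
print(f"experiment 2: z = {z2:.2f}, p = {p2:.3f}")
print(f"difference:   {diff} (s.e. {se_diff:.1f}), z = {zd:.2f}, p = {pd:.3f}")
```

The first experiment has a z-score of 2.5 (two-sided p of about 0.012), while the z-score of the difference is about 1.06, well short of the conventional 1.96 cutoff.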

This is a surprisingly common mistake. The two effects seem sooooo different that it is hard for people to even think that their difference might be explained purely by chance.

For a horrible example of this mistake, see the paper, Blackman, C. F., Benane, S. G., Elliott, D. J., House, D. E., and Pollock, M. M. (1988). Influence of electromagnetic fields on the efflux of calcium ions from brain tissue in vitro: a three-model analysis consistent with the frequency response up to 510 Hz. Bioelectromagnetics 9, 215-227. (I encountered this example at a conference on radiation and health in 1989. I sent a letter to Blackman asking him for a copy of his data so we could improve the analysis, but he refused, saying the raw data were in logbooks and it would be too much effort to copy them. We’ll be discussing the example further in our forthcoming book on applied regression and multilevel modeling.)

4 Comments

  1. Barry says:

    This is a good point, and a good kick-off to discuss power analysis. Point out that one of the researchers had a 'successful' experiment, while the other didn't, because of luck.

    I worry when somebody says that their data is unavailable, because it's in logbooks. At some point, it had to have been entered into a computer. And it should have been entered in sufficient detail to allow checking for errors.

  2. This always drives my students nuts when we discuss post hoc tests in one-way ANOVA: you know the smallest and largest group means must be significantly different, otherwise the overall ANOVA wouldn't be. But after that, almost anything goes.

    I agree with Barry, this is where you introduce the idea of power, and hope for the best.

  3. François says:

    I feel so frustrated: I did not catch it. Would you mind helping my slow brain to cope with those two paragraphs please?

    "Let me explain. Consider two experiments, one giving an estimated effect of 25 (with a standard error of 10) and the other with an estimate of 10 (with a standard error of 10). The first is highly statistically significant (with a p-value of 1.2%) and the second is clearly not statistically significant (with an estimate that is no bigger than its s.e.).

    What about the difference? The difference is 15 (with a s.e. of sqrt(10^2+10^2)=14.1), which is clearly not statistically significant! (The z-score is only 1.1.)"

    I know, I'm not a stats student in the beginning, so maybe I should read a textbook and then ask for help.

  4. Anonymous says:

    I am being driven nuts about "significance" and "non-significance". I see in one text that a number below p<.05 is significant, and .68 and .46 are also significant. HELP!
    Confused