Statistical vs. practical significance

Shravan asks,

Suppose I do a multilevel model analysis of repeated measures reaction time data and some predictor that I have for reaction time shows a significant HPD interval/p-value. But when I look at the estimate I find that the effect is about 1 millisecond. This means in effect that that particular effect only contributes a one ms slowdown to reaction time. But it’s significant statistically; sometimes it is hugely significant.

If one is just looking for whether the effect has an impact on reaction time, not the magnitude of the effect, should one declare evidence in favor of the effect? Or should one say, well, it *is* statistically significant, but what does a slowdown of 1 ms mean? Our hardware only samples events every 4 ms. Under this second view, I would be forced to say that in practice this is not an important effect.

A more general question related to the above came up in a review of a paper of mine recently. In our paper we had shown that a large effect that had been found previously in the literature becomes tiny (but remains significant) when we control for word frequency. The reviewer objected that this finding may be important for computational modelers, who make quantitative predictions about that effect, but why should experimentalists care? I.e., why should people who only make qualitative (boolean) predictions about the effect care about the result, when the outcome is qualitatively the same?

My own understanding is that practical significance matters even in such theoretical explorations, and whether one implements a computational model or is reasoning qualitatively about data is irrelevant.

My reply:

In general, I agree that practical significance is what is important, with statistical significance being relevant to whether you can make a claim with confidence. (For example, the estimate 10 +/- 20 is potentially practically significant, but it is statistically insignificant, meaning that we can’t make any claim with confidence about its sign, or about whether it actually is practically significant.)
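The distinction can be made concrete with hypothetical numbers: an estimate of 10 +/- 20 is potentially large but too noisy to say anything about, while an estimate of 1 +/- 0.1 (think of the 1 ms slowdown above, precisely estimated from many trials) is "hugely significant" yet tiny. A minimal sketch, with both the estimates and standard errors made up for illustration:

```python
# Sketch with hypothetical numbers: statistical significance is about
# precision (estimate relative to its standard error), not magnitude.
def z_stat(estimate, se):
    """Estimate divided by its standard error."""
    return estimate / se

print(z_stat(10, 20))   # 0.5  -> large maybe, but not statistically significant
print(z_stat(1, 0.1))   # 10.0 -> "hugely significant", yet a tiny effect
```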

So, yeah, I agree with you–but if you’re right about the practical significance in your example, you should be able to make the case on its own terms. For example, in your second scenario above, the onus is on you to explain why a difference of 0.1 (or whatever you found) is not practically significant–you have to translate it into “dollar terms,” or predictions, or whatever.

(This can be subtle. For example, consider binary outcomes. Going from a 50/50 to a 60/40 prediction can be a huge and important step. But if you look at root mean squared error, this takes you from 0.5 to 0.49–which looks like it can’t be significant in practice, even though it is. Ultimately, you have to take it back into the scientific or predictive context.)
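The arithmetic behind those two numbers: predicting probability p for a Bernoulli(p) outcome gives RMSE = sqrt(p(1-p)² + (1-p)p²). A quick check:

```python
# Check the RMSE arithmetic from the example above: predicting probability p
# for a binary outcome whose true probability is p.
import math

def binary_rmse(p):
    """RMSE of predicting probability p for a Bernoulli(p) outcome."""
    return math.sqrt(p * (1 - p) ** 2 + (1 - p) * p ** 2)

print(binary_rmse(0.5))  # 0.5
print(binary_rmse(0.6))  # ~0.49
```

So a calibrated move from 50/50 to 60/40 only shaves about 0.01 off the RMSE, even though it doubles the odds in favor of one outcome.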

2 thoughts on “Statistical vs. practical significance”

  1. Lots of subtle points here, which means they require thought. Saying 'significant at 0.05' doesn't require thought (although the analysis itself may require lots of thought). So… why would anyone be willing (even eager) to spend money, time, and thought on a study and then leave the decision about its importance to a process that forbids thought?

  2. This kind of thing happens in biomedicine a lot with biomarker studies: some marker is significantly correlated with outcomes, but when you use it to classify patients into different treatment groups and construct ROC curves, you don't really get much better performance than using just standard clinical data.
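The biomarker scenario in that comment is easy to reproduce in simulation. A sketch with invented data (the effect sizes, sample size, and the "clinical score" are all hypothetical): a marker can be significantly correlated with outcome while adding almost nothing to the area under the ROC curve.

```python
# Simulated sketch: a biomarker significantly correlated with outcome
# can still add almost nothing to classification performance (AUC).
# All numbers here are hypothetical, chosen for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
clinical = rng.normal(size=n)           # standard clinical score
marker = rng.normal(size=n)             # hypothetical biomarker
logit = 2.0 * clinical + 0.2 * marker   # marker's effect is tiny
y = rng.random(n) < 1 / (1 + np.exp(-logit))  # binary outcome

def auc(score, y):
    """AUC via the rank-sum (Mann-Whitney) identity; assumes no ties."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

r = np.corrcoef(marker, y)[0, 1]        # clearly "significant" at n = 20,000
print(f"corr(marker, outcome) = {r:.3f}, z ~ {r * np.sqrt(n):.1f}")
print(f"AUC, clinical only:     {auc(clinical, y):.3f}")
print(f"AUC, clinical + marker: {auc(logit, y):.3f}")
```

The correlation passes any significance threshold, yet the ROC curves for "clinical only" and "clinical + marker" are nearly indistinguishable, which is the point of the comment.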
