Statistical significance as a guide to future data collection

The vigorous discussion here on hypothesis testing made me think a bit more about the motivations for significance tests. Basically, I agree with Phil’s comment that there’s no reason to do a hypothesis test in this example (comparing the average return from two different marketing strategies)–it would be better to simply get a confidence interval for each strategy and for their difference. But even with confidence intervals, it’s natural to look at whether the difference is “statistically significantly” different from zero.
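
Here is a minimal sketch of that calculation in Python. The returns, sample sizes, and normal-approximation intervals are made up for illustration; nothing here comes from the actual marketing example.

```python
import numpy as np

def mean_ci(x, z=1.96):
    """Approximate 95% confidence interval for the mean of x (normal approximation)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    se = x.std(ddof=1) / np.sqrt(len(x))
    return m, (m - z * se, m + z * se)

# Hypothetical per-customer returns (in dollars) under each marketing strategy.
rng = np.random.default_rng(0)
returns_a = rng.normal(loc=12.0, scale=30.0, size=500)
returns_b = rng.normal(loc=10.0, scale=30.0, size=500)

mean_a, (lo_a, hi_a) = mean_ci(returns_a)
mean_b, (lo_b, hi_b) = mean_ci(returns_b)

# Confidence interval for the difference in means (independent samples).
diff = returns_a.mean() - returns_b.mean()
se_diff = np.sqrt(returns_a.var(ddof=1) / len(returns_a) +
                  returns_b.var(ddof=1) / len(returns_b))
lo_d, hi_d = diff - 1.96 * se_diff, diff + 1.96 * se_diff

print(f"Strategy A: mean {mean_a:.2f}, 95% CI ({lo_a:.2f}, {hi_a:.2f})")
print(f"Strategy B: mean {mean_b:.2f}, 95% CI ({lo_b:.2f}, {hi_b:.2f})")
print(f"Difference (A - B): {diff:.2f}, 95% CI ({lo_d:.2f}, {hi_d:.2f})")
```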

As Phil noted, from a straight decision-analytic perspective, significance testing does not make sense. You pick the strategy that you think will do better–it doesn’t matter whether the difference is statistically significant. (And the decision problem is what to do, not what “hypothesis” to “accept.”)

But the issue of statistical significance–or, perhaps better put, the uncertainty in the confidence interval–is relevant to the decision of whether to gather more data. The more uncertainty, the more that can be learned by doing another experiment to compare the treatments. But if the treatments are clearly statistically significantly different, you already know which is better and there’s no need to gather more data.
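
That logic amounts to a simple rule of thumb, sketched below. The function and the example interval are my own illustration; "clearly different" here just means the interval for the difference excludes zero.

```python
def next_step(ci_diff_lo, ci_diff_hi):
    """Rough guide: deploy the apparent winner if the interval for the
    difference excludes zero; otherwise another experiment is informative."""
    if ci_diff_lo > 0:
        return "A is clearly better: deploy A, no need for more data"
    if ci_diff_hi < 0:
        return "B is clearly better: deploy B, no need for more data"
    return "difference still uncertain: worth gathering more data"

# Example: an interval of (-1.5, 5.0) for A - B still covers zero.
print(next_step(-1.5, 5.0))
```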

In reality, this is a simplification–if you really can clearly distinguish the 2 treatments, then it makes sense to look at subsets of the data (for example, which does better in the northeast and which does better in the midwest), and continue stratifying until there’s too much uncertainty to go further. But, anyway, that’s how I see the significance issue here–it’s relevant to decisions about data collection.
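
A sketch of what that stratification might look like, again with made-up data; the region labels, column names, and groupby logic are my own assumptions rather than anything from the original example.

```python
import numpy as np
import pandas as pd

def diff_ci(data, z=1.96):
    """95% normal-approximation CI for mean return of A minus mean return of B."""
    a = data.loc[data["strategy"] == "A", "return"]
    b = data.loc[data["strategy"] == "B", "return"]
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return diff, (diff - z * se, diff + z * se)

# Hypothetical per-customer data with region and strategy labels.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "region": rng.choice(["northeast", "midwest"], size=n),
    "strategy": rng.choice(["A", "B"], size=n),
    "return": rng.normal(loc=11.0, scale=30.0, size=n),
})

# Overall comparison, then the same comparison within each region;
# the subgroup intervals are wider because each subset has less data.
d, (lo, hi) = diff_ci(df)
print(f"overall: diff {d:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
for region, sub in df.groupby("region"):
    d, (lo, hi) = diff_ci(sub)
    print(f"{region}: diff {d:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```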

3 thoughts on “Statistical significance as a guide to future data collection”

  1. I just discovered your blog today and I am overjoyed. I have just finished teaching a group of graduate students from a 'radical' perspective on statistics, which is just "common sense". It feels strange to meet another sensible person in the field these days…

  2. It might be, too, that you have done a statistical evaluation using your formal model, but you want to allow a role for a "hunch". If the rewards to both policies look pretty much the same, you'll go with the hunch; otherwise not.

    You could put that in as a prior instead, but some people (including me) would prefer to separate out the two steps.

  3. Andrew: "(And the decision problem is what to do, not what "hypothesis" to "accept.")"

    This is kind of wrong. First, though the decision of what to do may be binary (though I doubt it, as you suggest there can be mixed-mailing strategies), there is still confidence and cost/benefit to consider. If A is only slightly better than B (say $10 better, with p = .99), is it really as rational to adopt an all-A mailing strategy as it would be if A wallops B? If there is any cost at all to adopting A, I'd contend not.

    But second, and more interestingly, the debate on hypothesis testing here is more about research design than statistics.

    I agree that the specific example of catalogue A vs. B is a "what to do" decision. But can't we assume that the catalogue people are already planning catalogues C, D, and E? If they want to know why A is better than B, then they have testable hypotheses.

    The significance test is only useful if they made an a priori decision to conduct it and designed their catalogues with the test in mind. The test's value is one of generalizability. If A and B are different in myriad ways, the test isn't very useful. But if A and B differ only on one dimension, then the test tells whether that dimension is important.

    Applied post hoc, the significance test is of little value. But that doesn't diminish the value of p in other circumstances.
