Don’t let your standard errors drive your research agenda

Alexis Le Nestour writes:

How do you test for no effect? I attended a seminar where the presenter assumed that a non-significant difference between groups implied an absence of effect. In that case, the researcher needed to show that the two groups were similar, conditional on some observable variables, before being hit by a shock. The assumption was that the two groups were similar and that the shock was random. What would be a good way to set up a test in that case?

I know you’ve been through this before (http://statmodeling.stat.columbia.edu/2009/02/not_statistical/), and there are interesting comments there, but I wanted to get your opinion on it.

My reply: I think you have to get quantitative here. How similar is similar? Don’t let your standard errors drive your research agenda. Or, to put it another way, what would you do if you had all the data? If your sample size were 1 zillion, then everything would be statistically distinguishable from everything else. And then you’d have to think about what you really care about.
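
To make the “everything becomes statistically distinguishable” point concrete, here is a minimal simulation sketch (Python, with made-up numbers): two groups whose true means differ by a trivial 0.01 standard deviations produce a huge z statistic and an essentially zero p-value once the sample is large enough.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

# Two hypothetical groups whose true means differ by only 0.01 sd --
# a difference few analyses would consider practically meaningful.
n = 5_000_000
x = rng.normal(0.00, 1.0, size=n)
y = rng.normal(0.01, 1.0, size=n)

# Large-sample two-sided z-test for a difference in means.
diff = y.mean() - x.mean()
se = sqrt(x.var(ddof=1) / n + y.var(ddof=1) / n)
z = diff / se
p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
print(f"estimated difference = {diff:.4f}, z = {z:.1f}, p = {p:.2g}")
# With n this large the test rejects decisively, even though the
# difference itself may be too small to matter for any decision.
```

At that scale the standard error tells you nothing about whether the difference matters; you still have to say how large a difference you would actually care about.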

Comments

  1. Alexis: You could specify a threshold on the difference that in terms of interpretation means “no substantially meaningful difference” – for example |\mu_1-\mu_2| < \epsilon – and then test |\mu_1-\mu_2| \ge \epsilon against the alternative |\mu_1-\mu_2| < \epsilon (this can often be done with a minor modification of standard tests). Significance then gives you evidence *in favour* of “no (big) difference”.

    Depending on \epsilon, this may require quite large samples, though.

    • This is called equivalence testing. It is common practice, especially in biological and medical research, where there is a substantively meaningful boundary of interest. For example, most generic-drug comparisons are set up as a test that the log of the ratio falls within [log(0.8), log(1.25)]. There is an extensive literature on this, but it doesn’t seem to get out of its biopharma niche. (A sketch of the two one-sided tests version of this idea appears after the comments.)

      I think Andrew would agree that equivalence testing is one of those frequentist techniques that takes on Bayesian properties, such as using prior knowledge to set the boundary. There is also Bayesian equivalence testing.

  2. Yes, Yes, Yes: “you have to get quantitative here. … Don’t let your standard errors drive your research agenda… you’d have to think about what you really care about” – except I’d change “you’d have to” to “you have to,” no matter what your sample size.

    It seems to be the rule rather than the exception that papers draw conclusions on the basis of “statistical significance” (often even at the 0.1 level, often with multiple testing involved but its effect ignored), with either no discussion of power or something like “a power calculation showed that this sample size will give a medium effect size” (presumably referring to something like Cohen’s d, not a raw effect size, and with no discussion of the effect of multiple testing on power). So rarely do I see discussion of what raw effect size is practically significant. There is no indication of thinking, just turning a crank and magically expecting a meaningful (“significant” in a magical way) result.
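
Following up on the equivalence-testing thread above, here is a minimal sketch of the two one-sided tests (TOST) idea: test H0: |\mu_1-\mu_2| \ge \epsilon against H1: |\mu_1-\mu_2| < \epsilon for a pre-specified margin \epsilon. The function name, the margin, and the simulated data are illustrative only, and a large-sample normal approximation stands in for exact t-based tests.

```python
import numpy as np
from math import erf, sqrt

def _norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def tost_equivalence(x, y, eps, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two group means.

    Tests H0: |mu_x - mu_y| >= eps against H1: |mu_x - mu_y| < eps,
    where eps is the pre-specified "substantively negligible" margin.
    Uses a large-sample normal approximation to keep the sketch short.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x.mean() - y.mean()
    se = sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))

    # Lower test: H0: diff <= -eps vs H1: diff > -eps.
    p_lower = 1 - _norm_cdf((diff + eps) / se)
    # Upper test: H0: diff >= eps vs H1: diff < eps.
    p_upper = _norm_cdf((diff - eps) / se)

    # Equivalence is declared only if *both* one-sided tests reject,
    # i.e. if the larger of the two p-values is below alpha.
    p_tost = max(p_lower, p_upper)
    return diff, p_tost, p_tost < alpha

# Illustrative use: two groups whose true means differ by 0.05 sd,
# judged against a hypothetical margin of eps = 0.2 sd.
rng = np.random.default_rng(0)
group_a = rng.normal(0.00, 1.0, size=2000)
group_b = rng.normal(0.05, 1.0, size=2000)
print(tost_equivalence(group_a, group_b, eps=0.2))
```

The bioequivalence convention mentioned in the comments fits the same template: work with the log of the ratio and take the margin interval to be [log(0.8), log(1.25)]. As noted there, a small \epsilon can require quite large samples before equivalence can be declared.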
