Fishing for cherries

Someone writes:

I’m currently trying to make sense of the Army’s preliminary figures on their Comprehensive Soldier Fitness programme, which I found here. That report (see for example table 4 on p.15) has only a few very small “effect sizes” with p<.01 on some of the subscales and nothing significant on the rest. It looks to me like it's not much different from random noise, which I suspect might be caused by the large N (and there's more to come, because N for the whole programme will be in excess of 1 million). While googling on the subject of large N, I came across this entry in your blog. My question is, does that imply that when one has a large N – and, thus, presumably, large statistical power – one should systematically reduce alpha as well? Is there any literature on this? Does one always/sometimes/never need to take Lindley’s “paradox” into account?

And a supplementary question: can it ever be legitimate to quote a result as significant for one DV (“Social fitness” in table 4) when it is simply (cf. p.10) an amalgam of the four other DVs listed immediately underneath it, of which one (“Friendliness”) has a significance of <.001 and the three others are NS? That just looks like double-dipping to me. They could add any number of superordinate meta-fitnesses and be improving on 100 dimensions.

PS: If you find this interesting, I wonder if you might want to make a blog post out of it. CSF is a $140 million programme that has been controversial for all sorts of reasons. There's a whole bunch of other stuff about this process, such as their use of MANOVA at T1 and "ANOVA with blocking" at T2, that makes me think they are on a fishing expedition for cherries to pick. For example, the means in some of the tables are "estimated marginal means" (MANOVA output), the SD values are in fact SEMs, and I have no idea why they are expressing effect sizes as partial eta squared when they only have one independent variable. But I'm a complete newbie to stats, so I'm probably missing a lot of stuff.

My reply: I followed the link. That report is almost a parody of military bureaucracy! But the issues you raise are important. The people doing this research have real problems for which there are no easy solutions. In short: none of the effects is zero and there’s gotta be a lot of variation across people and across subgroups of people. Also, there are multiple outcomes. It’s a classic multiple comparisons situation, but the null hypothesis of zero effects (which is standard in multiple-comparisons analyses) is clearly inappropriate. Multilevel modeling seems like a good idea but it requires real modeling and real thought, not simply plugging the data into an 8-schools program.

We have seen the same issues arising in education research, another area with multiple outcomes, treatments varying across predictors, and small aggregate effects.
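To make the large-N point concrete, here is a minimal sketch, not taken from the report, of how a fixed, tiny effect drifts from "nothing" to "highly significant" purely as the sample grows. It assumes two equal-sized groups with unit variance and a normal approximation; the effect size `d = 0.02` and the function name `two_sided_p` are made up for illustration.

```python
import math


def two_sided_p(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d between two
    equal-sized groups (normal approximation, unit variance assumed)."""
    # Standard error of a difference in means is sqrt(2 / n) when SD = 1.
    z = d * math.sqrt(n_per_group / 2)
    # 2 * (1 - Phi(|z|)) written via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


d = 0.02  # hypothetical tiny effect, in standard-deviation units
for n in (100, 10_000, 1_000_000):
    print(f"n per group = {n:>9,}: p = {two_sided_p(d, n):.3g}")
```

The effect is identical in every row; only N changes. That is why, when no true effect is exactly zero, a p-value threshold mostly measures sample size rather than whether the effect matters.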


  1. Joerg says:

    Just because I felt I had to procrastinate something else I had a look at that report. Two things caught my eye by skimming over the first few pages:

    a) they don’t seem to distinguish between percentages and percentage points (from page 14:
    “For example, if the Control condition had an Emotional
    Fitness score of 62.5, and the Treatment condition had
    an Emotional Fitness score of 65, POMP scores allow one
    to say that there was a 2.50% difference between the
    two conditions.”)

    b) What they label “SD” in table 4 on page 15, which in my book would mean “standard deviation”, is rather obviously the standard error of the mean.

    Not that the people who read this would probably ever care about this kind of stuff but that’s rather sloppy. Think about military standards! I would say 500 PUSH-UPS EACH!!! :)

    • Phil says:

      Show me a person who is capable of doing 500 pushups and I will show you a person who is not going to be ordered around by the likes of me! Or you!

      • Joerg says:

        I thought that actually doing this was not the point in the military context. Rather, it’s the humiliation! I had a picture in mind where they started the pushups at 5pm in a humongous mudhole, then got stuck at No. 137 at 4AM, while constantly having somebody yell at them that they are not finished until No. 500!

  2. Bud Wiser says:

    Thank you Andrew. I’ve now met Phil. I think?

    Very good.

    Now you’re talking my language.