## Looking at many comparisons may increase the risk of finding something statistically significant by epidemiologists, a population with relatively low multilevel modeling consumption

To understand the above title, see here.

Masanao writes:

This report claims that eating meat increases the risk of cancer. I’m sure you can’t read the page, but you can probably understand the graphs. The different bars represent subdivisions in the amount of a particular type of meat consumed, and each chunk is a different type of meat. The left side is for males, the right for females.

They claim that the difference is significant, but they are clearly not!!

I’m for not eating much meat but this is just way too much…

Here’s the graph:

I don’t know what to think. If you look carefully you can find one or two statistically significant differences, but overall the pattern doesn’t look so compelling. I don’t know what the top and bottom rows are, though the pattern in the top row looks like it could represent a real trend, while the graphs on the bottom row look like noise.

This could be a good example for our multiple comparisons paper. If the researchers won’t cough up the raw data, we could just grab what we can from their graphs.
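To see why many comparisons matter here, a quick simulation helps (a minimal sketch, not based on the study's data; the comparison count of 20 is hypothetical): when every null hypothesis is true, the chance that at least one of many independent tests comes out "significant" at the 0.05 level grows rapidly with the number of tests.

```python
import random

random.seed(2011)

def any_false_positive(n_comparisons=20, alpha=0.05, n_sims=20000):
    """Simulate studies in which every null is true and count how often
    at least one of n_comparisons independent tests comes out
    'significant' at level alpha."""
    hits = sum(
        1 for _ in range(n_sims)
        if any(random.random() < alpha for _ in range(n_comparisons))
    )
    return hits / n_sims

# Theory: 1 - (1 - 0.05)**20 is about 0.64, and the simulation agrees.
print(any_false_positive())
```

With 20 comparisons, roughly two out of three all-null studies would report at least one "significant" finding.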

1. Nameless says:

Top row, second group from the right, is consumption of red meat by women, by quintiles. The highest quintile corresponds to an odds ratio of 1.48 with 95% CI 1.01 to 2.17. The corresponding odds ratio for the highest quintile in men is 1.27. This is the case where the effect would be considered quite significant by layman standards (a 27% to 48% increase in the risk of colon cancer vs. vegetarians – even though, as the article points out, that highest quintile is actually below average by American standards), but barely significant or not significant at all from the pure statistician’s point of view.

Next thing we know, someone might use these numbers and run a newspaper article claiming that “Japanese scientists fail to confirm (or even “disprove”) the link between red meat and cancer in men”.

I think that the top row is colon cancer, bottom row is rectal cancer, left is men, right is women, and, in each quadrant, we have, left to right, red+processed meat, red meat, and processed meat.

• Masanao says:

You are exactly right. Thanks for expanding on my shortcomings.
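Nameless's "barely significant" reading can be checked from the reported interval alone. A small sketch (standard normal-approximation arithmetic, not anything taken from the paper): a 95% CI for an odds ratio is symmetric on the log scale, so the reported CI of 1.01 to 2.17 pins down the standard error, and with it the implied z-statistic and two-sided p-value.

```python
import math

# Reported for red meat, women, top quintile: OR 1.48, 95% CI 1.01 to 2.17.
lo, hi = math.log(1.01), math.log(2.17)
se = (hi - lo) / (2 * 1.96)   # SE of log(OR), from the CI width
est = (hi + lo) / 2           # midpoint on the log scale, ~log(1.48)
z = est / se                  # implied z-statistic
# Two-sided p-value from the normal distribution
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
print(round(math.exp(est), 2), round(z, 2), round(p, 3))  # -> 1.48 2.01 0.044
```

A p-value of about 0.044, just under 0.05, is exactly what a lower confidence bound of 1.01 suggests: significant, but only barely.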

2. Jay says:

Oy, those colors.

3. Seems like a good place to mention this cartoon:

http://xkcd.com/882/

• Xi'an says:

You beat me to this reference, Bill!

• K? O'Rourke says:

Bill:

If I knew how to edit that picture of the newspaper page, I’d add – “Other colours now known to be safe!” – but then I would be missing emphasising bias and would have to shout at myself ;-)

4. Phil says:

I had to grin at “If you look carefully you can find one or two statistically significant differences.” I don’t think looking for “statistically significant differences” is the right approach here, and I don’t think you do either.

I think your later comment about a trend is more like it. If eating meat causes some kind of cancer, I would expect that the more meat one eats, the more likely the cancer is; so I would think you would want to look for trends, not for individual high (or low) bins. The top row (colon cancer) appears to be consistent with a “significant” trend (meaning practically significant, not statistically significant), and indeed the best linear fit might imply a practically significant trend, although I wouldn’t be at all surprised if zero trend is not far away as measured in standard errors. Given the other evidence that eating lots of meat can promote colon cancer in Europeans and Americans (see http://www.health.harvard.edu/fhg/updates/Red-meat-and-colon-cancer.shtml), I’d say this study suggests (perhaps mildly) that the same is true in Asians, if that’s where these data are from. It sure doesn’t prove it, though.
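Phil's point about fitting a trend rather than eyeballing individual bins can be made concrete. The quintile-by-quintile log odds ratios below are hypothetical (only the top-quintile odds ratio of 1.48 was quoted above), and a real trend test would weight each quintile by the precision of its estimate, which this toy unweighted least-squares fit ignores.

```python
import math

# Hypothetical log odds ratios by quintile (Q1 is the reference group);
# only the last value, log(1.48) ~ 0.39, echoes a number quoted above.
quintile = [1, 2, 3, 4, 5]
log_or = [0.0, 0.05, 0.12, 0.10, 0.39]

# Ordinary least-squares fit of log(OR) on quintile: the question is
# whether the slope differs from zero, not whether any one bin does.
n = len(quintile)
xbar = sum(quintile) / n
ybar = sum(log_or) / n
sxx = sum((x - xbar) ** 2 for x in quintile)
slope = sum((x - xbar) * (y - ybar)
            for x, y in zip(quintile, log_or)) / sxx
resid = [y - (ybar + slope * (x - xbar))
         for x, y in zip(quintile, log_or)]
se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
print(round(slope, 3), round(slope / se, 2))  # slope per quintile, t-ratio
```

The slope and its t-ratio summarize the whole dose-response pattern in one comparison, which is the point: one prespecified trend test instead of a grid of bin-by-bin p-values.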

5. Andrea says:

Was it really necessary to report all those p-values for trends in the bottom row? Probably they did it for consistency (either you do it for all the analyses or not at all), but sometimes I think it’s better to let the results for the single quintiles (in this case) speak for themselves, without reporting unnecessary p-values.

6. K? O'Rourke says:

Arghhh – where is the BIAS???
(formally assessed and its impact discussed)

Not really important if there is only one prespecified comparison with a p-value < .0001
(there was no randomization and hence no sensible null model from which to calculate that p-value)

7. shabbychef says:

While you are pointing out shortcomings in statistical practice, Andrew, could I direct your attention towards the ‘Twitter predicts the stock market’ paper? (Bollen et al., 2010, http://arxiv.org/abs/1010.3003 ) While the description of the methodology is sufficiently vague to prevent replication anyway, there is plenty of evidence of suspect statistical practice. For example, in Table II, 49 p-values are reported, with those significant at the 0.05 level given a star! This is presumably after some ‘tuning of a number of parameters’. The media have reported this ‘result’ with adulation. I think a skeptical analysis is way past due.