Arho Toikka writes:
I ran across what I feel is a pretty peculiar use of statistical significance and p-values, and thought I’d send you a message and see if you find it interesting too, or if I’m just confused about something:
I read a news story about a study that showed that previous studies on moderate alcohol use and its association with lower mortality have been false – biased by erroneous selection of the control group (as they are observational studies). Being a risk-averse moderate alcohol user and knowing how news media treats press releases, I had to go and read the study.
Here’s the reference:
Stockwell et al. (2016). Do “Moderate” Drinkers Have Reduced Mortality Risk? A Systematic Review and Meta-Analysis of Alcohol Consumption and All-Cause Mortality. Journal of Studies on Alcohol and Drugs, 77(2), 185–198.
They put forward three meta-analytic models to claim:
Estimates of mortality risk from alcohol are significantly altered by study design and characteristics. Meta-analyses adjusting for these factors find that low-volume alcohol consumption has no net mortality benefit compared with lifetime abstention or occasional drinking.
Thus, for each of the three strategies, evidence for reduced mortality risk among low-volume drinkers largely disappeared once design and methodological issues were controlled for directly in the analysis or by study selection.
One of their strategies was to do a stratified meta-analysis for four groups, one with the correct control group and three with different biases (table 4 in the paper). The risk ratios for moderate drinking were 0.90 (vs. true abstainers), 0.91 (vs. abstainers + those who quit drinking + those who drink very little), 0.86 (vs. abstainers + those who quit drinking), and 0.86 (vs. abstainers + those who drink very little).
At face value, it seems that changing the control group did little to change the effect. But there are only 13 studies with the correct control group! With so few studies, the p-value is .19 and the 95% confidence interval widens to 0.72 to 1.06.
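To see what that wide interval does and does not rule out, here is a rough sketch that backs out an approximate standard error on the log scale from the reported 95% CI and recomputes a two-sided p-value under a normal approximation. This is only an illustration with the numbers quoted above; the paper's actual random-effects machinery is not symmetric in this way, so the simple back-calculation need not reproduce the reported p = .19 exactly.

```python
import math

# Reported summary for low-volume drinkers vs. true abstainers
# (Stockwell et al. 2016, table 4): RR = 0.90, 95% CI 0.72 to 1.06,
# based on 13 studies.
rr, lo, hi = 0.90, 0.72, 1.06

# Approximate SE on the log scale from the CI width, assuming a
# symmetric normal interval (a simplifying assumption).
se = (math.log(hi) - math.log(lo)) / (2 * 1.959964)

z = math.log(rr) / se
# Two-sided p-value from the standard normal CDF.
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"z = {z:.2f}, approximate two-sided p = {p:.2f}")
```

Whatever the exact p-value, the point stands: the same interval is compatible both with no effect (RR = 1.0) and with a sizeable mortality reduction (RR = 0.72), which is exactly why a non-significant result here is weak grounds for declaring the benefit absent.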
To me, their claims do not follow from the data. At best, it seems that they should say that this strategy shows little or no difference between the studies with different control groups, and that there are too few studies with the correct control group to conclude much.
Am I missing something? Isn’t this arguing absence of evidence as evidence of absence? Is this an appropriate use of frequentist inference? Is this a common use?
My reply: I guess I’d recommend that the researchers step back and be willing to acknowledge uncertainty. Even if results aren’t statistically significant, they can still inform decisions. But I don’t have any clean answers here. I’m not sure exactly what I would do in this situation.