Blake McShane and David Gal recently wrote two articles (“Blinding us to the obvious? The effect of statistical training on the evaluation of evidence” and “Statistical significance and the dichotomization of evidence”) on the misunderstandings of p-values that are common even among supposed experts in statistics and applied social research.
The key misconception has nothing to do with tail-area probabilities or likelihoods or anything technical at all, but rather with the use of significance testing to finesse real uncertainty.
As John Carlin and I write in our discussion of McShane and Gal’s second paper (to appear in the Journal of the American Statistical Association):
Even authors of published articles in a top statistics journal are often confused about the meaning of p-values, especially by treating 0.05, or the range 0.05–0.15, as the location of a threshold. The underlying problem seems to be deterministic thinking. To put it another way, applied researchers and also statisticians are in the habit of demanding more certainty than their data can legitimately supply. The problem is not just that 0.05 is an arbitrary convention; rather, even a seemingly wide range of p-values such as 0.01–0.10 cannot serve to classify evidence in the desired way.
In our article, John and I discuss some natural solutions that won’t, on their own, work:
– Listen to the statisticians, or clarity in exposition
– Confidence intervals instead of hypothesis tests
– Bayesian interpretation of one-sided p-values
– Focusing on “practical significance” instead of “statistical significance”
– Bayes factors
You can read our article for the reasons why we think the above proposed solutions won’t work.
From our summary:
We recommend saying No to binary conclusions . . . resist giving clean answers when that is not warranted by the data. . . . It will be difficult to resolve the many problems with p-values and “statistical significance” without addressing the mistaken goal of certainty which such methods have been used to pursue.
P.S. Along similar lines, Stephen Jenkins sends along the similarly-themed article, “‘Sing Me a Song with Social Significance’: The (Mis)Use of Statistical Significance Testing in European Sociological Research,” by Fabrizio Bernardi, Lela Chakhaia, and Liliya Leopold.