Jeremy Fox writes:
You’ve probably seen this [by Matthew Hankins]. . . . Everyone else on Twitter already has. It’s a graph of the frequency with which the phrase “marginally significant” occurs in association with different P values. Apparently it’s real data, from a Google Scholar search, though I haven’t tried to replicate the search myself.
My reply: I admire the effort that went into the data collection and the excellent display. (Following Bill Cleveland etc., I'd prefer a landscape rather than portrait orientation of the graph; I'd also prefer a gritty histogram rather than a smooth density; I don't like the y-axis going below zero, nor the box around the graph; and there's that weird R default where the axis labels are so far from the actual axes, I don't know whassup with that . . . but these are all minor, minor issues, and certainly I've done much worse myself many times, even in published articles; see the presentation here for lots of examples.) I used to love this sort of thing, but in my grumpy middle age I'm sort of tired of it.
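To make those display preferences concrete, here's a minimal sketch in base R of the kind of plot I have in mind. The p-values are simulated placeholders, not Hankins's actual counts, and the particular settings (figure dimensions, the mgp spacing, and so on) are just one way to get a landscape histogram with no box and axis labels pulled in closer to the axes:

```r
## A minimal sketch, not Hankins's code: the p-values below are random
## placeholders so the example runs on its own.
set.seed(1)
p_values <- runif(500, 0.01, 0.15)

## Landscape aspect ratio, no box around the plot, axis labels pulled in
## closer than the base-R default (mgp controls that spacing), and a y-axis
## that starts exactly at zero (yaxs = "i").
pdf("marginally_significant.pdf", width = 8, height = 4)
par(bty = "n", mgp = c(2, 0.5, 0), mar = c(3.5, 3.5, 1, 1), yaxs = "i")
hist(p_values,
     breaks = seq(0, 0.15, by = 0.005),  # a gritty histogram, not a smoothed density
     main = "",
     xlab = "Reported p value",
     ylab = 'Mentions of "marginally significant"')
dev.off()
```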
Let me clarify. I have no problem with Hankins's graph. It's excellent. But here's the thing. In the old days I'd see such a graph (for example, various histograms of published z-scores showing peaks around 2.0 and troughs at 1.95) and think about how researchers can select what results to publish and how they can play around a bit and go from p=0.06 to p=0.04. But after what's come out in the past few years (most notably, the article by Simmons, Nelson, and Simonsohn; see here for a popular summary of the concerns or here for a bunch of my recent papers on the topic), I now feel the problem is much more serious. A serious researcher can easily get statistical significance when nothing is going on at all (or, of course, when something is going on but where the population comparison of interest is of indeterminate magnitude and direction). And this can happen without the researcher even trying, just from doing an analysis that seems reasonable for the data at hand.
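To see how easily that can happen, here's a toy simulation of my own (an illustration of the general point, not anything taken from Simmons et al.). The data contain no effect at all, yet a researcher who chooses among a handful of reasonable-looking analyses (a second outcome, a covariate adjustment, a subgroup) will see p < 0.05 far more often than the nominal 5%:

```r
## Pure-noise data, several plausible analyses per dataset, report whether
## any of them crossed the p < 0.05 line.
set.seed(123)
n_sims <- 2000
n <- 100

got_significance <- replicate(n_sims, {
  group <- rep(0:1, each = n / 2)   # treatment indicator
  sex <- rbinom(n, 1, 0.5)          # a covariate one might adjust for
  y1 <- rnorm(n)                    # outcome 1: no true effect
  y2 <- rnorm(n)                    # outcome 2: no true effect

  ## Four analyses a careful researcher might plausibly run:
  p <- c(
    summary(lm(y1 ~ group))$coefficients["group", 4],
    summary(lm(y2 ~ group))$coefficients["group", 4],
    summary(lm(y1 ~ group + sex))$coefficients["group", 4],
    t.test(y1[sex == 1] ~ group[sex == 1])$p.value   # subgroup analysis
  )
  any(p < 0.05)
})

mean(got_significance)   # roughly 0.12-0.13 here, not the nominal 0.05
```

And this sketch is conservative: add a few more outcomes, covariates, subgroups, or data-exclusion rules and the rate climbs further, all without anyone consciously "fishing."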
So, given all this, the focus on p=0.04 or 0.06 or 0.10 seems to be beside the point. It's worth looking at (again, I applaud what Hankins did), but what it's focusing on is the least of our problems.