Thanks for bringing up the most interesting piece by Gerber and Malhotra and the Drum comment.
My own take is perhaps a bit less sinister but more worrisome than Drum’s interpretation of the results. The issue is how “tweaking” is interpreted. Imagine a preliminary analysis that shows a key variable to have a standard error as large as its coefficient (in a regression). Many people would simply stop the analysis at that point. Now consider getting a coefficient one and a half times its standard error (or 1.6 times its standard error). We all know it is not hard at that point to try a few different specifications and find one that gives a magic p-value just under .05 and hence earns the magic star. But of course the magic star seems critical for publication.
Thus I think the problem is with journal editors and reviewers who love that magic star, and hence with authors who think that it matters whether t is 1.64 or 1.65. Journal editors could (and should) correct this.
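To see how much a "few different specifications" can matter, here is a rough simulation sketch. It is not anything from the Gerber and Malhotra paper. It assumes a true effect of exactly zero, a z-test with known standard deviation, and "specifications" that are just random 80% subsamples of the same data; the trigger band (|z| of at least 1.5) and all function names are made up for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_stat(sample):
    """z statistic for H0: mean = 0, assuming a known standard deviation of 1."""
    return sum(sample) / math.sqrt(len(sample))


def simulate(n_studies=2000, n_obs=100, n_extra_specs=5, drop_frac=0.2, seed=1):
    """Return (honest rate, searched rate) of p < .05 when the true effect is zero."""
    rng = random.Random(seed)
    keep = int(n_obs * (1 - drop_frac))
    honest_hits = 0
    searched_hits = 0
    for _ in range(n_studies):
        data = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # null is true
        z = z_stat(data)
        p = two_sided_p(z)
        honest_hits += p < 0.05
        best_p = p
        # Hypothetical rule: only "tweak" when the first result is close but
        # not yet significant, as in the 1.5 or 1.6 standard error story.
        if abs(z) >= 1.5 and p >= 0.05:
            for _ in range(n_extra_specs):
                subset = rng.sample(data, keep)  # stand-in for a new specification
                best_p = min(best_p, two_sided_p(z_stat(subset)))
        searched_hits += best_p < 0.05
    return honest_hits / n_studies, searched_hits / n_studies


honest_rate, searched_rate = simulate()
print(f"first specification only: {honest_rate:.3f}")
print(f"best of a few when close: {searched_rate:.3f}")
```

The first rate should sit near the nominal .05, and the second can only be at least as large, since the search never discards a result that was already significant. How much larger depends entirely on the assumed number and kind of alternative specifications.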
When Political Analysis went quarterly we got it about a third right. Our instructions are:
“In most cases, the uncertainty of numerical estimates is better conveyed by confidence intervals or standard errors (or complete likelihood functions or posterior distributions), rather than by hypothesis tests and p-values. However, for those authors who wish to report “statistical significance,” statistics with probability levels of less than .001, .01, and .05 may be flagged with 3, 2, and 1 asterisks, respectively, with notes that they are significant at the given levels. Exact probability values may always be given. Political Analysis follows the conventional usage that the unmodified term “significant” implies statistical significance at the 5% level. Authors should not depart from this convention without good reason and without clearly indicating to readers the departure from convention.”
Would that I had had the guts to drop “In most cases” and stop after the first sentence. And even better would have been to simply demand a confidence interval.
Most (of the few) people I talk with have no difficulty distinguishing “insignificant” from “equals zero,” but Jeff Gill in his “The Insignificance of Null Hypothesis Significance Testing” (Political Research Quarterly, 1999) has a lot of examples showing I do not talk with a random sample of political scientists. Has the world improved since 1999?
BTW, since you know my obsession with what Bayes can or cannot do to improve life, this whole issue is, in my mind, the big win for Bayesians. Anything that lets people not get excited or depressed depending on whether a CI (er, HPD credible region) is (-.01,1.99) or (.01,2.01) has to be good.
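For a sense of how little separates those two intervals, here is a small sketch that backs out the implied p-value from each. It assumes both are symmetric 95% intervals built as estimate plus or minus 1.96 standard errors under a normal approximation, which is a simplification for an HPD region; `p_from_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # critical value, about 1.96


def p_from_ci(lower, upper):
    """Two-sided p-value implied by a symmetric 95% normal-theory interval."""
    estimate = (lower + upper) / 2
    std_err = (upper - lower) / (2 * Z_95)
    z = estimate / std_err
    return math.erfc(abs(z) / math.sqrt(2))


p_just_misses = p_from_ci(-0.01, 1.99)  # interval barely covers zero
p_just_makes = p_from_ci(0.01, 2.01)    # interval barely excludes zero
print(f"(-.01, 1.99): p = {p_just_misses:.4f}")
print(f"( .01, 2.01): p = {p_just_makes:.4f}")
```

Under these assumptions the two p-values land on opposite sides of .05 while differing by less than .01, so the star appears or vanishes on essentially the same evidence.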
My take on this: I basically agree. In many fields, you need that statistical significance–even if you have to try lots of tests to find it.