A commenter pointed out this note by Kevin Drum on this cool paper by Alan Gerber and Neil Malhotra on p-values in published political science papers. They find that there are surprisingly many papers with results that are just barely statistically significant (t=1.96 to 2.06) and surprisingly few that are just barely not significant (t=1.85 to 1.95). Perhaps people are fudging their results or selecting analyses to get significance. Gerber and Malhotra’s analysis is excellent: clean and thorough.
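To make the comparison concrete, here is a minimal sketch of one way to check such an imbalance. It assumes that, absent any selection, a result landing in one of two equally narrow windows around the threshold is about equally likely to fall just above or just below it. The function name `caliper_p_value` and the counts are made up for illustration; they are not taken from the paper, and this is not necessarily the exact test Gerber and Malhotra ran.

```python
from math import comb


def caliper_p_value(n_above: int, n_below: int) -> float:
    """One-sided binomial tail probability.

    Returns P(X >= n_above) for X ~ Binomial(n_above + n_below, 0.5),
    i.e. the chance of seeing at least this many results just above the
    threshold if each result were equally likely to land on either side.
    """
    n = n_above + n_below
    return sum(comb(n, k) for k in range(n_above, n + 1)) / 2 ** n


# Hypothetical counts, purely illustrative (NOT from the paper):
# results with t just above vs. just below the 1.96 cutoff.
example_above, example_below = 30, 10
print(caliper_p_value(example_above, example_below))
```

A small tail probability says the pile-up just above the cutoff would be unlikely under the 50/50 assumption; it does not by itself say whether the cause is fudging, selective analysis, or selective publication.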
Just one note: the finding is interesting, and I love the graphs, but, as Gerber and Malhotra note,
We only examined papers that listed a set of hypotheses prior to presenting the statistical results. . . .
I think it’s kind of tacky to state a formal “hypothesis,” especially in a social science paper, partly because, in much (most?) of my research, the most interesting finding was not anything we’d hypothesized ahead of time. (See here for some favorite examples.) I think there’s a problem with the whole mode of research that focuses on “rejecting hypotheses” using statistical significance, and so I’m sort of happy to find that Gerber and Malhotra notice a problem with studies formulated in this way.
In practice, t-statistics are rarely much more than 2. Why? Because, if they’re much more than 2, you’ll probably subdivide the data (e.g., look at effects among men and among women) until subsample sizes are too small to learn much. Knowing this can affect experimental design, as I discuss in my paper, “Should we take measurements at an intermediate design point?”
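The arithmetic behind this is simple enough to sketch. Assuming (as a simplification that is not in the text above) that the effect is the same in every subgroup, that subgroups are equal in size, and that the standard error scales as one over the square root of the sample size, splitting the data into k subgroups shrinks the expected t-statistic by a factor of the square root of k. The helper name `subgroup_t` is hypothetical.

```python
from math import sqrt


def subgroup_t(full_sample_t: float, n_subgroups: int) -> float:
    """Approximate t-statistic within one of n_subgroups equal-size subgroups.

    Assumes the same effect size and variance in every subgroup, so the
    estimate is unchanged while the standard error grows by sqrt(n_subgroups).
    """
    return full_sample_t / sqrt(n_subgroups)


# Illustrative starting value only: a full-sample t of 4.
for k in (1, 2, 4):
    print(k, round(subgroup_t(4.0, k), 2))
```

Under these assumptions, a full-sample t of 4 drops to about 2.8 with two subgroups and to 2 with four, which is the sense in which subdividing tends to pull t-statistics back toward 2.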