I got this email from a journalist:
This seems . . . irresponsible to me.
For the first 100 years that meteorologists kept weather records at Central Park, from 1869 through 1996, they recorded just two snowstorms that dumped 20 inches or more. But since 1996, counting this week’s storm, there have been six. (You’ll find similar stats for other major East Coast cities.)
Basically, we’ve become accustomed to something that used to be very rare.
The link points to a post by Eric Holthaus on grist.org, and I agree with the person who sent this to me that the argument is pretty bad, at least as presented.
First, let’s compute the simple statistical comparison. The previous rate was 2 out of 128, and the new rate is 6 out of 22. So:
y <- c(2, 6)
n <- c(128, 22)
p_hat <- y/n
diff <- p_hat[2] - p_hat[1]
se_diff <- sqrt(sum(p_hat*(1-p_hat)/n))
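If you want to read off the numbers, you can continue from the code above and print the difference, standard error, and implied z-score; the prop.test() call is my own cross-check, not part of the original calculation (it may warn about small expected counts here).

# print the difference, standard error, and implied z-score
z <- diff/se_diff
round(c(diff = diff, se = se_diff, z = z), 2)   # roughly 0.26, 0.10, 2.7

# cross-check with R's built-in two-sample proportion test
prop.test(x = c(2, 6), n = c(128, 22))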
The difference between the two probabilities is 0.26 and the standard error is 0.10. So, sure, it's more than 2 standard errors from zero: good enough for grist.org, PPNAS, NPR, and your friendly neighborhood TED talk.
But not good enough for the rest of us. The researcher degrees of freedom are obvious: the choice of 1996 as a cutpoint, the choice of 20 inches as a threshold, and the decision to pick out snowstorms as the outcome of interest. Also, there is variation at many time scales, so it's not quite right to treat each year as an independent data point. In summary, by chance alone we'd expect to find lots of apparently statistically significant patterns just by sifting through weather data in this way.
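To make that concrete, here is a rough simulation sketch, my own illustration rather than anything from the post: the constant 4% storm rate and the range of candidate cutpoints are made-up assumptions. The idea is to generate years of big-snowstorm indicators from a single unchanging rate, scan over possible break years, and count how often at least one split looks "statistically significant."

# Simulate the forking-paths problem under a no-change null (illustrative assumptions only)
set.seed(123)
n_years <- 150
p_true <- 0.04          # assumed constant chance of a 20-inch storm each year
n_sims <- 1000
cutpoints <- 100:140    # arbitrary range of candidate "break" years to scan over

found_significant <- replicate(n_sims, {
  storms <- rbinom(n_years, 1, p_true)      # no trend at all
  z <- sapply(cutpoints, function(k) {
    p1 <- mean(storms[1:k])
    p2 <- mean(storms[(k+1):n_years])
    se <- sqrt(p1*(1-p1)/k + p2*(1-p2)/(n_years - k))
    if (se == 0) 0 else (p2 - p1)/se
  })
  any(abs(z) > 2)       # did any cutpoint give |z| > 2?
})
mean(found_significant)  # share of null simulations with at least one "significant" split

The exact fraction depends on the assumed rate and cutpoints; the point is that searching over cutpoints and outcomes inflates the rate of apparent signals well beyond what a single pre-registered comparison would give.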
The story here is the usual: To the extent that this evidence is presented in support of a clear theory, it could be meaningful. By itself, it's noise.