In a comment on our recent discussion of stock and flow, Tom Fiddaman writes:

Here’s an egregious example of statistical stock-flow confusion that got published.

Fiddaman is pointing to a post of his from 2011 discussing a paper that “examines the relationship between CO2 concentration and flooding in the US, and finds no significant impact.”

Here’s the title and abstract of the paper in question:

Has the magnitude of floods across the USA changed with global CO2 levels?

R. M. Hirsch & K. R. Ryberg

Abstract

Statistical relationships between annual floods at 200 long-term (85–127 years of record) streamgauges in the coterminous United States and the global mean carbon dioxide concentration (GMCO2) record are explored. The streamgauge locations are limited to those with little or no regulation or urban development. The coterminous US is divided into four large regions and stationary bootstrapping is used to evaluate if the patterns of these statistical associations are significantly different from what would be expected under the null hypothesis that flood magnitudes are independent of GMCO2. In none of the four regions defined in this study is there strong statistical evidence for flood magnitudes increasing with increasing GMCO2. One region, the southwest, showed a statistically significant negative relationship between GMCO2 and flood magnitudes. The statistical methods applied compensate both for the inter-site correlation of flood magnitudes and the shorter-term (up to a few decades) serial correlation of floods.

And here’s Fiddaman’s takedown:

There are several serious problems here.

First, it ignores bathtub dynamics. The authors describe causality from CO2 -> energy balance -> temperature & precipitation -> flooding. But they regress:

ln(peak streamflow) = beta0 + beta1 × global mean CO2 + error

That alone is a fatal gaffe, because temperature and precipitation depend on the integration of the global energy balance. Integration renders simple pattern matching of cause and effect invalid. For example, if A influences B, with B as the integral of A, and A grows linearly with time, B will grow quadratically with time.
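The integration point can be checked with a toy simulation. This is only an illustrative sketch, not anything from Fiddaman's post or the paper: the series `a` and `b`, the slope of 2, and the 100 time steps are all made-up assumptions. It accumulates a linearly growing flow into a stock and then compares a straight-line fit of the stock on the flow with a quadratic fit in time.

```python
import numpy as np

# Hypothetical toy series (not the paper's data).
t = np.arange(100, dtype=float)   # time index
a = 2.0 * t                       # flow A grows linearly with time
b = np.cumsum(a)                  # stock B accumulates (integrates) A

# Naive pattern matching: regress the stock on the flow with a straight line.
slope, intercept = np.polyfit(a, b, 1)
linear_resid = b - (intercept + slope * a)

# The stock is actually quadratic in time, so a degree-2 fit is essentially exact.
quad_coefs = np.polyfit(t, b, 2)
quad_resid = b - np.polyval(quad_coefs, t)

print("max |residual|, linear fit on A: ", np.max(np.abs(linear_resid)))
print("max |residual|, quadratic in time:", np.max(np.abs(quad_resid)))
```

The straight-line fit leaves large U-shaped residuals (positive at both ends, negative in the middle), while the quadratic fit leaves essentially none. Matching the shape of the cause to the shape of the effect misleads when one is the integral of the other.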

This sort of thing comes up a lot in political science, where the right thing to do is not so clear. For example, suppose we’re comparing economic outcomes under Democratic and Republican presidents. The standard thing to look at is economic growth. But maybe it is changes in growth that should matter? As Jim Campbell points out, if you run a regression using economic growth as an outcome, you’re implicitly assuming that the effects of the president’s party on growth persist indefinitely, and that’s a strong assumption.

Anyway, back to Fiddaman’s critique of that climate-change regression:

The situation is actually worse than that for climate, because the system is not first order; you need at least a second-order model to do a decent job of approximating the global dynamics, and much higher order models to even think about simulating regional effects. At the very least, the authors might have explored the usual approach of taking first differences to undo the integration, though it seems likely that the data are too noisy for this to reveal much.
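Here is a rough sketch of why first differencing can undo integration in principle yet reveal little when the data are noisy. Everything in it is an assumption chosen for illustration (the driver `flow`, the noise standard deviation of 50, the sample size). It is a first-order toy and does not represent the higher-order climate dynamics Fiddaman describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical slowly growing driver and its accumulated (integrated) response.
flow = 0.01 * np.arange(n)
stock = np.cumsum(flow)

# Assumed independent measurement noise on the observed stock.
noise = rng.normal(0.0, 50.0, size=n)
observed = stock + noise

# First differences recover the flow, plus differenced noise.
d_obs = np.diff(observed)   # equals flow[1:] + np.diff(noise)
d_noise = np.diff(noise)

# Differencing independent noise roughly doubles its variance.
noise_var_ratio = d_noise.var() / noise.var()

# Correlation between the driver and the differenced observations.
corr_diff = np.corrcoef(flow[1:], d_obs)[0, 1]

print("variance ratio of differenced noise:", noise_var_ratio)
print("corr(flow, differenced observations):", corr_diff)
```

In this toy setup the differenced series does line up with the driver, but only weakly, because differencing inflates the noise while it removes the accumulation. With a single short, noisy record per site, the signal could easily be swamped.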

Second, it ignores a lot of other influences. The global energy balance, temperature and precipitation are influenced by a lot of natural and anthropogenic forcings in addition to CO2. Aerosols are particularly problematic since they offset the warming effect of CO2 and influence cloud formation directly. Since data for total GHG loads (CO2eq), total forcing and temperature, which are more proximate in the causal chain to precipitation, are readily available, using CO2 alone seems like willful ignorance. The authors also discuss issues “downstream” in the causal chain, with difficult-to-assess changes due to human disturbance of watersheds; while these seem plausible (not my area), they are not a good argument for the use of CO2. The authors also test other factors by including oscillatory climate indices, the AMO, PDO and ENSO, but these don’t address the problem either. . . .

I’ll skip a bit, but there’s one more point I wanted to pick up on:

Fourth, the treatment of nonlinearity and distributions is a bit fishy. The relationship between CO2 and forcing is logarithmic, which is captured in the regression equation, but I’m surprised that there aren’t other important nonlinearities or nonnormalities. Isn’t flooding heavy-tailed, for example? I’d like to see just a bit more physics in the model to handle such issues.

In general, if there’s a monotonic pattern, it should show up even if the functional form is wrong. But in this case Fiddaman has a point: the paper he’s criticizing makes a big deal about *not* finding a pattern, and when the headline claim is a null result, yes, using a less efficient model could be a problem.

Similarly with this point:

Fifth, I question the approach of estimating each watershed individually, then examining the distribution of results. The signal to noise ratio on any individual watershed is probably pretty horrible, so one ought to be able to do a lot better with some spatial pooling of the betas (which would also help with issue three above).
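To give a sense of what pooling the betas can buy, here is a minimal sketch using simple precision-based shrinkage toward a common mean. It is a stand-in for the idea, not the spatial pooling Fiddaman has in mind and not anything the paper did: it ignores geography entirely, and the number of sites, the true slopes, and the standard errors are simulated assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sites = 200

# Hypothetical site-level slopes and noisy per-site estimates (simulated).
true_beta = rng.normal(0.1, 0.05, size=n_sites)
se = np.full(n_sites, 0.3)                     # assumed large standard errors
beta_hat = true_beta + rng.normal(0.0, se)     # no-pooling estimates

# Method-of-moments estimate of the between-site variance, floored at zero.
mu_hat = np.average(beta_hat, weights=1.0 / se**2)
tau2_hat = max(beta_hat.var(ddof=1) - np.mean(se**2), 0.0)

# Partial pooling: shrink each noisy estimate toward the common mean.
shrink = tau2_hat / (tau2_hat + se**2)
beta_pooled = mu_hat + shrink * (beta_hat - mu_hat)

rmse_raw = np.sqrt(np.mean((beta_hat - true_beta) ** 2))
rmse_pooled = np.sqrt(np.mean((beta_pooled - true_beta) ** 2))

print("RMSE, site-by-site estimates:", rmse_raw)
print("RMSE, partially pooled:      ", rmse_pooled)
```

When the per-site noise is large relative to the real variation across sites, as assumed here, the pooled estimates are much closer to the truth than the site-by-site ones. A full analysis would more likely fit a multilevel model with regional or spatial structure than use this two-step shortcut.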

Fiddaman concludes:

I think that it’s actually interesting to hold your nose and use linear regression as a simple screening tool, in spite of violated assumptions. If a relationship is strong, you may still find it. If you don’t find it, that may not tell you much, other than that you need better methods. The authors seem to hold to this philosophy in the conclusion, though it doesn’t come across that way in the abstract.