Yu Xie thought I’d have something to say about this recent paper, “Evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River policy,” by Yuyu Chen, Avraham Ebenstein, Michael Greenstone, and Hongbin Li, which begins:
This paper’s findings suggest that an arbitrary Chinese policy that greatly increases total suspended particulates (TSPs) air pollution is causing the 500 million residents of Northern China to lose more than 2.5 billion life years of life expectancy. The quasi-experimental empirical approach is based on China’s Huai River policy, which provided free winter heating via the provision of coal for boilers in cities north of the Huai River but denied heat to the south. Using a regression discontinuity design based on distance from the Huai River, we find that ambient concentrations of TSPs are about 184 μg/m3 [95% confidence interval (CI): 61, 307] or 55% higher in the north. Further, the results indicate that life expectancies are about 5.5 y (95% CI: 0.8, 10.2) lower in the north owing to an increased incidence of cardiorespiratory mortality.
Before going on, let me just say that, if you buy this result, you should still be interested in it even if the 95% confidence intervals had happened to include zero. There is an unfortunate convention that “p less than .05” results are publishable while “non-significant” results are not. The life expectancy of 500 million people is important, and I’d say it’s inappropriate to wait on statistical significance to make that judgment.
Getting to the details, though, I’d have to say that I’m far less than 97.5% sure that the effects are in the direction that the authors claim (recall that 97.5% is the posterior probability associated with p=.05 under a flat prior). And, for the usual proper-prior Bayesian reasons, I’d guess that this “2.5 billion years of life expectancy” is an overestimate.
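To spell out that flat-prior arithmetic with the numbers quoted in the abstract (a back-of-the-envelope sketch, nothing more):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# The abstract's 95% CI of (0.8, 10.2) around the 5.5-year estimate
# implies a standard error of roughly:
se = (10.2 - 0.8) / (2 * 1.96)   # about 2.4
z = 5.5 / se                     # about 2.3
posterior = phi(z)               # Pr(effect > 0 | data) under a flat prior

# For reference: a result at exactly p = .05 (z = 1.96) gives 0.975,
# which is where the "97.5% sure" benchmark comes from.
print(f"se = {se:.2f}, z = {z:.2f}, flat-prior Pr(effect > 0) = {posterior:.3f}")
```

The point is that the roughly-0.99 posterior probability comes straight from the normal approximation plus a flat prior; a proper prior that discounts implausibly huge effects would pull both the estimate and that probability down.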
Here’s the key figure from the paper:
This is a beautiful graph. I love love love a plot that shows the model and the data together. One thing I like about this particular graph is that, just looking at it, you can see how odd the model is. Or, at least, how odd it looks to an outsider. A third-degree polynomial indeed! It looks like that’s where the claim of 5 years of life expectancy came from. I’m a little confused still, because the interval is [1.3, 8.1] in the graph and [0.8, 10.2] in the abstract, so there must be something else going on, but this seems like the basic story.
Table S.9 in the supplemental material gives their results trying other models. The cubic adjustment gave an estimated effect of 5.5 years with standard error 2.4. A linear adjustment gave an estimate of 1.6 years with standard error 1.7. I did some cropping to show you the relevant part of the table:
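In case you want to check the arithmetic on those two rows (using only the numbers quoted above): the statistical significance of the headline result flips depending on which adjustment you use.

```python
# z-scores from the two Table S.9 rows quoted above:
cubic_z = 5.5 / 2.4    # about 2.3: beyond 1.96, hence "statistically significant"
linear_z = 1.6 / 1.7   # about 0.9: well inside 1.96, hence "not significant"
print(f"cubic z = {cubic_z:.2f}, linear z = {linear_z:.2f}")
```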
My point here is not that the linear model is correct—the authors in fact supply data-based reasons for preferring the cubic—but rather that the headline claim, and its statistical significance, is highly dependent on a model choice that has no particular scientific (as distinguished from data-analytic) basis. Figure 3 above indicates to me that neither the linear nor the cubic nor any other polynomial model is appropriate here; that there are other variables not included in the model that distinguish the circles in the graph. A multilevel model might be a good idea; of course that would increase standard errors in its own way. (Or one could try some less model-based approach such as the robust regression discontinuity method of Calonico, Cattaneo, and Titiunik. I prefer models because, for me, the model-building step meshes with the goal of increasing substantive understanding. But it’s your call. In any case, I’d like to ditch that approach of estimating high-degree polynomials.)
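Here’s a toy simulation of why I distrust the high-degree polynomial (entirely made-up data, not the paper’s, and a simpler global-polynomial-plus-jump-dummy variant of the design): fit polynomials of increasing degree to a smooth curve with no true jump at the threshold, and watch the estimated “discontinuity” swing around with the degree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a smooth underlying curve with NO true
# discontinuity at x = 0, plus noise.
n = 100
x = rng.uniform(-1, 1, n)
y = np.sin(2.5 * x) + rng.normal(0, 0.3, n)  # true jump at 0 is zero
d = (x >= 0).astype(float)                   # "north of the river" indicator

def rd_jump(degree):
    """Estimated discontinuity at 0 from a global polynomial of the given degree."""
    X = np.column_stack([x**k for k in range(degree + 1)] + [d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]  # coefficient on the jump dummy

for deg in (1, 2, 3, 4, 5):
    print(f"degree {deg}: estimated jump = {rd_jump(deg):+.3f}")
```

The estimates move around purely as a function of the fitting choice, which is the sense in which a degree-3 (or degree-5) polynomial can manufacture a “discontinuity” out of curvature.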
To get back to the main conclusions: There might well be good reasons for expecting an effect of even more than 5 years of life expectancy from this policy—I don’t know anything about this topic—but, from the above data alone, the claim of 5 years looks artifactual. And there also seems to be an implication that, in those northern areas with life expectancy around 80, people would be living to 85 in the absence of the policy. Maybe. But it seems like a strong conclusion to make if it’s being driven by this data analysis alone. Which is what they seem to be doing, in that they’re just taking their estimated regression coefficient and considering it as a treatment effect.
Let me emphasize that I’m not not not saying that particulate matter doesn’t kill, or that this topic shouldn’t be studied, or that these findings shouldn’t be published in a high-profile journal. The accompanying article by C. Arden Pope and Douglas Dockery gives lots of background on why Chen et al.’s conclusions are scientifically plausible.
What I am suggesting is a two-step: the authors retreat from their strongly model-based claim of statistical significance, and the journal accept that non-statistically-significant findings on important topics are still worth publishing.
P.S. This study was featured last month in a New York Times article by Edward Wong. The report was uncritical, referring to “the 5.5-year drop in life expectancy in the north.” Again, I’m not saying the paper’s conclusions are wrong, just that I don’t think they’re supported by the data as unequivocally as one might think from the published confidence intervals.
P.P.S. Let me say all this again because I’ve been posting some negative things lately and my goal is to be constructive, not negative:
Respiration is important. Increasing lifespan is important. There’s lots of evidence that air pollution is bad for you. Policies matter. Environmental policies and environmental outcomes can and should be studied by quantitative researchers. Data are never perfect but we still need to move forward. Subject-matter researchers spend decades of their life establishing subject-matter expertise, and they don’t always have full statistical expertise. That is fine. There is a division of labor. I am a statistics expert and don’t have much subject-matter knowledge about environmental science, even though I publish papers in the area. Regression discontinuity is a great idea. Causal inference from observational data is difficult but still needs to be done. There’s often no easy way to control for background variables in a quasi-experiment. These researchers did the best they could. Their conclusions are consistent with much of the literature and their paper was accepted in a leading scientific journal. Even if their work is imperfect it should not be dismissed. I focus on a statistical concern with this paper because statistics is what I do. I suspect that an improved analysis of these data would yield a higher level of uncertainty, perhaps leading to 95% intervals that contain zero. This would not mean that the true effect is zero, it just means there is some level of uncertainty in the effect given an analysis based only on the data at hand. I am skeptical about an effect of 5 years of life but I could be wrong. I think it would be fine for the journal to publish an article just like this one, but without the 3rd-degree polynomial and with a smaller and non-statistically-significant estimate of the effect. I have the impression that this and other journals have an implicit rule that under normal circumstances they will publish this sort of statistical paper only if it has statistically significant results. 
That’s a regression discontinuity right there, and researchers in various fields have found evidence that such publication thresholds introduce endogeneity in the selection variable.
The above post is not an attempt to shoot down the article by Chen et al. I did not read the paper in detail. I’m giving my impressions, and I’m using this example to highlight some general issues that arise with causal inference from observational data, especially in discontinuity designs. (Here’s an earlier example, this time with a fifth-degree polynomial, and on a lighter topic.) I also wouldn’t mind stirring up some post-publication peer review here, and maybe some subject-matter experts can contribute usefully. Again, this is an important topic. Lives are at stake, it’s worth capturing our scientific uncertainty as well as we can here, even if it means that I critique a published paper and possibly embarrass myself in the process.
Get it? Got it. Good.
P.P.P.S. More discussion here.