China air pollution regression discontinuity update

Avery writes:

There is a follow-up paper to the paper “Evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River policy” [by Yuyu Chen, Avraham Ebenstein, Michael Greenstone, and Hongbin Li], which you have posted about a couple of times and used in lectures. It seems that there aren’t many changes other than newer and better data and some alternative methods. Just curious what you think about it.

The paper is called, “New evidence on the impact of sustained exposure to air pollution on life expectancy from China’s Huai River Policy” [by Avraham Ebenstein, Maoyong Fan, Michael Greenstone, Guojun He, and Maigeng Zhou].

The cleanest summary of my problems with that earlier paper is this article, “Evidence on the deleterious impact of sustained use of polynomial regression on causal inference,” written with Adam Zelizer.

Here’s the key graph, which we copied from the earlier Chen et al. paper:

The most obvious problem revealed by this graph is that the estimated effect at the discontinuity is entirely the result of the weird curving polynomial regression, which in turn is being driven by points on the edge of the dataset. Looking carefully at the numbers, we see another problem: life expectancy is supposed to be 91 in one of these places (check out that green circle on the upper right of the plot), and, according to the fitted model, the life expectancy there would be 5 years higher, that is, 96 years!, if only they hadn’t been exposed to all that pollution.
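To see how a single point far from the river can move a global-polynomial estimate of the jump, here is a small R simulation. It is a sketch under stated assumptions: the data are simulated with no discontinuity at all; the "global" model is a cubic in the running variable plus a north-side indicator, assumed here as a stand-in for the kind of curve in the figure and not necessarily the authors' exact specification; and the comparison is a local linear fit with a triangular kernel and a hand-picked bandwidth of 3 degrees. The helpers `rd_global_cubic` and `local_linear_rd` are hypothetical, written only for this illustration.

```r
# Simulated data only: there is no true discontinuity at the cutoff x = 0.
set.seed(1)
x <- seq(-10, 10, length.out = 81)              # hypothetical degrees north of the cutoff
y <- 75 + 0.1 * x + rnorm(length(x), sd = 2)   # smooth trend plus noise

# Hypothetical global specification: jump indicator plus a cubic in x.
rd_global_cubic <- function(x, y) {
  d <- as.numeric(x >= 0)
  fit <- lm(y ~ d + x + I(x^2) + I(x^3))
  unname(coef(fit)["d"])                        # estimated jump at x = 0
}

# Hypothetical local specification: separate lines on each side, triangular kernel.
local_linear_rd <- function(x, y, cutoff = 0, h = 3) {
  u <- x - cutoff
  w <- pmax(0, 1 - abs(u) / h)                  # zero weight outside [cutoff - h, cutoff + h]
  keep <- w > 0
  d <- as.numeric(u >= 0)
  fit <- lm(y ~ d * u, weights = w, subset = keep)
  unname(coef(fit)["d"])                        # jump in the fitted lines at the cutoff
}

# Perturb only the northernmost point, adding 16 years to it.
y_edge <- y
far <- which.max(x)
y_edge[far] <- y_edge[far] + 16

change_global <- rd_global_cubic(x, y_edge) - rd_global_cubic(x, y)
change_local  <- local_linear_rd(x, y_edge) - local_linear_rd(x, y)
print(c(global = change_global, local = change_local))
```

The local estimate cannot change here, because the perturbed point lies outside its window; the global estimate moves by 16 times whatever implicit weight the cubic fit assigns to that one edge point. How large that shift is depends on the data, which is exactly why it is worth checking.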

As Zelizer and I discuss in our paper, and I’ve discussed elsewhere, this is a real problem, not at all resolved by (a) regression discontinuity being an identification strategy, (b) high-degree polynomials being recommended in some of the econometrics literature, and (c) the result being statistically significant at the 5% level.

Indeed, items (a), (b), (c) above represent a problem, in that they gave the authors of that original paper, and the journal reviewers and editors, a false sense of security which allowed them to ignore the evident problems in their data and fitted model.

We’ve talked a bit recently about “scientism,” defined as “excessive belief in the power of scientific knowledge and techniques.” In this case, certain conventional statistical techniques for causal inference and estimation of uncertainty have led people to turn off their critical thinking.

That said, I’m not saying, nor have I ever said, that the substantive claims of Chen et al. are wrong. It could be that this policy really did reduce life expectancy by 5 years. All I’m saying is that their data don’t really support that claim. (Just look at the above scatterplot and ignore the curvy line that goes through it.)

OK, what about this new paper? Here’s the new graph:

You can make of this what you will. One thing that’s changed is that the two places with life expectancy greater than 85 have disappeared. So that seems like progress. I wonder what happened? I did not read through every bit of the paper—maybe it’s explained there somewhere?

Anyway, I still don’t buy their claims. Or, I should say, I don’t buy their statistical claim that their data strongly support their scientific claim. To flip it around, though, if the public-health experts find the scientific claim plausible, then I’d say, sure, the data are consistent with this claimed effect on life expectancy. I just don’t see distance north or south of the river as a key predictor, hence I have no particular reason to believe that the data pattern shown in the above figure would’ve appeared, without the discontinuity, had the treatment not been applied.
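The assumption at stake in the last sentence above can be written out. In standard potential-outcomes notation (our labels, not necessarily the paper's), let $X$ be a city's distance in degrees from the Huai River, with the treated side taken to be north of the river ($X \ge 0$), and let $Y(1)$ and $Y(0)$ be life expectancy with and without the policy. The regression discontinuity estimand, and the condition under which it equals a causal effect at the river, are:

```latex
\tau_{\text{RD}}
  = \lim_{x \downarrow 0} \mathbb{E}[\,Y \mid X = x\,]
  - \lim_{x \uparrow 0} \mathbb{E}[\,Y \mid X = x\,]
  = \mathbb{E}[\,Y(1) - Y(0) \mid X = 0\,]
\quad \text{if } x \mapsto \mathbb{E}[\,Y(0) \mid X = x\,]
  \text{ and } x \mapsto \mathbb{E}[\,Y(1) \mid X = x\,]
  \text{ are continuous at } 0 .
```

The continuity condition cannot be checked from the observed data alone, and any estimate of the two limits also depends on how well the fitted curve in $x$ tracks $\mathbb{E}[Y \mid X = x]$ near the river.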

I feel like kind of a grinch saying this. After all, air pollution is a big problem, and these researchers have clearly done a lot of work with robustness studies etc. to back up their claims. All I can say is: (1) Yes, air pollution is a big problem so we want to get these things right, and (2) Even without the near-certainty implied by these 95% intervals excluding zero, decisions can and will be made. Scientists and policymakers can use their best judgment, and I think they should do this without overrating the strength of any particular piece of evidence. And I do think this new paper is an improvement on the earlier one.

P.S. If you want to see some old-school ridiculous regression discontinuity analysis, check this out:

A global fourth-degree polynomial, huh? This is almost a parody of how to make spurious discoveries. As always, try looking at the graph without the helpful curvy lines and see if you can pick out any pattern at all. Disappointing to see this in the AJPS. The person who sent it to me writes:

The paper uses an RD to estimate whether national MPs in India are more likely to allocate pork to districts represented by copartisans at the state level than by out-partisans . . . Table 1 seems to be the main empirical takeaway:

1) Uses a 4th degree polynomial in the main RD.

2) Post-treatment covariates. How can you control for project-specific covariates when the outcome of interest is the degree to which projects were funded? Presumably the profile of projects changed, too.

3) Author admits that the 2 years pre and post election are somewhat arbitrary.

4) Regression to the mean? States with copartisan representation receive 59,000 rupees more post-election than states with out-partisan representation, but states with copartisan representation received 52,000 rupees LESS pre-election. So national MPs punish their co-partisans before competitive elections, but reward them after? Huh.
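Point 2 above, controlling for post-treatment covariates, can be illustrated with a toy simulation in R. Everything in it is assumed for illustration and nothing comes from the paper: a binary treatment `treat` shifts a project characteristic `m`, and both affect funding `y`, so the total effect of treatment is 1.0 + 1.5 × 0.8 = 2.2 while the direct effect is 1.0. Regressing on treatment alone recovers roughly the total effect; adding the post-treatment variable `m` as a control recovers only the direct path.

```r
# Hypothetical data-generating process (made-up coefficients):
# treat -> m -> y, plus a direct path treat -> y.
set.seed(42)
n <- 10000
treat <- rbinom(n, 1, 0.5)
m <- 0.8 * treat + rnorm(n)              # post-treatment variable, affected by treat
y <- 1.0 * treat + 1.5 * m + rnorm(n)    # total effect of treat = 1.0 + 1.5 * 0.8 = 2.2

total    <- unname(coef(lm(y ~ treat))["treat"])      # estimates the total effect
adjusted <- unname(coef(lm(y ~ treat + m))["treat"])  # "controls" for m: direct path only
print(c(total = total, adjusted = adjusted))
```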

As usual, my problem is not that researchers use questionable statistical methods, or that they go out on a limb, making claims beyond what can be supported by their data. Those of us who develop advanced statistical methods have to be aware that, once a method is out there, it can be used “off-label” by anybody, and lots of those uses will be mistaken in some way or another. No, my problem is the false sense of certainty that appears to be engendered by the use of high-tech statistics. If fourth-degree polynomials had never been invented, researchers could look at the above graph with just the scatterplot and draw their own conclusions, with no p-values to mislead them. The trouble is that, for many purposes, we do need advanced methods—looking at scatterplots and time series is not always enough—hence we need to get into the details and explain why certain methods such as regression discontinuity with high-degree polynomials don’t work as advertised; see here and here.

21 Comments

  1. Mark Thompson says:

    Andrew:

    I’m curious to get your thoughts on robust RD (Calonico, Cattaneo, and Titiunik 2015), given that the method often fits data using high-degree polynomials. I increasingly see papers using this approach as a way of minimizing researcher discretion over the number of polynomial terms and bins.

    • Hi Mark:

      Thanks for your question. Andy invited us to answer it, since it refers directly to our work.

      We view our paper Calonico, Cattaneo and Titiunik (2015, JASA) as providing data-driven, principled methods for graphical visualization of RD designs, and for conducting some heuristic specification tests. However, we recommend against using this methodology for estimation and inference of RD treatment effects. The RD plot is a tool to visualize and illustrate, not to formally estimate effects or make statistical inferences. We state this in our paper (pp. 1756–1757): “Global polynomial approximations may not perform well in RD applications and, more generally, in approximating regression functions locally. These polynomial approximations for regression functions tend to (i) generate counterintuitive weighting schemes (Gelman and Imbens 2014), (ii) have erratic behavior near the boundaries of the support (usually known as the Runge’s phenomenon in approximation theory), and (iii) oversmooth (by construction) potential discontinuities in the interior of the support.” Point (ii) is the key when it comes to RD estimation and inference.

      We have made this same point multiple times in our work. For example, take a look at our forthcoming Cambridge monograph: http://www-personal.umich.edu/~cattaneo/books/Cattaneo-Idrobo-Titiunik_2018_CUP-Vol1.pdf , where in Section 4.1 (Local Polynomial Approach: Overview) we write “Since the RD point estimator is defined at a boundary point, global polynomial methods can lead to unreliable RD point estimators, and thus the conclusions from a global parametric RD analysis can be highly misleading”. Instead, we advocate for local-to-the-cutoff analysis when estimation and inference of RD treatment effects are the main goal. See our other papers here: https://sites.google.com/site/rdpackages/rdrobust/

      Best wishes,

      Matias
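Point (i) in the quoted passage, the weighting problem, can be made concrete: any least-squares polynomial estimate of the regression function at the cutoff is a weighted sum of the outcomes, and the weights can be computed directly as X(X'X)⁻¹e₁. The R sketch below assumes a hypothetical running variable on a grid over (0, 1], on one side of a cutoff at 0, and compares a linear fit with a fourth-degree fit; `boundary_weights` is a helper written for this illustration, not a function from any package.

```r
# Weights w such that the fitted value at the cutoff (x = 0) equals sum(w * y),
# for a degree-p polynomial fit by least squares to one side of the cutoff.
boundary_weights <- function(x, degree) {
  X <- outer(x, 0:degree, `^`)              # columns 1, x, x^2, ..., x^degree
  e1 <- c(1, rep(0, degree))                # selects the intercept, i.e. the fit at x = 0
  drop(X %*% solve(crossprod(X), e1))       # X (X'X)^{-1} e1
}

x <- seq(0.01, 1, by = 0.01)                # hypothetical running variable, one side only
w1 <- boundary_weights(x, 1)
w4 <- boundary_weights(x, 4)

# The weights sum to 1 because the fit reproduces a constant exactly.
print(c(sum_linear = sum(w1), sum_quartic = sum(w4)))

# Weight placed on the single observation farthest from the cutoff.
print(c(linear = w1[length(x)], quartic = w4[length(x)]))
```

In this setup the fourth-degree fit leans noticeably harder on the farthest observation than the linear fit does, which is one concrete form of the "counterintuitive weighting schemes" that Gelman and Imbens (2014) describe.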

  2. Martin Lindfors says:

    Regarding the last plot: Off-label methods indeed.

    “Figure 1 utilizes the optimal data-driven RD plots developed by Calonico, Cattaneo, and Titiunik (2015) to allow for a corresponding visual examination of the discontinuity at the cut point. Consistent with the results in Table 1, column 2, Figure 1 shows visual evidence of a clear discontinuity at the cut point for projects proposed in the 2-year period after a state election”

  3. Dean Eckles says:

    Isn’t one of the key recommended robustness checks to get results with double and half the selected bandwidth? Or at least plotting how the results change as a function of bandwidth.

    The authors claim their results are robust to other bandwidths and kernels, but I don’t see any results (in the paper or SI) with the same (triangular kernel) and substantially varying bandwidth.
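The robustness check suggested here, re-estimating with half and double the selected bandwidth, takes only a few lines. The sketch below is a bare-bones local linear RD estimator with a triangular kernel, written in base R for illustration. It is not the rdrobust package: it does not select the bandwidth from the data and reports no robust standard errors. The simulated data, the function name `local_linear_rd`, and the starting bandwidth `h0` are all assumptions.

```r
# Bare-bones local linear RD estimate at `cutoff` with a triangular kernel.
local_linear_rd <- function(x, y, cutoff = 0, h) {
  u <- x - cutoff                             # running variable centered at the cutoff
  w <- pmax(0, 1 - abs(u) / h)                # triangular kernel, zero outside [cutoff - h, cutoff + h]
  keep <- w > 0
  d <- as.numeric(u >= 0)                     # indicator for the treated side
  fit <- lm(y ~ d * u, weights = w, subset = keep)  # separate intercept and slope per side
  unname(coef(fit)["d"])                      # jump in the fitted lines at the cutoff
}

set.seed(3)
n <- 20000
x <- runif(n, -10, 10)
y <- 75 + 0.2 * x + 2 * (x >= 0) + rnorm(n, sd = 2)   # simulated jump of 2 at x = 0

h0 <- 3                                       # stand-in for a "selected" bandwidth
bandwidths <- c(h0 / 2, h0, 2 * h0)
estimates <- sapply(bandwidths, function(h) local_linear_rd(x, y, h = h))
print(data.frame(bandwidth = bandwidths, estimate = estimates))
```

The plot-against-bandwidth version is the same `sapply` call over a grid such as `seq(1, 8, by = 0.5)`. If the estimate swings widely as `h` changes, that instability is itself worth reporting.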

  4. Near the boundary of an interval, all the information about the behavior of the function comes from *one side* of the interval. Giving the function the opportunity to wiggle around in the presence of random noise near the end of the interval virtually ensures that unregularized fits will curve near the ends. The Co-Partisanship graph gives a perfect example. If you try hard to ignore the curve, you’ll see that a perfectly flat line seems to be a very reasonable model, but in the presence of degrees of freedom for wiggle, near the ends each polynomial wiggles…

    Here is R code to fit 10 sixth-order polynomial regressions to data that are just normal(0,1) errors (so the correct regression line is y(x)=0):

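    library(ggplot2)
    set.seed(1)

    dataset = list()

    pdf("test.pdf")                  # one plot per page
    xes = seq(0, 1, .01)
    for (i in 1:10) {
      # pure normal(0,1) noise, so the correct regression line is y = 0
      dataset[[i]] = data.frame(x = xes, y = rnorm(length(xes), 0, 1))
      # scatterplot plus a sixth-degree polynomial fit with its confidence band
      print(ggplot(data = dataset[[i]], aes(x, y)) + geom_point() +
              geom_smooth(method = lm, formula = y ~ poly(x, 6)))
    }

    dev.off()
    system("evince test.pdf &")      # opens the PDF in evince (Linux); substitute any PDF viewer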

    Run it, and you’ll see that basically every single page of the 10 page pdf has a wiggly regression line that curves strongly at the edges.
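The edge wiggle in those plots can also be quantified without looking at pictures. For least squares, the variance of the fitted value at a point is σ² times that point's leverage (hat value), which depends only on the x values. This R sketch uses the same design as the code above (x on a grid over [0, 1], a sixth-degree polynomial, normal(0,1) noise), compares the leverage at the edge x = 0 with the leverage at the center x = 0.5, and then checks the result by simulation; the variable names are illustrative.

```r
xes <- seq(0, 1, 0.01)
edge <- 1                                   # index of x = 0
center <- which.min(abs(xes - 0.5))         # index of x = 0.5

# Leverage does not depend on y, so any response works here.
set.seed(1)
y0 <- rnorm(length(xes))
lev <- hatvalues(lm(y0 ~ poly(xes, 6)))
print(c(edge = lev[[edge]], center = lev[[center]]))

# Monte Carlo check: SD of the fitted value at each point over many noise datasets.
fits <- replicate(2000, {
  y <- rnorm(length(xes))                   # true regression line is y = 0
  fitted(lm(y ~ poly(xes, 6)))[c(edge, center)]
})
sd_fit <- apply(fits, 1, sd)
print(c(edge = sd_fit[[1]], center = sd_fit[[2]]))   # should be close to sqrt(lev)
```

The ratio of the two leverages is the ratio of the variances of the fitted curve, which is why the ends of the curve on every page of that PDF swing so much more than the middle.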

  5. Am I the only one who finds that the blog refuses to remember who I am, so I have to enter my contact info every time or I accidentally post as anonymous? Anyway…

    Since the data near the edge of an interval come from only one side of the interval, it’s virtually guaranteed that a polynomial will wiggle near the edges. Here is code to generate 10 plots of a 6th-degree polynomial fit to normal(0,1) random noise whose correct regression line is y=0; see for yourself:

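    library(ggplot2)
    set.seed(1)

    dataset = list()

    pdf("test.pdf")                  # one plot per page
    xes = seq(0, 1, .01)
    for (i in 1:10) {
      # pure normal(0,1) noise, so the correct regression line is y = 0
      dataset[[i]] = data.frame(x = xes, y = rnorm(length(xes), 0, 1))
      # scatterplot plus a sixth-degree polynomial fit with its confidence band
      print(ggplot(data = dataset[[i]], aes(x, y)) + geom_point() +
              geom_smooth(method = lm, formula = y ~ poly(x, 6)))
    }

    dev.off()
    system("evince test.pdf &")      # opens the PDF in evince (Linux); substitute any PDF viewer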

  6. Dzhaughn says:

    Belief in the discontinuity is pinned to a belief in a substantial positive slope from -5 to +5 degrees latitude. Once we change air pollution policy, we would be similarly obliged to try policies that nudge populations to the healthy middle-north latitudes.

    I think that is a third reason to be skeptical: one has to look at the totality of implications of an analysis, not just the implications for a favored topic.

    • Terry says:

      Good point about the positive slope from -5 to +5 degrees latitude.

      If we take this paper seriously, the positive-slope results are far more important than the Huai River discontinuity. Life expectancy could be increased by about 10 years by just moving north — an effect probably bigger than curing both cancer and heart disease.

      Such results would be astounding if they are widely applicable.

  7. Terry says:

    I’m having a hard time believing that the bubble graphs got published in a journal that anyone reads. Not just once, but twice! The conclusions seem preposterously weak.

    Are these papers severe outliers in this field?
    Is this a journal no one takes seriously?
    Or, is this a big joke made up by Andrew or his correspondent?

    Do journals in this field signal in some way that some papers they publish are pretty crappy? Do they make them the bottom article in the issue?

    • Andrew says:

      Terry:

      I think it’s a sort of ideology, or overconfidence, that various scientists are trained to think that “identification strategies” such as regression discontinuity analysis will give them the answer. And they get a lot of feedback supporting this attitude. Remember, that original air-pollution-in-China paper was published in a top journal and received wide and uncritical press attention. So, lots of reasons for people to think they’re on the right track when they’re doing this sort of thing, even though from a scientific position it’s ridiculous.

    • jrc says:

      “Is this a journal no one takes seriously?”

      Well – do you take the National Academy of Sciences seriously? I think Andrew has flip-flopped on that, they aren’t Prestigious Proceedings anymore, just Proceedings, which, in internet, means that they can be taken seriously again. Or something.

      I suspect this would not have gotten published in the top Economics journals, at least not twice (!). My guess is they got a second PNAS out of it because of Andrew’s critique – they do the non-parametric local regressions Gelman/Imbens suggest, and they do placebo tests at other distance cutoffs. Which all ties into Andrew’s point, which is that once people focus primarily on the “identification strategy”, they stop thinking carefully about the world. I can almost see reviewers thinking “well, I’m not really all that convinced, and there are some weird differences here relative to the traditional idea of a running-variable in an RD, but I don’t see anything obviously wrong with the methods, and I can’t say the question isn’t important/interesting, so… ¿accept?” It could also just be an editor who wanted to give them a chance to respond, whether you think they deserved PNAS space for that or not.

      If I’ve been surprised by anything I’ve learned about publishing since I became a paid (instead of paying) person in academia, it’s that the top general-interest journals (PNAS, Nature, Science) publish a whole lot of really bad papers that wouldn’t get into the top journals in their respective disciplines. That might be less true in other fields – I’m sure I’d publish my cure for cancer in Science – but in Econ, people actually value a Science less than an American Economic Review or an Econometrica (it’s true – you couldn’t make that s*** up). That usually makes us look dumb to researchers in other fields, but then you see this kinda stuff and you’re like “oh yeah, right, they have really poor taste and discernment in social science research.”
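The placebo tests mentioned in this comment, estimating the same discontinuity at cutoffs where no policy boundary exists, can be sketched in a few lines of R. The code is illustrative only: it simulates data with a true jump at 0, defines a hypothetical `local_linear_rd` helper (local linear fit, triangular kernel, hand-picked bandwidth `h = 2`), and re-estimates at fake cutoffs chosen so that each window lies entirely on one side of the real cutoff.

```r
# Hypothetical helper: local linear RD estimate at `cutoff` with a triangular kernel.
local_linear_rd <- function(x, y, cutoff = 0, h) {
  u <- x - cutoff
  w <- pmax(0, 1 - abs(u) / h)
  keep <- w > 0
  d <- as.numeric(u >= 0)
  fit <- lm(y ~ d * u, weights = w, subset = keep)
  unname(coef(fit)["d"])
}

set.seed(4)
n <- 20000
x <- runif(n, -10, 10)
y <- 75 + 0.2 * x + 2 * (x >= 0) + rnorm(n, sd = 2)   # simulated: true jump only at 0

h <- 2
cutoffs <- c(-6, -4, 0, 4, 6)        # 0 is the real cutoff; the others are placebos
estimates <- sapply(cutoffs, function(c0) local_linear_rd(x, y, cutoff = c0, h = h))
print(data.frame(cutoff = cutoffs, estimate = round(estimates, 2)))
```

With windows of ±2 around placebo cutoffs at ±4 and ±6, no placebo window crosses 0, so any "effect" found there reflects curvature or noise rather than the treatment.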

      • Terry says:

        I’ve never seen an economist publish anything of interest in Science.

        Does Science even have economist editors? Who would they get to review it? Why publish it in Science when economists don’t read Science?

        • Andrew says:

          Terry:

          If you publish a paper in one of the tabloids (Science, Nature, PNAS), I think you’re much more likely to get lots of media attention. It seems that, for the news media, a paper in one of the tabloids is more important than anything published in the American Economic Review, American Political Science Review, etc. A complicating factor is that the tabloids encourage bold claims and simple statements with minimal qualifications.

          So . . . if you’re a social scientist and you think your work is important, it makes sense to publish in the tabloids, if you can, so your work will get more attention. And if you’re the sort of social scientist who’s willing to make bold claims far beyond what is warranted by your data, you just might be able to write the sort of article that the tabloids will want to publish.

          In any case, it’s not just the tabloids. The article featured in the P.S. in the above post appeared in the American Journal of Political Science.

  8. It’s worse than you think, Andrew. The wonderful, magical city about 9 degrees north of the river with a life expectancy of over 90 has since disappeared! The spot is hidden by a text block in the revised graph. It would be wrong to hide data, so therefore there must not be a data point there anymore, from which one can only conclude that the town itself has vanished, or many of its people have been killed (lowering life expectancy), undoubtedly by the onslaught of north-of-river air pollution.

    One could perhaps alternatively conclude that the life expectancy value for this city was revised, but that would require acknowledging that datapoints have measurement error, which would lead the authors to realize that that’s yet another thing to consider when fitting models to data, after which the authors would stop themselves from publishing junk. But clearly, that’s not the case.

    Q.E.D.

  9. Paul says:

    Ignoring the changes to the data, I think the weakness of this analysis is how it’s driven by a few points near the boundary. It’s basically saying “life expectancy in the 4 southern cities closest to the river is a bit lower than life expectancy in the 4 northern cities closest to the river”. If either of those two sets of points were shifted a bit, the discontinuity would vanish. If your result depends on a segment of 4 data points you could ask about latent heterogeneity and cryptic correlations which, if accounted for, would increase the standard error and reduce the significance of this result.

    All that said, I actually think this new paper isn’t entirely terrible as an exploratory analysis. Definitely could be wrong but not totally unworthy of publication, if appropriately caveated (which of course it never is).
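Paul's concern, that the estimate rests on the handful of cities closest to the river, suggests a simple diagnostic: drop those points one at a time and see how much the estimated jump moves. The R sketch below uses simulated sparse data (40 hypothetical cities, no true jump) and an assumed specification, a global cubic in the running variable plus a north-side indicator; `rd_global_cubic` is a hypothetical helper, and this is a diagnostic idea rather than a procedure from either paper.

```r
set.seed(5)
n <- 40
x <- sort(runif(n, -10, 10))                   # hypothetical degrees north of the river
y <- 75 + 0.1 * x + rnorm(n, sd = 2)          # simulated: no true discontinuity

# Hypothetical global specification: jump indicator plus a cubic in x.
rd_global_cubic <- function(x, y) {
  d <- as.numeric(x >= 0)
  fit <- lm(y ~ d + x + I(x^2) + I(x^3))
  unname(coef(fit)["d"])
}

full <- rd_global_cubic(x, y)

# The 4 cities nearest the cutoff on each side (x is sorted).
south <- tail(which(x < 0), 4)
north <- head(which(x >= 0), 4)
near <- c(south, north)

# Re-estimate the jump with each of those cities dropped in turn.
drop_one <- sapply(near, function(i) rd_global_cubic(x[-i], y[-i]))
print(data.frame(dropped_x = round(x[near], 2), estimate = round(drop_one, 2)))
cat("full-sample estimate:", round(full, 2), "\n")
```

If a single dropped city moves the estimate by an amount comparable to its standard error, the result is fragile in exactly the way the comment describes.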
