Following up on my regression-discontinuity post from the other day, Brad DeLong writes:
The feel (and I could well be wrong) is that at some point somebody said: “This is very important, but it won’t get published without a statistically significant headline finding. Torture the data via specification search until we find a statistically significant effect so that this can get published!”
I think DeLong is mistaken here. But, before getting to this, here’s the graph:
and here are the regression results:
So, indeed, it is the cubic term that takes the result into statistical significance.
The reason I disagree with DeLong is that it’s my impression that, in econometrics and applied economics, it’s considered the safe, conservative choice in regression discontinuity to control for a high-degree polynomial. See the paper discussed a few years ago here, for example, where I criticized a pair of economists for using a fifth-degree specification and they replied, “the regression discontinuity methods we use in the paper (including the 5th degree polynomial) are standard in economics (see for example the 2009 working paper on RD implementation by David Lee and Thomas Lemieux).”
As we’ve discussed on this blog at other times, many methodologists (especially, but certainly not only, in economics) have a naive belief that they should be using unbiased estimates, not recognizing that, in practice, unbiasedness at one point in the analysis is achieved at the expense of averaging over some other dimension such as time. And it would seem that the higher the degree of the polynomial correction, the lower the bias. In which case the four additional degrees of freedom required to ramp up from a linear to a 5th-degree adjustment are a small price to pay if you have a large or even moderate sample size.
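To make the bias-variance tradeoff concrete, here is a minimal sketch in Python (with simulated data of my own invention, not the data from the paper under discussion). It estimates a regression-discontinuity “jump” at a cutoff while controlling for polynomials of increasing degree, on data with a smooth trend and no true discontinuity. The point is simply that the same data can yield noticeably different jump estimates depending on the degree of the polynomial control:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical running variable, centered so the cutoff is at x = 0.
x = rng.uniform(-1, 1, 500)
treated = (x >= 0).astype(float)

# Smooth outcome with NO true jump at the cutoff.
y = np.sin(2 * x) + rng.normal(0, 0.5, x.size)

def rd_estimate(x, y, treated, degree):
    """OLS estimate of the discontinuity, controlling for a polynomial in x."""
    # Design matrix: intercept, treatment dummy, x, x^2, ..., x^degree.
    X = np.column_stack([np.ones_like(x), treated] +
                        [x**k for k in range(1, degree + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # coefficient on the treatment dummy

for d in (1, 3, 5):
    print(f"degree {d}: estimated jump = {rd_estimate(x, y, treated, d):+.3f}")
```

As the degree increases, the polynomial soaks up more of the curvature (less bias from misspecification), but the jump estimate is pinned down by fewer effective observations near the cutoff, so its variance grows.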
And in this case, sure, the cubic polynomial looks ridiculous, but a linear fit would be even worse (as the authors found using their model-fit statistics). I’m guessing that the authors were doing what they thought was right and proper by choosing the best-fitting of these polynomials.
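I don’t know which model-fit statistics the authors used; as one common illustration of picking the “best-fitting” polynomial, here is a sketch that compares degrees by Gaussian AIC on simulated data (all names and data here are mine, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a curved trend plus noise.
x = rng.uniform(-1, 1, 400)
y = 0.5 * x - 0.8 * x**2 + rng.normal(0, 0.3, x.size)

def poly_aic(x, y, degree):
    """Gaussian AIC for an OLS polynomial fit of the given degree."""
    X = np.column_stack([x**k for k in range(degree + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = x.size, degree + 1
    sigma2 = resid @ resid / n  # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * loglik

best = min(range(1, 6), key=lambda d: poly_aic(x, y, d))
print("degree with lowest AIC:", best)
```

A procedure like this can look entirely principled, which is exactly the point: one can arrive at a ridiculous-looking cubic fit while following the rules.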
What if the result had been statistically significant with linear adjustment but not with a higher-degree polynomial? What would the authors have done? Would they have presented the statistically significant linear result and stopped there? I have no idea. But, given my impression of how economists think about regression discontinuity analysis, my guess is that, given the data the authors did see, that they did not do a specification search; they just did what they thought was the most kosher analysis possible.
Why this is important
If Chen et al. had violated the rules of the game (in this case, not by faking or improperly discarding data but by trying analysis after analysis in a search for statistical significance), this would be a problem, but it’s a containable problem. The rules are (relatively) clear, and you’re not supposed to break them.
But I think the problem is worse than that. I think Chen et al. did what, under current doctrine, they were supposed to do: find a discontinuity and adjust using a high-degree polynomial. When the recommended analysis has such problems of face validity, that’s a different problem entirely.
As the (sometimes) great Michael Kinsley once said, in a different context, “the scandal isn’t what’s illegal, the scandal is what’s legal.”
P.S. Just to clarify: Not only do I not think that Chen et al. “cheated” (in the sense of trying out many specifications in a search for statistical significance), I never thought so. As I wrote in my original post, I applaud the authors’ directness in graphing their model, which reveals its problems. My post title, “I doubt they cheated,” is specifically in response to Brad DeLong’s feeling that they “tortured the data via specification search.” For the reasons described above, I don’t think they did.