Choices in how to write about regression results (example of an analysis of U.S. military aid in Colombia)

John reports on an article by Oeindrila Dube and Suresh Naidu, who ran some regressions on observational data and wrote:

This paper examines the effect of U.S. military aid on political violence and democracy in Colombia. We take advantage of the fact that U.S. military aid is channeled to Colombian army brigades operating out of military bases, and compare how changes in aid affect outcomes in municipalities with and without bases. Using detailed data on violence perpetuated by illegal armed groups, we …find that U.S. military aid leads to differential increases in attacks by paramilitaries . . .

It’s an interesting analysis, but I wish they’d restrained themselves and replaced all their causal language with “is associated with” and the like.

From a statistical point of view, what Dube and Naidu are doing is estimating the effects of military aid in two ways: first, by comparing outcomes in years in which the U.S. spends more or less in military aid; second, by comparing outcomes in cities in Colombia with and without military bases.

One of the challenges in interpreting such results is that both these patterns are being summarized at once when you fit a single time-series cross-sectional regression. From the standpoint of statistical efficiency, this is a plus, but it makes the results more difficult to interpret and leads to technical challenges.
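To make "both patterns summarized at once" concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the panel dimensions, the `has_base` and `aid` variables, the data-generating process, and the value of `true_interaction`. None of it comes from the paper, and it is not the authors' specification. It fits a two-way fixed-effects regression by double-demeaning (which works for a balanced panel), so the single coefficient on base × aid mixes the across-year and across-town comparisons.

```python
import random
from statistics import mean

random.seed(0)

# Synthetic balanced panel; every number below is invented for illustration.
n_towns, n_years = 40, 15
has_base = [1.0 if i < 10 else 0.0 for i in range(n_towns)]  # 10 towns with a base
aid = [t / (n_years - 1) for t in range(n_years)]            # hypothetical aid index
true_interaction = 2.0                                       # assumed, not estimated

town_effect = [random.gauss(0, 1) for _ in range(n_towns)]
year_effect = [random.gauss(0, 1) for _ in range(n_years)]
y = [[town_effect[i] + year_effect[t]
      + true_interaction * has_base[i] * aid[t]
      + random.gauss(0, 0.5)
      for t in range(n_years)] for i in range(n_towns)]
x = [[has_base[i] * aid[t] for t in range(n_years)] for i in range(n_towns)]

def double_demean(m):
    """Subtract town means and year means, add back the grand mean.

    For a balanced panel this is equivalent to including town and year
    fixed effects in the regression.
    """
    row = [mean(r) for r in m]
    col = [mean(m[i][t] for i in range(len(m))) for t in range(len(m[0]))]
    grand = mean(row)
    return [[m[i][t] - row[i] - col[t] + grand
             for t in range(len(m[0]))] for i in range(len(m))]

yd, xd = double_demean(y), double_demean(x)
num = sum(xd[i][t] * yd[i][t] for i in range(n_towns) for t in range(n_years))
den = sum(xd[i][t] ** 2 for i in range(n_towns) for t in range(n_years))
interaction_estimate = num / den
print(f"estimated base x aid coefficient: {interaction_estimate:.2f}")
```

The regression returns one number. It recovers the simulated interaction reasonably well here, but only because the simulation was built to satisfy the model; the coefficient by itself says nothing about which years or which towns are driving it.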

In this case, the researchers appear to have found that, when looking at the differences in outcomes in any given year comparing cities with and without military bases, these differences were larger, on average, in years where there was more U.S. military aid (either in Colombia or the rest of the world). What I’d really like to see are some scatterplots that make this pattern clear. The only graph I see that’s directly relevant to their analysis is Figure 2, which reveals U.S. military spending has been higher in periods where there have been relatively more paramilitary attacks in cities with military bases–but without seeing the actual numbers, it’s hard for me to interpret this, or to really see the evidence that increases in military aid are causing the attacks.
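The kind of display I have in mind could be built from one point per year: the aid level on one axis and the base-minus-no-base gap in attacks on the other. Here is a rough sketch of how those points might be computed. The data are simulated, and the `attacks` function, the `aid` index, and the town counts are all hypothetical; I'm not reproducing anything from the paper.

```python
import random
from statistics import mean

random.seed(1)

# Invented numbers throughout; nothing here comes from the paper's data.
n_years = 15
aid = [0.1 * t for t in range(n_years)]   # hypothetical yearly aid index
n_base, n_nobase = 10, 30                 # hypothetical counts of towns

def attacks(has_base, a):
    # Assumed data-generating process: the base/no-base gap grows with aid.
    return 1.0 + (0.5 + 1.5 * a) * has_base + random.gauss(0, 0.5)

# For each year, average attacks in base towns minus average in other towns.
gaps = []
for t in range(n_years):
    base_mean = mean(attacks(1, aid[t]) for _ in range(n_base))
    nobase_mean = mean(attacks(0, aid[t]) for _ in range(n_nobase))
    gaps.append(base_mean - nobase_mean)

def pearson(u, v):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# One row per year: these (aid, gap) pairs are the points a scatterplot would show.
for t in range(n_years):
    print(f"year {t:2d}  aid {aid[t]:.1f}  gap {gaps[t]:+.2f}")
print(f"correlation of gap with aid: {pearson(aid, gaps):.2f}")
```

Printing or plotting the yearly pairs lets a reader see whether the pattern is spread across the whole series or carried by a handful of years, which a single regression coefficient hides.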

I mean, sure, it makes sense–more $ for the military, thus more resources for military-affiliated organizations–and I also agree that the result is not a priori obvious (you could imagine a world in which giving the military more resources would allow them to shift away from paramilitaries)–but, from this graph, it really looks like what’s happening, from a statistical point of view, is that you’re comparing the last third of the time series to the first two-thirds. Maybe cleaner, from an “identification strategy” standpoint, to just call this a “natural experiment” and leave it at that. But then you have to worry that you’re just seeing a time association, with more U.S. military aid coming at a time of more conflict.
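Here is a small sketch of why a step-shaped aid series reduces to a late-versus-early comparison. The numbers are made up (a hypothetical `aid` series that is low for ten years and high for five, and a hypothetical yearly `gap` between base and no-base towns), and the real series is surely not an exact step. Under that assumption, the least-squares slope of the gap on aid is just the late-minus-early difference in average gaps divided by the jump in aid.

```python
from statistics import mean

# Hypothetical series: aid is low for 10 years and high for the last 5.
aid = [1.0] * 10 + [3.0] * 5
# Hypothetical base-minus-no-base gaps in attacks, one per year (made up).
gap = [0.4, 0.6, 0.5, 0.3, 0.5, 0.6, 0.4, 0.5, 0.7, 0.5,
       1.1, 0.9, 1.2, 1.0, 1.3]

# Least-squares slope of gap on aid.
a_bar, g_bar = mean(aid), mean(gap)
slope = (sum((a - a_bar) * (g - g_bar) for a, g in zip(aid, gap))
         / sum((a - a_bar) ** 2 for a in aid))

# Simple late-versus-early comparison, scaled by the jump in aid.
early, late = mean(gap[:10]), mean(gap[10:])
step_contrast = (late - early) / (3.0 - 1.0)

print(f"slope: {slope:.3f}  late-minus-early per unit aid: {step_contrast:.3f}")
```

The two numbers coincide, so in this setup anything else that changed between the early and late periods would be absorbed into the "effect" of aid.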

P.S. Chris Blattman writes:

With town-by-town variation in military aid, Dube and Naidu can look how annual changes affect local violence and politics. . . . Their intuition: military aid indirectly helps paramilitary groups carry out political attacks and intimidate voters.

From my reading of the article, I didn’t see any information on town-by-town variation in military aid; I think the town-by-town variation was just on whether there was a military base. I could definitely have missed something, though, and in any case, even if the researchers had used such variation, I probably would’ve made the same cautions about their causal interpretation. I’d prefer to replace Chris’s “indirectly helps” with “is associated with” (and then, for grammatical reasons, adding an “ing” to “carry”).

P.P.S. What, then, is to be done? As my social science colleagues have (correctly) pointed out to me in many contexts, what is relevant to many decision problems is causal inference, not mere description. What would be gained by the authors of this study switching to descriptive language? I don’t know (this question is itself a causal claim on which I have no direct evidence!), but, personally, I’d prefer to first present the finding descriptively and then discuss the causal interpretations, making clear the limitations of the data in addressing such questions. Maybe that’s not so far from what was actually done here–I’m just a bit put off by the strong causal language right there in the abstract, and in particular I’m skeptical about how much they’re learning from that instrumental variables analysis, which to my eye looks like it’s picking up a (differential) time trend more than anything else. On the plus side, this is as strong as a lot of the other associations we talk about in politics.

The present blog entry is an attempt at a clarification, not a “debunking.” Dube and Naidu have done all the work; I’m just making some comments (and hoping that these comments move them towards a closer engagement with their data and maybe away from what I consider to be distracting technical issues of t-statistics, reweighting, matching, fixed effects, and the like).


  1. John S. says:

    US military aid is distributed to brigades that are closer to the fighting. Paramilitary violence is more likely to occur in those communities because that's where the conflict is.

    I don't know how much food aid the US gives to Colombia, but it's undoubtedly distributed in poorer areas of the country. Does this imply that US aid causes malnutrition?

  2. Sergio says:

    Completely agree with your opinion on this paper; their causal statements are not aligned with the limited power of their analysis. As John Tukey (1986) said, "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data." Can you even believe that the data is reliable? It seems very easy to manipulate.