Matching, regression, interactions, and robustness

Daniel Ho, Kosuke Imai, Gary King, and Liz Stuart recently wrote a paper on matching, followed by regression, as a tool for causal inference. They apply the methods developed by Don Rubin in 1970 and 1973 to some political science data, and make a strong argument, both theoretical and practical, for why this approach should be used more often in social science research.

I read the paper, and liked it a lot, but I had a few questions about how it “sold” the concept of matching. Kosuke, Don, and I had an email dialogue including the following exchange.

[The abstract of the paper claims that matching methods “offer the promise of causal inference with fewer assumptions” and give “considerably less model-dependent causal inferences”]

AG: Referring to matching as “nonparametric and non-model-based” might be misleading. It depends on how you define “model”, I guess, but from a practical standpoint, information has to be used in the matching, and I’m not sure there’s such a clear distinction between using a “model” and using an “algorithm” to do the matching.

DR: I think much of this stuff about “models” and non-models is unfortunate. Whatever procedure you use, you are ignoring certain aspects of the data (and so regarding them as irrelevant) and emphasizing other aspects as important. For a trivial example, when you do something “robust” to estimate the “center” of a distribution, you are typically making assumptions about the definition of “center” and the irrelevance of extreme observations to the estimation of it. Etc.

KI: I want to take this opportunity to ask you one quick question. When I talk about matching methods to political scientists, who are so used to running a bunch of regressions, they often ask why matching is better than regression, or why they should bother to do matching in combination with regressions. What would be the best way to answer this question? I usually tell them about the benefits of the potential outcome framework, the available diagnostics, the flexibility (e.g., as compared to linear regression), etc. But I’m wondering what you would say to social scientists!

AG: Matching restricts the range of comparisons you’re making. It allows you to make more robust inferences, but with a narrower range of applicability. See Figure 7.2 on page 227 of our book for a simple picture of what’s going on, in an extreme case. Matching is just a particular tool that can be used to study a subset of the decision space. The phrase I would use for social scientists is, “knowing a lot about a little”. The papers by Dehejia and Wahba discuss these issues in an applied context: http://www.columbia.edu/%7Erd247/papers/w6586.pdf and http://www.columbia.edu/%7Erd247/papers/matching.pdf

DR: Also look at the simple tables in Cochran and Rubin (1973) or Rubin (1973) or Rubin (1979) etc. They all show that regression by itself is terribly unreliable with minor nonlinearity that is difficult to detect, even with careful diagnostics. This message is over three decades old!
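[To illustrate the point about nonlinearity, here is a small simulation sketch, with made-up numbers rather than anything from those papers. The true treatment effect is zero, the response surface is mildly nonlinear (exponential in x), and the two groups differ in x; linear regression adjustment is noticeably biased, while nearest-neighbor matching on x comes close to the truth:]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Covariate imbalance: treated units sit higher on x than controls,
# but every treated unit still has close controls (good overlap).
x_t = rng.normal(0.5, 0.5, n)
x_c = rng.normal(0.0, 1.0, n)

# Mildly nonlinear response surface; the true treatment effect is zero.
y_t = np.exp(x_t) + rng.normal(0, 0.1, n)
y_c = np.exp(x_c) + rng.normal(0, 0.1, n)

# Linear regression adjustment: y ~ 1 + t + x.
x = np.concatenate([x_t, x_c])
t = np.concatenate([np.ones(n), np.zeros(n)])
y = np.concatenate([y_t, y_c])
X = np.column_stack([np.ones(2 * n), t, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"regression estimate: {beta[1]: .3f}")  # badly biased (about -0.6)

# Nearest-neighbor matching on x, then a simple difference in means.
j = np.argmin(np.abs(x_c[None, :] - x_t[:, None]), axis=1)
print(f"matching estimate:   {np.mean(y_t - y_c[j]): .3f}")  # near zero
```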

KI: I agree that matching gives more robust inferences. That’s the main message of the paper that I presented, and also of my JASA paper with David. The question my fellow social scientists ask is why matching is more robust than regression (and hence why they should be doing matching rather than running regressions). One answer is that matching removes some observations and hence avoids extrapolation. But what about other kinds of matching that use all the observations (e.g., subclassification and full matching)? Are they more robust than regressions? What I usually tell them is that regressions often make stronger functional-form assumptions than matching. With stratification, for example, you can fit separate regressions within each stratum and then aggregate the results (so it does not assume that the same model fits all the data). I realize that there is no simple answer to this kind of vague, and perhaps ill-posed, question. But these are the kinds of questions you get when you tell social scientists about matching methods!
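[As a concrete version of the subclassification idea Kosuke describes, here is a minimal sketch, again with invented data, and with the propensity score taken as known for simplicity: units are grouped into quintiles of the score, a separate regression is fit within each stratum, and the stratum-specific estimates are averaged, weighted by stratum size:]

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(0, 1, n)
pscore = 1 / (1 + np.exp(-x))          # treatment probability (known here)
t = rng.binomial(1, pscore)
y = np.exp(x) + 0.5 * t + rng.normal(0, 0.1, n)  # true effect is 0.5

# Subclassify into quintiles of the score.
edges = np.quantile(pscore, np.linspace(0, 1, 6))
strata = np.clip(np.searchsorted(edges, pscore, side="right") - 1, 0, 4)

# Fit y ~ 1 + t + x separately within each stratum, then aggregate the
# stratum-specific treatment coefficients, weighting by stratum size.
effects, sizes = [], []
for s in range(5):
    m = strata == s
    X = np.column_stack([np.ones(m.sum()), t[m], x[m]])
    beta, *_ = np.linalg.lstsq(X, y[m], rcond=None)
    effects.append(beta[1])
    sizes.append(m.sum())
print(f"aggregated estimate: {np.average(effects, weights=sizes):.3f}")
# Close to 0.5: within each stratum the exponential surface is nearly
# linear, so no single model has to fit all the data.
```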

AG: I think the key idea is avoiding extrapolation, as you say above. But I don’t buy the claim that regression makes stronger functional form assumptions than matching. Regressions can (and should) include interactions. Regression-with-interaction, followed by poststratification, is a basic idea. It can be done with or without matching.

The social scientists whom I respect are more interested in the models than in the estimation procedures. For these people, I would focus on your models-with-interactions. If matching makes it easier to fit such models, that’s a big selling point of matching to me.
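[A sketch of the regression-with-interaction, then poststratify, recipe, once more with invented numbers. The treatment effect is made to vary with x, the model includes a t-by-x interaction, and the fitted x-specific effects are then averaged over whichever covariate distribution defines the estimand:]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
x = rng.normal(0, 1, n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
# The effect varies with x (0.5 + 0.3 x), so "the" treatment effect
# depends on which population you average over.
y = x + (0.5 + 0.3 * x) * t + rng.normal(0, 0.1, n)

# Regression with a treatment-by-covariate interaction: y ~ 1 + t + x + t:x.
X = np.column_stack([np.ones(n), t, x, t * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Poststratify: average the x-specific effect (beta_t + beta_tx * x)
# over the covariate distribution that defines the estimand.
ate = beta[1] + beta[3] * x.mean()          # whole sample
att = beta[1] + beta[3] * x[t == 1].mean()  # treated units only
print(f"ATE: {ate:.3f}   ATT: {att:.3f}")   # about 0.5, and a bit above
```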

KI: One can think of it as a way to fit models with interactions in the multivariate setting by helping you create matches and subclasses. One thing I wanted to emphasize in my talk is that matching is not necessarily a substitute for regression, and that one can use matching methods to make regressions perform more robustly.
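[In that spirit, a last sketch of matching as preprocessing: match first, then run the same regression you would have run anyway, but on the matched sample, where it no longer has to extrapolate. Same invented setup as above, now with a true effect of 0.5:]

```python
import numpy as np

rng = np.random.default_rng(3)
n_t, n_c = 500, 5000
x_t = rng.normal(0.5, 0.5, n_t)
x_c = rng.normal(0.0, 1.0, n_c)
y_t = np.exp(x_t) + 0.5 + rng.normal(0, 0.1, n_t)  # true effect is 0.5
y_c = np.exp(x_c) + rng.normal(0, 0.1, n_c)

# Step 1: preprocess by nearest-neighbor matching on x (with replacement),
# keeping one control per treated unit.
j = np.argmin(np.abs(x_c[None, :] - x_t[:, None]), axis=1)

# Step 2: run the usual regression, but on the matched sample only.
x = np.concatenate([x_t, x_c[j]])
t = np.concatenate([np.ones(n_t), np.zeros(n_t)])
y = np.concatenate([y_t, y_c[j]])
X = np.column_stack([np.ones(2 * n_t), t, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"matched-then-regressed estimate: {beta[1]:.3f}")  # close to 0.5
```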