A colleague writes:

Why do people keep praising matching over regression for being non parametric? Isn’t it f’ing parametric in the matching stage, in effect, given how many types of matching there are… you’re making structural assumptions about how to deal with similarities and differences…. the likelihood two observations are similar based on something quite similar to parametric assumptions… you’re just hiding the parametric part..

My reply: It’s not matching *or* regression, it’s matching *and* regression. Matching is a way to discard some data so that the regression model can fit better. Trying to do matching without regression is a fool’s errand or a mug’s game or whatever you want to call it. Jennifer and I discuss this in chapter 10 of our book, also it’s in Don Rubin’s PhD thesis from 1970!

Your old post on this: http://andrewgelman.com/2011/07/10/matching_and_re/

1. Matching need not be parametric. There are matching methods other than the propensity score (e.g. I am not sure I would call coarsened exact matching parametric). In any case, I don’t think this is the main advantage of matching.

2. Yes, in principle matching and regression are the same thing, give or take a weighting scheme. But I think the philosophies and research practices that underpin them are entirely different. This is where I think matching is useful, especially for pedagogy.

For example, regression alone lends itself to (a) ignoring overlap and (b) fishing for results. This is because setting up the comparison and the estimation are all done at once. By contrast, matching focuses first on setting up the “right” comparison and, only then, on estimation. I think this makes a big difference.

OK, sure, but you can always play around with the matching until you fish out the results you want. True, but then again you can’t prevent an addict from getting his fix if he is hell bent on it. Matching will not stop fishing, but it can help teach the importance of a research design separate from estimation. I think that is an important lesson. And yes, you can use regression etc. in addition.

To quote Rosenbaum: “An observational study that begins by examining outcomes is a formless, undisciplined investigation that lacks design” (Design of Observational Studies, p. ix). Welcome to the world of regression!

Most of the matching estimators (at least the propensity score methods and CEM) promise that the weighted difference in means will be (nearly) the same as the regression estimate that includes all of the balancing covariates. That’s always been my experience.

The advantage that matching plus regression has over regression alone is that it doesn’t rely on a specific functional form for the covariates. This is why some refer to it as ‘non-parametric,’ even though matching still relies on a large set of assumptions (covariates, distance metric, etc.) that can be manipulated for data-mining. In fact, matching makes data-mining easier because there is a larger set of choices and the treatment effect tends to vary across them more than across regression models.

(Matching and regression are not the same thing up to a weighting scheme. This is only true if, as in MHE, you are using a saturated model for which covariate nonlinearities don’t matter.)
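To make the weighting point concrete, here is a minimal sketch with made-up data (the strata, outcomes, and variable names are all hypothetical). With a single discrete covariate and a model saturated in that covariate, both estimators are weighted averages of the same within-stratum differences in means. Matching for the effect on the treated weights each stratum by its number of treated units, while the regression coefficient weights by n * p * (1 - p), where p is the treated share in the stratum. The regression coefficient is computed here through the Frisch-Waugh shortcut (demean treatment within stratum, then regress the outcome on that residual) rather than with a regression library.

```python
from collections import defaultdict

# Hypothetical toy data: (stratum, treated, outcome).
# Stratum "a": 1 of 4 treated, effect 1. Stratum "b": 2 of 4 treated, effect 3.
data = [
    ("a", 1, 3.0), ("a", 0, 2.0), ("a", 0, 2.0), ("a", 0, 2.0),
    ("b", 1, 6.0), ("b", 1, 6.0), ("b", 0, 3.0), ("b", 0, 3.0),
]


def mean(values):
    return sum(values) / len(values)


# Group outcomes by stratum and treatment status.
cells = defaultdict(lambda: {0: [], 1: []})
for stratum, d, y in data:
    cells[stratum][d].append(y)

# Within-stratum differences in means, plus the two weighting schemes.
effects, att_weights, reg_weights, treated_share = {}, {}, {}, {}
for s, cell in cells.items():
    n1, n0 = len(cell[1]), len(cell[0])
    p = n1 / (n1 + n0)
    treated_share[s] = p
    effects[s] = mean(cell[1]) - mean(cell[0])
    att_weights[s] = n1                       # matching, effect on the treated
    reg_weights[s] = (n1 + n0) * p * (1 - p)  # saturated regression


def weighted_average(weights):
    total = sum(weights.values())
    return sum(weights[s] * effects[s] for s in effects) / total


matching_att = weighted_average(att_weights)
regression_weighted = weighted_average(reg_weights)

# OLS coefficient on treatment with stratum dummies, via Frisch-Waugh:
# residualize treatment on the dummies, then regress the outcome on it.
resid = [(d - treated_share[s], y) for s, d, y in data]
ols_coef = sum(r * y for r, y in resid) / sum(r * r for r, _ in resid)

print(matching_att, regression_weighted, ols_coef)
```

In this toy example the two answers differ (7/3 versus 15/7) only because the effect differs across strata and the weights differ. With a non-saturated model there is no such clean correspondence, which is the caveat in the comment above.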

@Mike

Matching plus regression still adds functional form unless fully saturated, no?

All causal inference relies on assumptions. But I would say the restrictions imposed by matching are a subset of those imposed by regression.

I don’t follow how this can lead to more data mining. Note that playing around with covariate balance without looking at the outcome variable is fine. Imposing linearity and limiting interactions will make estimates more stable but not necessarily better.

M+R still relies on assumptions about the set of covariates, certainly, but doesn’t assume a linear model. It may or may not make assumptions about interactions, depending on whether these are balanced. (They are with CEM, but not necessarily with other techniques.)
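For readers who have not seen CEM, a rough sketch of the idea, assuming made-up units and arbitrary cutpoints (none of this is the actual `cem` software): coarsen each covariate into bins, match exactly on the joint cell, and prune cells that lack either treated or control units. Because the match is on the joint cell, interactions among the coarsened covariates are balanced by construction.

```python
from collections import defaultdict

# Hypothetical units: (treated, age, years_of_education).
units = [
    (1, 23, 12), (0, 27, 11), (0, 24, 16),
    (1, 41, 16), (0, 45, 17), (1, 62, 9),
]


def coarsen(age, educ):
    """Map raw covariates to a coarse joint cell; cutpoints are illustrative."""
    age_bin = "under30" if age < 30 else "30to59" if age < 60 else "60plus"
    educ_bin = "hs_or_less" if educ <= 12 else "some_college_plus"
    return (age_bin, educ_bin)


# Group unit indices by their coarsened joint cell.
strata = defaultdict(list)
for index, (treated, age, educ) in enumerate(units):
    strata[coarsen(age, educ)].append(index)

# Keep only cells that contain both treated and control units.
matched, pruned = [], []
for members in strata.values():
    statuses = {units[i][0] for i in members}
    if statuses == {0, 1}:
        matched.extend(members)
    else:
        pruned.extend(members)

matched.sort()
pruned.sort()
print(matched, pruned)
```

Unit 2 is pruned even though its age bin contains a treated unit, because nobody treated shares its age-by-education cell. The choice of cutpoints is of course one more researcher decision.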

Here’s the reason this can still lead to more data-mining: When matching, you’re still choosing the set of covariates to match on and there’s nothing stopping you from trying a different set if you don’t like the results. This is exactly parallel with trying different covariates in a regression model. The intermediate balancing step is irrelevant. Further, the variation in estimates across matches is greater than across regression models. Combine that with the larger set of choices to exploit when matching (calipers, 1-to-1 or k-to-1, etc.) and it’s easier to data-mine when matching. If you’re interested, I have a paper that’s mostly on this subject (sites.google.com/site/mkmtwo/Miller-Matching.pdf).

Mike: “When matching, you’re still choosing the set of covariates to match on and there’s nothing stopping you from trying a different set if you don’t like the results. This is exactly parallel with trying different covariates in a regression model. The intermediate balancing step is irrelevant.”

I disagree with the last sentence. I think pedagogically it is very different to set up a comparison first and only then do the estimation. Among other things, it allows an almost physical distinction between research design and estimation, one that regression does not encourage. And students can do this without two semesters of stats, multivariate regression, etc. All they need is some common sense to compare like with like and the ability to compute weighted averages.

Mike: “Combine that with the larger set of choices to exploit when matching (calipers, 1-to-1 or k-to-1, etc.) and it’s easier to data-mine when matching.”

Again, if you are bent on data mining nothing is going to stop you. But I’d like to see a _proof_ that the set of choices in matching is larger. My intuition is that the set of choices in matching is strictly a subset of those in regression. 1-to-1 or k-to-1 matching has a regression equivalent: dropping outliers or influential observations, or, conversely, extrapolating. Yet regression adds choices about functional form restrictions for the outcome equation that are not available in pure matching.

Fernando, I think we’re mostly in agreement here. Pedagogically, matching and regression are different. But I don’t think that translates into any statistical or research advantage. Moreover, I think some scholars strain the point that matching lets you compare “like with like,” forgetting that this is only true with respect to the chosen covariates.

You’re right: nothing can stop you if you’re intent on data-mining, but I still hold that matching makes it easier, and easier to hide. Again, this is partly because matching shows greater variation across matches. Are there more choices to exploit? I would say yes, since matching gives you control over both the set of covariates and the sample itself. You don’t make functional form assumptions, true, but you can (and should) choose higher-order terms and interactions to balance on, so you have the same degrees of freedom there.

I think the crucial take-away is the essential similarity of M+R and regression alone. The former is more robust to covariate nonlinearities, but has no advantages for causation, model dependence, or data-mining, which remain its most popular justifications.

Mike:

Comparing “like with like” in the context of a theory or DAG. It is the theory that tells you what to control for. This is not a property of matching or regression. Matching mostly helps ensure overlap.

Mike: “Matching gives you control over both the set of covariates and the sample itself”.

Depends on your point of departure. As mentioned, the set of covariates ought to be a theoretical question, while arguably extrapolating lets you control the sample. We talk about “pruning” in matching but really we should talk about “extrapolating” in regression. (Typically we understand the world by layering on more assumptions, not fewer, so I see the progression as going from matching to extrapolation.)

In the final analysis, if your concern is mining, the right solution is registration (and even that can be gamed). Other than that, I like matching for its emphasis on design but agree with Andrew re doing both.

I agree that one should appeal to theory to justify covariates, but that doesn’t solve the issue of mining or how to construct your match. There are typically a hundred different theories one could appeal to, so there will always be room for manipulation.

I’m lost on why you think “extrapolating lets you control the sample.” One ought to start with a theoretically justified sample, say all countries from 1950-2010, a representative survey of voters, etc. The question then is whether to run a regression on that sample or to first select out a new sample to maximize balance (a quantity that is defined by the researcher). My point is simply that the latter gives one more opportunity for manipulation since it provides more choices.

@Mike

Suppose you want to estimate effect of X on Y conditional on confounder Z.

If you go at it completely non-parametrically you compute effect within strata of Z. Kind of exact matching.

But you cannot compute effect in strata where X does not vary, so these observations drop out. However, if you are willing to make more assumptions you can include these additional observations by extrapolating.
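A minimal sketch of that stratified calculation, with invented numbers and a binary X (the data and names are hypothetical): compute the difference in means within each stratum of Z, and set aside any stratum where X does not vary.

```python
from collections import defaultdict

# Hypothetical observations: (z, x, y) with a binary treatment x.
observations = [
    (0, 0, 1.0), (0, 1, 2.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 0, 6.0), (1, 1, 8.0),
    (2, 1, 9.0), (2, 1, 11.0),  # x never varies in this stratum
]

# Group outcomes by stratum of z and by treatment status.
by_stratum = defaultdict(lambda: {0: [], 1: []})
for z, x, y in observations:
    by_stratum[z][x].append(y)

effects, dropped = {}, []
for z, groups in sorted(by_stratum.items()):
    if groups[0] and groups[1]:
        # Both treated and untreated present: difference in means.
        treated_mean = sum(groups[1]) / len(groups[1])
        control_mean = sum(groups[0]) / len(groups[0])
        effects[z] = treated_mean - control_mean
    else:
        # No variation in x: the effect is not identified without a model.
        dropped.append(z)

print(effects)
print(dropped)
```

Saying anything about stratum 2 requires an assumption about how the effect or the untreated outcome varies with Z, for example a model fit on strata 0 and 1 and carried over. That is the extrapolation step.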

In causal inference we typically focus first on internal validity. Seldom do people start out with a well-defined population (though they should). No matter. As in the example above, if you do start from one, covering it may require layering on more assumptions in order to extrapolate.

In sum, if research progresses by layering more assumptions (it need not), then we are not pruning. Rather, we start from a pruned sample and then expand it by adding more assumptions and extrapolating. From this perspective it is regression that allows you to play with sample size.

I’ve looked around a bit and seen that there is a huge literature on how to do matching well, but rather little providing guidance on when matching is or is not a good choice. In cases where the variables which would participate in a match are relatively independent, matching has the disadvantage of throwing away perfectly good data: performing a regression which uses all of the prognostic variables as covariates yields smaller standard errors than doing the same with the reduced data set following matching, and much better than a t-test or ANOVA on the reduced data set following matching. It seems to me (following a fair bit of simulation-based exploration of the concept) that matching has been rather oversold as a methodology. The only good justification I can see for matching is when important prognostic variables lack independence, and even then I might lean towards utilizing principal component scores or ridge regression or regression supplemented with propensity scores. Granted, if the person doing an analysis is not a statistician, matching is a relatively safe approach, but people who are not statisticians should no more be performing analyses than statisticians should be performing surgeries.

Yeah, like the statistician that performed the Himmicanes study….

Jeff Smith has very useful comments in this 2010 post:

http://econjeff.blogspot.com/2010/10/on-matching.html

Winston, thanks for the link.

Especially liked this: “There is also a third tribe, which I think of as the “benevolent deity” tribe. They believe that whatever variables happen to be in the data set they are using suffice to make “selection on observed variables” hold. This tribe has a lot of members.”

The matching AND regression was in Don Rubin’s PhD thesis from 1970 and a couple of his 1970s papers.

What I find interesting is how such a simple suggestion “do both” has been so well and widely ignored.

I think Jasjeet Sekhon was pointing to one reason in “Opiates for the Matches” (methods that that third tribe _can and will_ use?):

“And the only designs I know of that can be mass produced with relative success rely on random assignment. Rigorous observational studies are important and needed. But I do not know how to mass produce them.”

http://sekhon.polisci.berkeley.edu/papers/annualreview.pdf