Question on propensity score matching

Ban Chuan Cheah writes:

I’m trying to learn propensity score matching and used your text as a guide (pp. 208-209). After the propensity scores are created, the data are matched, and once covariate balance is achieved, the treatment effect is estimated by running a regression on the treatment variable and some other covariates. The standard error of the treatment effect is also reported: in the book the estimate is 10.2 with a standard error of 1.6.

This was all straightforward until I [Cheah] came across a paper by Morgan and Harding, “Matching Estimators of Causal Effects: From Stratification and Weighting to Practical Data Analysis Routines.”

On page 43 of the paper, in a subsection entitled “Where are the Standard Errors?”, they state:

we do not report standard errors for the treatment effect estimates reported in Table 6 for hypothetical example 4. Although there are some simple types of applications in which the variance of matching estimators is known (see Rosenbaum 2002), these are rarely analogous to the situations in which sociologists analyzing observational data will find themselves.

I [Cheah] interpret this to mean that “regular” standard errors (and even bootstrap standard errors) are incorrect. I’m having a hard time grasping why this is the case, although at some level I can understand that the matching estimator might not be the usual linear unbiased estimator (since it comes from a logistic regression followed by some matching method). However, the matching is nothing more than a way to subset the sample in order to estimate treatment effects, so estimating the standard error with OLS, as you have done, should be separate from the actual creation of the sample via matching.

My reply: As we discuss in the book, I think of matching as a way of creating a restricted dataset to then be analyzed by usual methods. That is, you create a dataset using matching, then you run a regression (or multilevel model, or whatever). Here I’m following the recommendation of Rubin from his Ph.D. thesis 40 years ago. Matching is not an alternative to regression, it’s a way of restricting your range of analysis. So, I’d just take the standard error from this regression and not worry too much about the matching that came before.
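
A minimal sketch of this “match, then regress” workflow, assuming numpy and simulated data only: a logistic regression for the propensity score (fit here by Newton-Raphson), greedy 1:1 nearest-neighbor matching without replacement, and then an ordinary regression on the matched data that includes the covariates used in matching. The helper names (`fit_logit`, `greedy_match`, `ols`) and the particular matching rule are hypothetical illustrative choices, not the procedure from the book; the standard error reported is simply the classical OLS one, taken as-is.

```python
import numpy as np


def fit_logit(X, y, iters=25):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta


def greedy_match(ps, treat):
    """Hypothetical greedy 1:1 nearest-neighbor matching on the propensity score, no replacement."""
    controls = list(np.flatnonzero(treat == 0))
    t_idx, c_idx = [], []
    for i in np.flatnonzero(treat == 1):
        if not controls:
            break  # more treated than control units: leave the rest unmatched
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        t_idx.append(i)
        c_idx.append(j)
        controls.remove(j)
    return np.array(t_idx), np.array(c_idx)


def ols(X, y):
    """OLS coefficients with classical (homoskedastic) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv))


# Simulated data for illustration only: the true treatment effect is 2.0.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 2))
treat = (rng.random(n) < 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))).astype(float)
y = 1.0 + 2.0 * treat + x @ np.array([1.5, -1.0]) + rng.normal(size=n)

# Step 1: propensity scores from a logistic regression on the covariates.
Xp = np.column_stack([np.ones(n), x])
ps = 1 / (1 + np.exp(-Xp @ fit_logit(Xp, treat)))

# Step 2: matching only restricts the dataset.
t_idx, c_idx = greedy_match(ps, treat)
keep = np.concatenate([t_idx, c_idx])

# Step 3: ordinary regression on the matched data, including the matching covariates.
Xr = np.column_stack([np.ones(keep.size), treat[keep], x[keep]])
beta, se = ols(Xr, y[keep])
print(f"treatment effect {beta[1]:.2f} (se {se[1]:.2f})")
```

The matching step never touches the outcome `y`; it only decides which rows enter the final regression, which is the sense in which matching restricts the range of analysis rather than replacing the regression.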

The key is that your regression includes (that is, corrects for) the variables used in the matching. If you do matching and then simply analyze from there, without regression, then, yeah, it can be messy to compute standard errors. Maybe I’d try a jackknife standard error estimate if I really needed to do this.
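
If you do want a standard error for a plain matched difference in means, one option along the lines of the jackknife suggestion is a delete-one jackknife that reruns the whole pipeline (propensity model plus matching) on every leave-one-out sample, so that the standard error reflects that different samples produce different matches. This is a hypothetical numpy sketch on simulated data; the helpers are the same illustrative ones as in a match-then-regress sketch, repeated so the block runs on its own. The jackknife is not guaranteed to behave well for non-smooth estimators such as nearest-neighbor matching, so treat the result as a rough check rather than a validated formula.

```python
import numpy as np


def fit_logit(X, y, iters=25):
    """Logistic regression by Newton-Raphson; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta


def greedy_match(ps, treat):
    """Hypothetical greedy 1:1 nearest-neighbor matching on the propensity score, no replacement."""
    controls = list(np.flatnonzero(treat == 0))
    t_idx, c_idx = [], []
    for i in np.flatnonzero(treat == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        t_idx.append(i)
        c_idx.append(j)
        controls.remove(j)
    return np.array(t_idx), np.array(c_idx)


def matched_diff(x, treat, y):
    """Whole pipeline: propensity scores, matching, then a plain difference in means."""
    Xp = np.column_stack([np.ones(len(y)), x])
    ps = 1 / (1 + np.exp(-Xp @ fit_logit(Xp, treat)))
    t_idx, c_idx = greedy_match(ps, treat)
    return y[t_idx].mean() - y[c_idx].mean()


def jackknife_se(x, treat, y):
    """Delete-one jackknife; the matching is redone on every leave-one-out sample."""
    n = len(y)
    thetas = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        thetas[i] = matched_diff(x[keep], treat[keep], y[keep])
    return np.sqrt((n - 1) / n * np.sum((thetas - thetas.mean()) ** 2))


# Simulated data for illustration only (true effect 2.0).
rng = np.random.default_rng(1)
n = 150
x = rng.normal(size=(n, 2))
treat = (rng.random(n) < 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))).astype(float)
y = 1.0 + 2.0 * treat + x @ np.array([1.5, -1.0]) + rng.normal(size=n)

est = matched_diff(x, treat, y)
se = jackknife_se(x, treat, y)
print(f"matched difference {est:.2f} (jackknife se {se:.2f})")
```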

Cheah replies:

It sounds like the standard error from a regression without any of the variables used in the matching (e.g., just a straight difference in means between the two groups) would not be valid. It is common in some analyses to take the matched data, compute a straight difference in means, and report the standard error from this t-test (or from a regression with no explanatory variables other than the group indicator).
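
The two standard errors mentioned here are actually the same number: with only an intercept and a group indicator, OLS residuals are deviations from the group means, so the residual variance is the pooled variance and the indicator's standard error equals the pooled-variance t-test standard error. The minimal numpy sketch below, on simulated data with a hypothetical covariate `x` standing in for a matching variable, checks that identity and shows how adding the covariate changes the standard error. It illustrates the arithmetic only; it does not by itself settle which standard error is the right one to report.

```python
import numpy as np


def pooled_t_se(y1, y0):
    """Standard error of a difference in means from the pooled-variance two-sample t-test."""
    n1, n0 = len(y1), len(y0)
    sp2 = ((n1 - 1) * y1.var(ddof=1) + (n0 - 1) * y0.var(ddof=1)) / (n1 + n0 - 2)
    return np.sqrt(sp2 * (1 / n1 + 1 / n0))


def ols(X, y):
    """OLS coefficients with classical (homoskedastic) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv))


# Simulated "matched" sample for illustration only: 100 treated and 100 controls.
rng = np.random.default_rng(2)
d = np.repeat([1.0, 0.0], 100)
x = rng.normal(size=200)
y = 2.0 * d + 2.0 * x + rng.normal(size=200)

se_t = pooled_t_se(y[d == 1], y[d == 0])
b_raw, se_raw = ols(np.column_stack([np.ones(200), d]), y)      # group indicator only
b_adj, se_adj = ols(np.column_stack([np.ones(200), d, x]), y)   # plus the matching covariate
print(f"t-test se {se_t:.3f}, indicator-only OLS se {se_raw[1]:.3f}, adjusted se {se_adj[1]:.3f}")
```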

Yup.

4 thoughts on “Question on propensity score matching”

  1. Most of these questions have been answered recently by Abadie and Imbens using a martingale representation of matching estimators. This representation allows explicit standard error formulas for virtually every type of matching estimator. See:

    http://www.hks.harvard.edu/fs/aabadie/research.ht

    In an older paper, they also discuss the failure of the bootstrap for matching estimators.

  2. John, not to distract from Abadie and Imbens’ exacting standards, but in practice the assumptions are always wrong and even independence is not really guaranteed, so I would suggest a less exacting approach, as nicely outlined in

    Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. (2007). “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference.” Political Analysis, Vol. 15, No. 3 (Summer), pp. 199-236.

    Keith

  3. Good discussion. But I cannot see why ignoring the matching is the right answer. Different samples will produce different matches, and one would like the standard errors to reflect that. Keith, are you suggesting that two wrongs may make a right here?
