Some questions about causal inference

Jose Pedro Gala sends in some questions about statistical methods for causal inference. I’ll give his questions, then my responses. Jose writes:

1. Type of U in Sensitivity Analyses: We’ve now seen various methods proposed for sensitivity analysis for causal inference. For many of these methods, a decision has to be made about what type of variable U is (i.e., whether it is binary, continuous, etc.). This decision is often arbitrary (particularly when one does not have a specific unmeasured confounder in mind), although a binary U seems to be the most common choice, probably for mathematical convenience. My question is whether this really matters: if we assume U is binary when the unmeasured confounder needed to achieve ignorability is in fact continuous, does it matter? Any thoughts on this? I cannot find articles in the literature that shed light on it. What might be missed in the subsequent sensitivity analysis if we assume the wrong type of U? Or does assuming a binary U suffice? (I thought this question would be particularly interesting since there is currently very little on your blog concerning sensitivity analyses.)

2. I have heard, time and time again, folks say that propensity score methods (and others) rely on the “constant treatment effect assumption” that E[Y(1) – Y(0)] = ATE = Y_i(1) – Y_i(0) for all i, where Y(z) is the potential outcome at index z \in {0,1}. I disagree with this. If one wants to make this assumption, that is fine, but I don’t see the _need_ to. In fact, in general I simply don’t believe that the average treatment effect equals the individual treatment effect for everyone in the population. I may even later want to study how the individual causal effects vary with a covariate, for example by looking at E[Y(1) – Y(0) | Gender]. What are your thoughts?

3. Finally, what work has been done on the propagation of errors in propensity score methods? Here’s what I mean: propensity scores are typically estimated, yet I don’t see users of the method ever state how they adjust the standard errors of the subsequent \hat{ATE} to account for the error in the estimation of the propensity score. I should note that I see this happening less with folks who use propensity score weighting; they often use robust or sandwich standard errors, which account for the estimation of the weights (i.e., the propensity scores).

My responses:

1. In the sorts of problems I work on, I think it makes sense to think of U as continuous (e.g., there is a continuous range of people rather than two types). But if you work in the Rubin causal framework (which I’m still trying to understand), it might be possible to avoid these latent U’s entirely and just work with principal strata defined by the distribution of potential outcomes.
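As a hypothetical illustration (my own toy simulation, not anything from Jose’s question) of why the binary-vs-continuous choice may matter less than the strength of the confounding: a continuous U and a binary U standardized to the same mean and variance induce hidden biases of similar, though not identical, magnitude in the unadjusted comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def unadjusted_bias(u):
    """Treated-vs-control difference when U confounds both treatment
    assignment and the outcome; the true treatment effect is zero,
    so any nonzero difference is pure hidden bias."""
    z = rng.binomial(1, 1 / (1 + np.exp(-u)))  # P(Z=1) depends on U
    y = u + rng.normal(size=len(u))            # Y depends on U, not on Z
    return y[z == 1].mean() - y[z == 0].mean()

# Continuous U vs. a binary U with the same mean (0) and variance (1).
bias_cont = unadjusted_bias(rng.normal(size=n))
bias_bin = unadjusted_bias(rng.choice([-1.0, 1.0], size=n))
```

Under these particular settings the two biases come out close to each other, suggesting that what drives the hidden bias is how strongly U is tied to treatment and outcome rather than its shape; this is only a sketch, not a substitute for a formal sensitivity analysis.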

2. I don’t see why propensity scores would rely on a constant treatment effect. Often it’s exactly the variation in treatment effects that is of interest. See the work of Rajeev Dehejia on propensity scores and on varying treatment effects.
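A quick hypothetical simulation making the same point: inverse-propensity weighting recovers the average treatment effect with no constant-effect assumption anywhere, even when the individual effects vary with a covariate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)            # observed confounder
p = 1 / (1 + np.exp(-x))          # true propensity score
z = rng.binomial(1, p)

# Heterogeneous treatment effect tau(x) = 1 + x: it varies across
# units, but the true ATE is E[1 + x] = 1.
y0 = x + rng.normal(size=n)
y1 = y0 + 1 + x
y = np.where(z == 1, y1, y0)

# IPW (Horvitz-Thompson) estimate of the ATE using the true score.
ate_ipw = np.mean(z * y / p) - np.mean((1 - z) * y / (1 - p))

# The naive difference in means is confounded by x.
naive = y[z == 1].mean() - y[z == 0].mean()
```

With a sample this large, ate_ipw lands near the true ATE of 1 while the naive difference is biased well upward; nothing in the estimator required the effect 1 + x to be constant.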

3. This is a tough problem. I think the right way to do it is through hierarchical Bayes, allowing treatment effects to vary by categories, but in practice I don’t know that it’s been done. You could try bootstrap or jackknife, maybe.
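One way to make the bootstrap suggestion concrete (a hypothetical sketch for a smooth weighting estimator; as the comments below note, the bootstrap can fail for nearest-neighbor matching): resample the data and re-fit the propensity model inside every replicate, so the first-stage estimation error propagates into the standard error.

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_logit(X, z, iters=25):
    """Logistic regression by Newton-Raphson; X includes an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        w = p * (1 - p)
        beta += np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (z - p))
    return beta

def ipw_ate(x, z, y):
    """Estimate the propensity score, then the IPW estimate of the ATE."""
    X = np.column_stack([np.ones_like(x), x])
    p = 1 / (1 + np.exp(-X @ fit_logit(X, z)))
    return np.mean(z * y / p - (1 - z) * y / (1 - p))

# Simulated data with one confounder; true treatment effect = 1.
n = 5_000
x = rng.normal(size=n)
z = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = x + z + rng.normal(size=n)

# Bootstrap the *whole* pipeline: the propensity score is re-estimated
# in each replicate, so its sampling error shows up in the SE.
boot = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    boot.append(ipw_ate(x[idx], z[idx], y[idx]))
se_boot = np.std(boot, ddof=1)
```

The key design point is that the resampling wraps the estimation of the propensity score, not just the final weighted comparison; bootstrapping with the weights held fixed would miss exactly the error Jose is asking about.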

2 thoughts on “Some questions about causal inference”

  1. on 1: one can think of U as one continuous or even multiple unobserved confounders. in fact, one can show that within the Rosenbaum sensitivity framework, the non-parametric Manski bounds are an extreme case of hidden bias arising from U.

    on 2: they do not rely on a constant treatment effect

    on 3: currently there is no way known to me of adjusting the s.e. of matching for the first-step estimation of the PS (although that is easy to do with estimators of the treatment effect that weight on the PS). if you match on the true propensity score the bootstrap will not work (at least for NN-matching on the propensity score with replacement). my guess is that the same is true for the estimated propensity score, although it could be the case that estimating the propensity score in the first step makes the matching estimator smooth enough for the bootstrap to work. the bootstrap certainly does not work for regular NN-matching; see Alberto and Guido's paper on this topic at: http://ksghome.harvard.edu/~.aabadie.academic.ksg

    best,
    jens

  2. 1. Others know more about this than I do, but there is a new paper by Andrea Ichino worth reading on this.

    2. If you estimate the mean effect of treatment on the treated under the assumption that Y0 indep D | X, then you can have heterogeneous treatment effects. If you estimate the ATE under (Y0,Y1) indep D | X, then you can only have heterogeneous treatment effects that vary with X. These two cases are more different than the literature makes them out to be.

    3. Heckman, Ichimura and Todd (1997) in the Review of Economic Studies provide asymptotic variance formulae for kernel matching. Because the propensity score model is parametric it converges faster and thus the estimation error in the propensity score does not appear in the asymptotic variance formula. They also present an analysis showing that failing to take account of the estimation error associated with the propensity score leads the asymptotic variance estimates to understate the truth in samples of reasonable size.

    Bootstrapping should work for matching estimators built on smooth non-parametric methods, such as kernel matching or local linear matching, and for inverse probability weighting. The aspect of NN matching that causes the problem is the lack of smoothness.

    Jeff

Comments are closed.