Pearl’s and Gelman’s final thoughts (for now) on causal inference

After six entries and 91 comments on the connections between Judea Pearl and Don Rubin’s frameworks for causal inference, I thought it would be good to draw the discussion to a (temporary) close. I’ll first present a summary from Pearl, then briefly give my thoughts.

Pearl writes:

Recently, there have been several articles and many blog entries concerning the question of what measurements should be incorporated in various methods of causal analysis. The statement below [from Pearl] is offered by way of a resolution that (1) summarizes the discussion thus far, (2) settles differences of opinion, and (3) remains faithful to logic and facts as we know them today.

The resolution is reached by separating the discussion into three parts: (1) propensity score matching, (2) Bayes analysis, and (3) other techniques.

1. Propensity score matching. Everyone is of the opinion that one should screen variables before including them as predictors in the propensity-score function. We know that, theoretically, some variables are capable of increasing bias (over and above what it would be without their inclusion), and some are even guaranteed to increase such bias.

1.1 The identity of those bias-raising variables is hard to ascertain in practice. However, their general features can be described in either graphical terms or in terms of the “assignment mechanism”, P(W|X, Y0, Y1), if such is assumed.

1.2 In light of 1.1, it is recommended that the practice of adjusting for as many measurements as possible be approached with great caution. While most available measurements are bias-reducing, some are bias-increasing. The criterion of producing a “balanced population” for matching should not be the only one in deciding whether a measurement should enter the propensity-score function.
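
To make the idea of a bias-increasing covariate concrete, here is a minimal simulation sketch. It is not part of the discussion itself: the M-shaped structure, the unit coefficients, the names `u1`, `u2`, and `m`, the continuous treatment, and the use of linear regression adjustment in place of propensity score matching are all illustrative assumptions. The covariate `m` is measured before treatment and is correlated with both treatment and outcome, yet adjusting for it creates bias that the unadjusted estimate does not have.

```python
import numpy as np

def m_bias_demo(n=200_000, tau=1.0, seed=0):
    """Compare unadjusted and m-adjusted slopes under an assumed M-structure."""
    rng = np.random.default_rng(seed)
    u1 = rng.normal(size=n)                 # unmeasured cause of treatment and m
    u2 = rng.normal(size=n)                 # unmeasured cause of outcome and m
    m = u1 + u2 + rng.normal(size=n)        # pre-treatment covariate (a collider)
    w = u1 + rng.normal(size=n)             # treatment; m does not cause it
    y = tau * w + u2 + rng.normal(size=n)   # outcome; true effect of w is tau

    ones = np.ones(n)
    # Slope on w from regressing y on w alone.
    unadjusted = np.linalg.lstsq(np.column_stack([ones, w]), y, rcond=None)[0][1]
    # Slope on w from regressing y on w and m.
    adjusted = np.linalg.lstsq(np.column_stack([ones, w, m]), y, rcond=None)[0][1]
    return float(unadjusted), float(adjusted)

if __name__ == "__main__":
    unadjusted, adjusted = m_bias_demo()
    print(f"unadjusted slope: {unadjusted:.3f}")
    print(f"adjusted for m:   {adjusted:.3f}")
```

Under these assumed coefficients the unadjusted slope converges to tau, while the slope adjusted for `m` converges to tau - 0.2, so in this particular setup the extra covariate adds bias rather than removing it.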

2. Bayes analysis. If the science behind the problem is properly formulated as constraints over the prior distribution of the “assignment mechanism” P(W|X, Y, Y0, Y1), then one need not exclude any measurement in advance; sequential updating will properly narrow the posteriors to reflect both the science and the available data.

2.1 If one can deduce from the “science” that certain covariates are “irrelevant” to the problem at hand, there is no harm in excluding them from the Bayesian analysis. Such deductions can be derived either analytically, from the algebraic description of the constraints, or graphically, from the diagrammatic description of those constraints.

2.2 The inclusion of irrelevant variables in the Bayesian analysis may be advantageous from certain perspectives (e.g., providing evidence for missing data) and disadvantageous from others (e.g., slow convergence, increase in problem dimensionality, sensitivity to misspecification).

2.3 The status of intermediate variables (and M-Bias) falls under these considerations. For example, if the chain Smoking -> Tar -> Cancer represents the correct specification of the problem, there are advantages (e.g., reduced variance (Cox, 1960?)) to including Tar in the analysis even though the causal effect (of smoking on cancer) is identifiable without measuring Tar, if Smoking is randomized. However, misspecification of the role of Tar may lead to bias.
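
One way to see how an intermediate variable can reduce variance is a small Monte Carlo sketch. Everything specific here is an assumption made for illustration: a linear model, full mediation (Smoking affects Cancer only through Tar), unit noise variances, and a simple "product of slopes" estimator, which is not necessarily the estimator Cox had in mind. Smoking is randomized, so both estimators target the same effect `a * b`.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def compare_estimators(n=200, reps=2000, a=1.0, b=1.0, seed=0):
    """Sampling variance of two estimators of the Smoking -> Cancer effect."""
    rng = np.random.default_rng(seed)
    direct, product = [], []
    for _ in range(reps):
        smoking = rng.integers(0, 2, size=n).astype(float)  # randomized 0/1
        tar = a * smoking + rng.normal(size=n)              # Smoking -> Tar
        cancer = b * tar + rng.normal(size=n)               # Tar -> Cancer only
        # Estimator 1: ignore Tar, regress Cancer on Smoking.
        direct.append(slope(smoking, cancer))
        # Estimator 2: use Tar, multiply the two stage-wise slopes.
        product.append(slope(smoking, tar) * slope(tar, cancer))
    return (float(np.var(direct)), float(np.var(product)),
            float(np.mean(direct)), float(np.mean(product)))

if __name__ == "__main__":
    var_direct, var_product, mean_direct, mean_product = compare_estimators()
    print(f"direct : mean {mean_direct:.3f}, variance {var_direct:.4f}")
    print(f"product: mean {mean_product:.3f}, variance {var_product:.4f}")
```

Both estimators center on `a * b`, but the one that uses Tar has the smaller sampling variance in this setup. If the chain were misspecified (say, Smoking also affected Cancer directly), the product estimator would be biased, which is the caveat in the last sentence of 2.3.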

3. Other methods. Instrumental variables, intermediate variables, and confounders can be identified and harnessed to facilitate effective causal inference using other methods, not involving propensity score matching or Bayes analysis. For example, the measurement of Tar in the example above can facilitate a consistent estimate of the causal effect (of Smoking on Cancer) even in the presence of unmeasured confounding factors affecting both smoking and cancer. Such analysis can be done by either graphical methods (Causality, pages 81-88) or counterfactual algebra (Causality, pages 231-234).
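
The Tar example can be sketched numerically for the linear case. This is a hedged illustration, not the derivation in Causality: it assumes a linear model in which an unmeasured factor `u` affects both smoking and cancer but not tar, and in which smoking affects cancer only through tar. The coefficients `a` and `b` and the helper `ols` are made up for the example. Under those assumptions the effect of smoking on cancer is `a * b`, and it can be recovered as the product of two regressions that never use `u`.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on X (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0][1:]

def front_door_demo(n=200_000, a=0.8, b=0.5, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # unmeasured confounder
    smoking = u + rng.normal(size=n)
    tar = a * smoking + rng.normal(size=n)       # u does not affect tar
    cancer = b * tar + u + rng.normal(size=n)    # no direct smoking -> cancer

    # Naive estimate: regress cancer on smoking; confounded by u.
    naive = ols(smoking, cancer)[0]
    # Step 1: effect of smoking on tar (no confounding on this link).
    a_hat = ols(smoking, tar)[0]
    # Step 2: effect of tar on cancer, adjusting for smoking to block
    # the path tar <- smoking <- u -> cancer.
    b_hat = ols(np.column_stack([tar, smoking]), cancer)[0]
    return float(naive), float(a_hat * b_hat)

if __name__ == "__main__":
    naive, front_door = front_door_demo()
    print(f"naive regression:      {naive:.3f}")
    print(f"two-step via Tar:      {front_door:.3f}")
    print(f"true effect (a * b):   {0.8 * 0.5:.3f}")
```

With these assumed coefficients the naive slope converges to `a * b + 0.5`, while the two-step estimate converges to `a * b`, even though `u` is never observed.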

Thus far, I [Pearl] have not heard any objection to any of these conclusions, so I consider it a resolution of what seemed to be a major disagreement among experts. And this supports what Aristotle said (or should have said): Causality is simple.

I am not a causal inference expert in the way that Rosenbaum, Rubin, and Imbens are, but I will nonetheless give my thoughts on the above.

1. Propensity score matching is an important method, but I don’t think it’s fundamental in understanding causality. I think of propensity scores as a way of adjusting for large numbers of background variables. Again, I would point readers to the Dehejia and Wahba paper from 1999 which discusses the importance of controlling for key covariates. I think Pearl’s discussion above is slightly confused by using the general term “adjusting for.” Rubin, Imbens, etc., will adjust for all variables, but not necessarily by including them in the propensity score.

2. Pearl’s statement about Bayesian analysis seems reasonable to me.

3. The 1996 Angrist, Imbens, and Rubin paper puts instrumental variables into a clean Bayesian framework. I’m sure there are non-Bayesian approaches that can solve these problems too.

Finally, I don’t agree with Pearl that causality is simple! I don’t see any easy answers for the sorts of problems where you want to estimate a causal pathway through intermediate outcomes. See here for a pointer to Michael Sobel’s recent discussion of these issues.

All of us in the social sciences have seen lots of talks where you see a big table of regression coefficients and then the speaker interprets one after the other causally–despite the difficulty of interpreting a change in each with all others held constant. Two useful principles for me are (1) understand the data descriptively, in any case, and (2) perform a separate analysis for each causal claim. I’m not saying these are general principles, but they’ve helped me keep my head when things get confusing.

Let me conclude the discussion by thanking Judea Pearl and the many commenters for a fascinating discussion. As I’ve said before, the various methods of Pearl, Imbens and Rubin, Greenland and Robins, and others have all been useful to many researchers in different settings. I think it’s helpful to develop statistical methods in the context of applications, and also to work toward theoretical understanding, as Pearl has been doing.

2 thoughts on “Pearl’s and Gelman’s final thoughts (for now) on causal inference”

  1. Causality is simple, if we do not bend it.
    Andrew,
    I am glad we have concluded our discussion with
    only one major disagreement – whether causality
    is simple or hard. In support of the latter, you
    have pointed me to an article by Michael Sobel
    ( http://www.sociology.columbia.edu/pdf-files/msobe… which, supposedly, finds
    special difficulties in defining and estimating mediation.

    I have posted two quick responses to that paper,
    and now I have had the chance to read Sobel's
    paper in greater detail. I would like to share my
    reaction with you and your readers, and to relate
    it to our question: Is causality simple?

    First, even if Sobel convinces us that mediation presents a special problem to SEM researchers,
    it does not mean that causal analysis, as a discipline, is hard.
    The fact that we cannot solve two equations with three unknowns does not make high school algebra
    a difficult subject: when we do not have the necessary information, we do not expect to produce a solution.
    Algebra is simple because it gives us the machinery to determine quickly whether we have enough information
    or not. The same applies to causal analysis: we now have that machinery at hand.

    Now to Sobel's paper.

    1. Background:
    Researchers in the social sciences have been giving causal interpretation to structural coefficients. They have devised model-based criteria for identifying those coefficients
    and regression-based techniques for estimating them, and, once identified and estimated, they have considered the estimates as measuring direct causal effects among the corresponding variables.

    2. Sobel's argument
    Sobel argues against giving structural coefficients a causal interpretation.
    His reason: these coefficients do not coincide, except in special cases, with the TRUE causal coefficients, where by "true causal coefficients" we mean those defined counterfactually.

    Sobel further identifies an extra assumption (his equation (20)) that is needed to ensure equality between the structural and the
    "causal" coefficients (his Theorem 1) and recommends that SEM researchers use his criterion to "reexamine the validity of previous
    work and ask if it is reasonable or not to assume (20) in a particular application."

    3. Critique of Sobel's argument
    Sobel is wrong in defining structural coefficients in terms of regression, and in assuming that they are any different from the "causal coefficients" that he defines counterfactually. Early economists
    (Haavelmo, Marschak, Hurwicz, Simon, Fisher,
    Christ, even Goldberger and, of course, Heckman) have all given structural equations a counterfactual interpretation (though not in formal notation). The definition of structural coefficients has nothing to do with regression,
    which is merely an estimation method that sometimes gives the correct magnitude (of the structural coefficient)
    and sometimes does not.

    I have examined Sobel's extra assumption (20)
    and found that, as expected, it coincides precisely with the standard SEM condition for the identification of structural
    coefficients (specifically, that two error terms be uncorrelated: eps.(1) and eps.(2) in his Figure 1).

    In general, it can be shown that IDENTIFIED structural coefficients always coincide with (the estimands of) their associated causal parameters and, moreover, the assumptions that justify the identification of structural coefficients are precisely those that are needed for consistent estimation of "causal coefficients." For that reason, it is safe to speak about the structural coefficients themselves as BEING the causal coefficients.

    For example, consider the under-identified structural equations
    that Sobel uses to describe mediation:
    M = a1*Z + eps1
    Y = a2*Z + a3*M + eps2
    Assume now that Z is NOT randomized. Rather, Z, eps1 and eps2 are
    highly correlated. It is still perfectly safe to confer causal interpretation on a1, a2, and a3, and proclaim the total effect of Z on Y to be
    T = a2 + a1*a3
    It is also safe to equate a1*a3 with the indirect effect of Z on Y, counterfactually defined as:
    a1*a3 = E[Y(z, M(z)) - Y(z, M(z'))] / (z - z')
    as defined by Sobel. Our inability to identify
    a1*a3, given the information at hand,
    should not tarnish its causal interpretation.
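
    The distinction between what a coefficient means and whether it can be estimated can be sketched in a simulation. The setup below is an illustrative assumption, not taken from Sobel's paper: a shared omitted factor `u` makes Z, eps1 and eps2 correlated, and the coefficient values are arbitrary. Because a simulation can see the error terms, it can compute the counterfactual contrast Y(do Z=1) - Y(do Z=0) directly, which real data cannot.

```python
import numpy as np

def structural_vs_regression(n=200_000, a1=0.5, a2=0.3, a3=0.7, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)               # shared omitted factor (assumed)
    z_obs = u + rng.normal(size=n)       # Z is not randomized
    eps1 = u + rng.normal(size=n)        # correlated with Z through u
    eps2 = u + rng.normal(size=n)        # correlated with Z through u

    def outcome(z):
        """Y under the structural equations when Z is set to z."""
        m = a1 * z + eps1
        return a2 * z + a3 * m + eps2

    # What a regression of Y on Z recovers from observational data.
    y_obs = outcome(z_obs)
    zc = z_obs - z_obs.mean()
    regression_slope = float(zc @ (y_obs - y_obs.mean()) / (zc @ zc))

    # Counterfactual contrast: same units, same errors, Z set to 1 versus 0.
    interventional = float(np.mean(outcome(1.0) - outcome(0.0)))
    return regression_slope, interventional

if __name__ == "__main__":
    slope, total = structural_vs_regression()
    print(f"regression slope of Y on Z: {slope:.3f}")
    print(f"interventional contrast:    {total:.3f}")
    print(f"a2 + a1*a3:                 {0.3 + 0.5 * 0.7:.3f}")
```

    In this setup the interventional contrast equals a2 + a1*a3, while the regression slope is far from it. The structural coefficients keep their causal meaning; it is only the regression estimate that fails when the error terms are correlated with Z.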

    In my paper on mediation
    http://ftp.cs.ucla.edu/pub/stat_ser/R273-U.pdf
    I show that, in linear systems,
    the counterfactual definition leads to the additive and multiplicative
    rules of combining structural coefficients (Wright 1921). So, should SEM researchers panic and heed Sobel's warning to "reexamine the validity of previous work and ask if it is reasonable or not to assume (20) in a particular
    application"?

    Absolutely not; they have already done so when they justified the assumptions that render the structural coefficients identifiable. Moreover, they have done so in a language that is much more transparent and meaningful than
    that recommended by Sobel.

    To wit, the assumption that two omitted factors be uncorrelated is many times more transparent than the same assumption articulated
    in the language of ignorability, e.g., that the
    potential outcome of the mediator, had assignment
    been zero, be independent of the potential value of the outcome, had treatment assignment and mediating variable been at different levels.

    In Causality chapter 11, I show that the condition of ignorability is subsumed by the condition of independence among omitted factors.
    I have met many researchers arguing
    about omitted factors and none arguing about
    the validity of ignorability; it is too cryptic. Sobel himself, when attempting to show that ignorability can be violated in his example,
    resorts to arguments based on omitted factors
    (students' smartness), not to potential
    outcome considerations. "Students' smartness,"
    even when omitted, has a name, and anchors one's
    thoughts on the causal relationships that operate in the problem. Ignorability conditions anchor one's thoughts on outcomes of hypothetical "black box" experiments that
    are hard to envision and ascertain.

    I prefer therefore to let SEM researchers continue to express knowledge in the communicable
    language of structural equations, and educate ourselves about how the counterfactual logic of the 21st century supports what they have set out to do in the 1950's, before their methods got messed up by bad economists and regression addicts.

    My conclusion remains: Causality is simple,
    and I wish I knew why you thought that Sobel's article should spoil our optimism.

    ============Judea

  2. Unlike many people here, I am a business practitioner who happens to have background & experience in statistics and management science as well as data mining, machine learning, marketing science, and other analytics. I am a huge fan of both Pearl and Rubin, and also a fan of many of their colleagues and students (Greenland, Robins, Rosenbaum, Gelman, etc.).

    I see that the Rubin Causal Model (RCM) has been applied in so many fields (less so in the business area) and I love propensity scoring. Yet the more I read about Pearl's work the more I admire Pearl's in-depth understanding of causality. Pearl did explain that RCM can be regarded as a subset of his work, with which I would agree. However, we also see that while RCM is widely used, Pearl's work is not. I think it'd be great (and necessary for future generations) if someone can translate Pearl's work and promote it in other fields. Pearl's been doing a lot of it himself by writing papers targeting different audiences. However, many of his papers are too similar to each other. We would need other people (perhaps Pearl's friends, fans, and students) to help push it to the general public and integrate it with the better-known RCM framework. For instance, the causal diagram idea is not only a nice-to-have but is considered a must (at least for many epidemiologists) before measuring causality in observational data. Also, while the "back-door" criterion is well-known and is close to stat methods (regression-based or propensity scoring), the "front-door" path is less known and should be publicized more. The "do-calculus" is indeed a ground-breaking operation and also needs to be pushed more to the public. In fact, we need experts who can integrate Pearl's and Rubin's work together in one package.

    Thoughts from the experts?
