[The following is a review essay invited by the American Journal of Sociology. Details and acknowledgments appear at the end.]
In social science we are sometimes in the position of studying descriptive questions (for example: In what places do working-class whites vote for Republicans? In what eras has social mobility been higher in the United States than in Europe? In what social settings are different sorts of people more likely to act strategically?). Answering descriptive questions is not easy and involves issues of data collection, data analysis, and measurement (how should one define concepts such as “working class whites,” “social mobility,” and “strategic”), but is uncontroversial from a statistical standpoint.
All becomes more difficult when we shift our focus from What to What-if and Why.
Thinking about causal inference
Consider two broad classes of inferential questions:
1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth?
2. Reverse causal inference. What causes Y? Why do more attractive people earn more money, why do many poor people vote for Republicans and rich people vote for Democrats, why did the economy collapse?
In forward reasoning, the potential treatments under study are chosen ahead of time, whereas, in reverse reasoning, the research goal is to find and assess the importance of the causes. The distinction between forward and reverse reasoning (also called “the effects of causes” and the “causes of effects”) was made by Mill (1843). Forward causation is a pretty clearly-defined problem, and there is a consensus that it can be modeled using the counterfactual or potential-outcome notation associated with Neyman (1923) and Rubin (1974) and expressed using graphical models by Pearl (2009): the causal effect of a treatment T on an outcome Y for an individual person (say), is a comparison between the value of Y that would’ve been observed had the person followed the treatment, versus the value that would’ve been observed under the control; in many contexts, the treatment effect for person i is defined as the difference, Yi(T=1) – Yi(T=0). Many common techniques, such as differences in differences, linear regression, and instrumental variables, can be viewed as estimated average causal effects under this definition.
In the social sciences, where it is generally not possible to try more than one treatment on the same unit (and, even when this is possible, there is the possibility of contamination from past exposure and changes in the unit or the treatment over time), questions of forward causation are most directly studied using randomization or so-called natural experiments (see Angrist and Pischke, 2008, for discussion and many examples). In some settings, crossover designs can be used to estimate individual causal effects, if one accepts certain assumptions about treatment effects being bounded in time. Heckman (2006), pointing to the difficulty of generalizing from experimental to real-world settings, argues that randomization is not any sort of “gold standard” of causal inference, but this is a minority position: I believe that most social scientists and policy analysts would be thrilled to have randomized experiments for their forward-causal questions, even while recognizing that subject-matter models are needed to make useful inferences from any experimental or observational study.
Reverse causal inference is another story. As has long been realized, the effects of action X flow naturally forward in time, while the causes of outcome Y cannot be so clearly traced backward. Did the North Vietnamese win the American War because of the Tet Offensive, or because of American public opinion, or because of the skills of General Giap, or because of the political skills of Ho Chi Minh, or because of the conflicted motivations of Henry Kissinger, or because of Vietnam’s rough terrain, or . . .? To ask such a question is to reveal the impossibility of answering it. On the other hand, questions such as “Why do whites do better than blacks in school?”, while difficult, do not seem inherently unanswerable or meaningless.
We can have an idea of going backward in the causal chain, accounting for more and more factors until the difference under study disappears–that is, is “explained” by the causal predictors. Such an activity can be tricky–hence the motivation for statistical procedures for studying causal paths–and ultimately is often formulated in terms of forward causal questions: causal effects that add up to explaining the Why question that was ultimately asked. Reverse causal questions are often more interesting and motivate much, perhaps most, social science research; forward causal research is more limited and less generalizable but is more doable. So we all end up going back and forth on this.
We see three difficult problems in causal inference: Continue reading →