Criticizing statistical methods for mediation analysis

Brendan Nyhan passes along an article by Don Green, Shang Ha, and John Bullock, entitled “Enough Already about ‘Black Box’ Experiments: Studying Mediation Is More Difficult than Most Scholars Suppose,” which begins:

The question of how causal effects are transmitted is fascinating and inevitably arises whenever experiments are presented. Social scientists cannot be faulted for taking a lively interest in “mediation,” the process by which causal influences are transmitted. However, social scientists frequently underestimate the difficulty of establishing causal pathways in a rigorous empirical manner. We argue that the statistical methods currently used to study mediation are flawed and that even sophisticated experimental designs cannot speak to questions of mediation without the aid of strong assumptions. The study of mediation is more demanding than most social scientists suppose and requires not one experimental study but rather an extensive program of experimental research.

That last sentence echoes a point that I like to make, which is that you generally need to do a new analysis for each causal question you’re studying. I’m highly skeptical of the standard poli sci or econ approach, which is to run a single master regression and read off many different coefficients, each with its own causal interpretation.

The article seems reasonable to me (I’m basing my judgments on the downloadable version here), although I can’t figure out why an article with three authors is written in the first person singular. Also, I’d slam them for writing a paper with no graphs–except that I just did the same thing, on the same topic!

Green et al. set things up by explaining why causal path analysis seems like a good idea:

One can scarcely fault scholars from expressing curiosity about the mechanisms by which an experimental treatment transmits its influence. After all, many of the most interesting discoveries in science have to do with the identifying mediating factors in a causal chain. For example, the introduction of limes into the diet of seafarers in the 18th century dramatically reduced the incidence of scurvy, and eventually 20th century scientists figured out that the key mediating ingredient was vitamin C. Equipped with knowledge about why an experimental treatment works, scientists may devise other, possibly more efficient ways of achieving the same effect. Modern seafarers can prevent scurvy with limes or simply with vitamin C tablets.

Arresting examples of mediators abound in the physical and life sciences. Indeed, not only do scientists know that vitamin C mediates the causal relationship between limes and scurvy, they also understand the biochemical process by which vitamin C counteracts the onset of scurvy. In other words, mediators themselves have mediators. Physical and life scientists continually seek to pinpoint ever more specific explanatory agents.

But now the bad news:

Given the strong requirements in terms of model specification and measurement, the enterprise of “opening the black box” or “exploring causal pathways” using endogenous mediators is largely a rhetorical exercise. I [Green, Ha, and Bullock] am at a loss to produce even a single example in political science in which this kind of mediation analysis has convincingly demonstrated how a causal effect is transmitted from X to Y.

And then they put it all in perspective:

My [Green, Ha, and Bullock’s] argument is not that the search for mediators is pointless or impossible. Establishing the mediating pathways by which an effect is transmitted can be of enormous theoretical and practical value, as the vitamin C example illustrates. Rather, I take issue with the impatience that social scientists often express with experimental studies that fail to explain why an effect obtains. As one begins to appreciate the complexity of mediation analysis, it becomes apparent why the experimental investigation of mediators is slow work. Just as it took almost two centuries to discover why limes cure scurvy, it may take decades to figure out the mechanisms that account for the causal relationships observed in social science.

OK, what’s everybody talkin bout?

Here’s the method that Green et al. criticize:

Although path analysis goes back several decades, mediation analyses surged in popularity in the 1980s with the publication of Baron and Kenny (1986) . . . First, one regresses the outcome (Y) on the independent variable (X). Upon finding an effect to be explained, one proposes a possible mediating variable (M) and regresses it on X. If X appears to cause M, the final step is to examine whether the effect of X becomes negligible when Y is regressed on both M and X. If M predicts Y and X does not, the implication is that X transmits its influence through M.
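To make the three regressions concrete, here is a minimal sketch in Python of the procedure as described above. It is not code from Green et al. or from Baron and Kenny; the helper names (`ols`, `three_step_mediation`) and the simulated coefficients (0.8 and 0.5) are my own assumptions for illustration. The fake data are built so that X affects Y only through M and nothing unobserved affects both M and Y, which is exactly the kind of assumption the critique says you can't count on in real data.

```python
import numpy as np


def ols(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the given regressors."""
    design = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def three_step_mediation(x, m, y):
    """Run the three regressions of the standard mediation recipe (hypothetical helper)."""
    total = ols(y, x)[1]              # step 1: regress Y on X
    x_to_m = ols(m, x)[1]             # step 2: regress M on X
    _, direct, m_to_y = ols(y, x, m)  # step 3: regress Y on both X and M
    return {"total": total, "x_to_m": x_to_m, "direct": direct, "m_to_y": m_to_y}


# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 5000
x = rng.integers(0, 2, size=n).astype(float)  # randomized binary treatment
m = 0.8 * x + rng.normal(size=n)              # X shifts the mediator
y = 0.5 * m + rng.normal(size=n)              # Y depends on X only through M

result = three_step_mediation(x, m, y)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this setup the coefficient on X in the third regression should come out near zero, which the recipe reads as "X transmits its influence through M." That clean answer depends entirely on how the data were simulated: add an unobserved variable that drives both M and Y and the same three regressions will still produce numbers, just not ones with a causal interpretation.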

This approach has always seemed pretty hopeless to me, but a colleague whom I respect has defended it to me, a bit, by framing it as an adjunct to experimental research. As he puts it, the serious social psychologists would not dream of applying the mediation analysis stuff directly to observational data. Rather, it’s their attempt to squeeze more out of experimental data. From that perspective, maybe it’s not so horrible.

Beyond nihilism

Green et al. don’t just sit around and criticize; they also offer suggestions for moving forward:

A more judicious approach at this juncture in the development of social science would be to encourage researchers to measure as many outcomes as possible when conducting experiments. For example, consider the many studies that have sought to increase voter turnout by means of some form of campaign contact, such as door-to-door canvassing. In addition to assessing whether the intervention increases turnout, one might also conduct a survey of random samples of the treatment and control groups in order to ascertain whether these groups differ in terms of interest in politics, feelings of civic responsibility, knowledge about where and how to vote, and so forth. With many mediators and only one intervention, this kind of experiment cannot identify which of the many causal pathways transmit the effect of the treatment, but if certain pathways are unaffected by the treatment, one may begin to argue they do not explain why mobilization works. As noted above, this kind of analysis makes some important assumptions about homogenous treatment effects, but the point is that this type of exploratory investigation may provide some useful clues to guide further experimental investigation.

As researchers gradually develop intuitions about the conditions under which effects are larger or smaller, they may begin to experiment with variations in the treatment in an effort to isolate the aspects of the intervention that produce the effect. For example, after a series of pilot studies that suggested that social surveillance might be effective in increasing voter turnout, Gerber, Green, and Larimer (2008) launched a study in which subjects were presented one of several interventions. One encouraged voting as a matter of civic duty; another indicated that researchers would be monitoring who voted; a third revealed the voting behavior of all the people living at the same address; and a final treatment revealed the voting behavior of those living on the block. This study stopped short of measuring mediators such as one’s commitment to norms of civic participation or one’s desire to maintain a reputation as an engaged citizen; nevertheless, the treatments were designed to activate mediators to varying degrees. One can easily imagine variations in this experimental design that would enable the researcher to differentiate more finely between mediators. And one can imagine introducing survey measures to check whether these inducements produce an intervening psychological effect consistent with the posited mediator.

You won’t be surprised to hear that I like the focus on active research examples.


  1. anon says:

    I didn't get that first-person version when I clicked your link!

    "Our argument is not that the search for mediators is pointless or impossible."

  2. RogerH says:

    Seems to me both versions are in the first person. One's in the first person singular, while the other's in the first person plural.

  3. Andrew Gelman says:

    Yep, I meant first person singular; fixed.

  4. John Bullock says:

    Thank you, Andrew, for these comments. The article recently came out in the Annals of the American Academy of Political and Social Science:….

    As [a colleague] puts it, the serious social psychologists would not dream of applying the mediation analysis stuff directly to observational data.

    I have heard this, too, but I’m not sure that it’s true. I suppose that it depends on who counts as “serious.” We content-analyzed 50 randomly selected articles that cited Baron and Kenny (1986) and were published in 2007 in a journal of the American Psychological Association. Of the 50, 25 applied standard mediation analysis methods to purely observational data. Some of these 25 articles appeared in the discipline’s top journals.

    [Standard mediation analysis] is their attempt to squeeze more out of experimental data. From that perspective, maybe it's not so horrible.

    We’re pessimistic. One problem is that even when the standard methods are applied to “experimental data,” the causally intermediate variables are almost never manipulated. In these cases, mediation analysis entails estimating a model in which you control for a variable that has been affected by the treatment. You have strong passages on this approach in your book with Jennifer Hill: for example, “the benefits of randomization are generally destroyed” by this practice (p. 192). I know that you’ve blogged about this a number of times, too.

    Even if you can manipulate your intermediate variables—often hard to do—learning about causal pathways isn’t straightforward. We have a paper about this that is forthcoming at the Journal of Personality and Social Psychology. Adam Glynn has a related paper that I also like.

  5. Dustin Tingley says:

    It seems like many people want to study mechanisms. Some do this explicitly; others obtain empirical results consistent with some mechanism but do nothing to test whether the mechanism is at work. So it is great to see various papers attempting to think about this issue.

    In collaborative work with Kosuke Imai, Teppei Yamamoto and Luke Keele we take a different approach to the subject. We use a causal inference framework to show what assumptions need to be made and what different experimental designs imply for conducting mediation analysis. The recent papers by Green et al. suggest substantial caution with existing practices, and they are all excellent papers well worth the read. We think much can be done, however, and are not quite as pessimistic about the enterprise.

    For example, our approach provides ways to conduct sensitivity analyses to violations of the key identifying assumption, just as others have done with problems that confront users of observational data. We also use our formal framework to evaluate different experimental designs. What happens if we use other experimental designs than the standard "single-experiment" design that Green et al. focus on? What other assumptions do you have to make? What ways might mediators be manipulated? We think creativity in these regards will lead to a progressive research agenda that will enable social scientists to study causal mediation effects.

    More information is available at

    though we also echo John's shout out to Adam Glynn's excellent paper on this topic which also makes a number of really nice insights.