A link from Simon Jackman’s blog led me to an article by James Heckman, Hedibert Lopes, and Remi Piatek from 2011, “Treatment effects: A Bayesian perspective.” I was pleasantly surprised to see this, partly because I didn’t know that Heckman was working on Bayesian methods, and partly because the paper explicitly refers to the “potential outcomes model,” a term I associate with Don Rubin. I’ve had the impression that Heckman and Rubin don’t like each other (I was a student of Rubin and have never met Heckman, so I’m only speaking at second hand here), so I was happy to see some convergence.

I was curious how Heckman et al. would source the potential outcome model. They do not refer to Rubin’s 1974 paper or to Neyman’s 1923 paper (which was republished in 1990 and is now taken to be the founding document of the Neyman-Rubin approach to causal inference). Nor, for that matter, do Heckman et al. refer to the more recent developments of these theories by Robins, Pearl, and others. (They do cite Rubin in reference to his work on matching.)

Here’s what Heckman et al. say about causal inference:

The well-known fundamental problem in program evaluation derives from the fact that people can never be observed in different treatment states simultaneously, which makes it is impossible to directly observe their outcome gains. . . .

The textbook model considered in this paper is an extension of the original Roy model (Roy, 1951; Heckman and Honore, 1990) and assumes a binary treatment decision D that involves two continuous potential outcomes Y1 and Y0 for the treated and untreated states . . . The relationship of the Roy model to other models of potential outcomes is discussed in Heckman (2008).

I wasn’t familiar with Roy (1951), which comes roughly halfway between Neyman’s original work in this area and Rubin’s later developments. So I googled and took a look. The Roy paper is called “Some thoughts on the distribution of earnings” and presents a model of a miniature society whose members can make their living out of some mixture of hunting and fishing. I don’t see the potential outcome model here in any way. Here’s some discussion in Heckman (2008):

The Roy model (1951) is another version of this framework with two possible treatment outcomes (S = {0, 1}) and a scalar outcome measure and a particular assignment mechanism . . .

But I don’t see it. I mean, sure, I see a vague connection, but only very vague. I don’t know the history here. Maybe Roy was this brilliant researcher who talked about the potential outcome model but just didn’t put the details in the article? I also took a look at Heckman and Honore (1990), which did discuss a joint distribution, but not of potential outcomes: they were talking about a bivariate distribution of skills.

It can be difficult to trace back an idea. You want to give credit to early work that inspired important later developments, but you have to be careful not to credit the old stuff for more than it is. I think Heckman’s labeling of potential outcomes as the “Roy model” is going too far. Unless I’m missing something important, which is certainly possible.

The Roy model is a model of sector choice for employment. The idea is that a person observes potential earnings in two sectors, y1 and y2, and chooses the sector with the higher earnings. The analyst observes only the outcome in the chosen sector, y = max(y1,y2). This setup is extremely similar to the usual potential outcomes framework, in which we observe y_t only if option t is selected.
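The sector-choice rule just described can be sketched in a few lines of Python. This is only an illustration, not anything taken from Roy or Heckman: the independent standard normal earnings, the sample size, and the names `simulate_roy` and `mean` are all assumptions made for the example.

```python
import random


def simulate_roy(n=10_000, seed=0):
    """Simulate a two-sector Roy model (illustrative distributions only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Assumption: potential earnings are independent standard normals.
        y1 = rng.gauss(0.0, 1.0)
        y2 = rng.gauss(0.0, 1.0)
        chosen = 1 if y1 >= y2 else 2  # the person picks the better sector
        y = max(y1, y2)                # the analyst sees only this value
        rows.append((y1, y2, chosen, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = simulate_roy()
mean_y1_all = mean(r[0] for r in rows)
mean_y1_chosen = mean(r[0] for r in rows if r[2] == 1)
print(f"mean y1, everyone:          {mean_y1_all:.3f}")
print(f"mean y1, sector-1 choosers: {mean_y1_chosen:.3f}")
```

Under these assumed distributions, average sector-1 earnings among the people who chose sector 1 come out well above the population average of y1, which is the selection problem in miniature: the observed earnings are not a random sample of the potential ones.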

I believe there are two reasons Heckman emphasizes the Roy model. First, he’s a labor economist, so this may be the right historical approach for him. But more importantly, the Roy model is an explicit economic model of selection. In the usual Neyman-Rubin approach (at least as I understand it), we know that counterfactual outcomes may differ by treatment status, but we don’t have a clear model of how people select their treatment. The Roy model provides exactly that, in what strikes me at least as a very intuitive and elegant way.

By the way, in Heckman and Honore the skills are closely related to potential outcomes–they determine earnings in each sector.

Dan:

Yes, it makes sense that given his interest in selection models (and, more generally, economists’ interest in agents’ decisions), this model could be appealing to Heckman as a foundation or starting point. As a statistician, it’s natural for me to see the Neyman-Rubin model as more basic (as it is simply a model of outcomes without considering selection), but I can see how the selection model could seem to an economist to be more fundamental.

From the Heckman paper linked below by Gray:

“In truth, the essential aspect of the structural approach is joint modeling of outcome and choice equations.”

Just thought it was interesting and relevant that Heckman is almost defining the difference between something like “pure statistics” and “structural econometrics” as precisely this problem of modeling selection.

Don had a short comment on Heckman’s citation of the Roy model in his Fisher Lecture “Causal Inference Using Potential Outcomes: Design, Modeling, Decisions” (Rubin 2005 JASA):

“In the economics literature, the use of the potential outcomes notation to define causal effects has recently (e.g., Heckman 1996) been attributed to Roy (1951) or Quandt (1958), which is puzzling because neither of these articles addresses causal inference, and the former has no mathematical notation at all. For seeds of potential outcomes in economics, the earlier references cited at the start of this paragraph are much more relevant; see the rejoinder by Angrist, Imbens, and Rubin (1996) for more on this topic.”

Heckman and Rubin had a very interesting debate at a workshop many years ago. It’s transcribed in the book “Drawing Inferences from Self-Selected Samples” ed. Howard Wainer.

I just wanted to be the first in this thread to remind everyone of Stigler’s Law http://en.wikipedia.org/wiki/Stigler's_law_of_eponymy

Economists (and indeed everyone else) use historical attributions in a variety of ways, often polemic or perverse. The Cambridge economist (that’s not the Cambridge in Massachusetts) Joan Robinson named something locally very famous as the “Ruth Cohen curiosum”. This naming is often now explained as a private or inside joke, although I have never seen an explanation of what was witty.

Somewhat unrelated. I read Jackman’s blog post linked here. He seemed to question the value of preregistration for Bayesians. Two comments:

Even Bayesian adaptive trials follow prespecified analysis plans when making decisions (Berry et al.).

Also, it would seem Bayesians would need pretty good version control to avoid double counting evidence in a decentralized research environment.

Congratulations! From

http://www.nytimes.com/2013/07/30/science/despite-two-new-studies-on-motives-for-monogamy-the-debate-continues.html

Dr. Opie offered possible explanations for why his team and Dr. Lukas’s came to different conclusions. It is possible that the forces driving the evolution of monogamy in primates are different than in other mammals. Dr. Opie also noted that he and his colleagues had used a more powerful type of statistics, known as Bayesian probability, to reconstruct the evolution of monogamy.

“They don’t use the latest methods, which is a bit of a pity,” Dr. Opie said.

And now Heckman. Your proselytizing has created a meme–“Bayesian good, frequentist bad” (only watch out that it doesn’t become “Bayesian good, frequentist better.”).

I saw that article as well. I’m as die-hard a Bayesian as they get, and I took Dr. Opie’s comment to be a pretty slimy attempt to win a debate before there even was a debate. So I think you’re going to have a hard time pinning blame for Dr. Opie on Gelman, who’s quite a bit more moderate than me.

Numeric:

I am happy to see Bayesian methods being used by Heckman because Bayesian methods are, in my experience, a useful way of combining different sources of information and a useful way of handling multilevel structures of uncertainty. Also, I’ve seen non-Bayesians go into contortions to avoid using Bayesian methods.

But, the fact that I find Bayesian methods to be useful does not imply that I think all instances of Bayesian reasoning are correct. Many times I’ve seen Bayesian models that don’t make sense to me at all, and I’ve discussed this a lot in my research articles and here on this blog. So I think that what you call my “proselytizing” is pretty clear. And if someone thinks an analysis is “good” (or “better”) just because of how it’s labeled, that’s pretty silly. I suggest such people go back in a time machine and set it to Berkeley, California, circa 1995. Or maybe Ithaca, New York, circa 1950.

I’m not a labor economist exactly, but I have heard it said that the Roy model economists think about is actually this Borjas version:

http://www.jstor.org/discover/10.2307/1814529?uid=3739864&uid=2&uid=4&uid=3739256&sid=21102505910611

Andrew,

Heckman’s sourcing of “Roy model” is summarized in a footnote in my book Causality (Chapter 3, footnote 10, page 98):

\footnote{A parallel framework was developed in the econometrics literature under the rubric “switching regression” (Manski 1995, p.\ 38), which Heckman (1996) attributed to Roy (1951) and Quandt (1958); but lacking the formal semantics of (3.51), did not progress much beyond a “framework.”}

Roy allowed counterfactual quantities to enter the conversation, e.g., “the earning of an individual had he been hired”, but never attached notation to the phrase “had he been hired”. Neyman did. This, in my opinion, is a key step that distinguishes a viable “model” from a mere “framework” for thinking.

PS. The reluctance of leading researchers to acknowledge contributions from neighboring fields is a universal phenomenon, not unique to economics, so I am surprised that you are surprised.

Judea

Interesting post, and always an interesting blog. I have some thoughts and comments, knowing some of the participants in these controversies well, although it has been a couple of years since I have run into Heckman.

My understanding of the history is as follows. The potential outcome framework became popular in the econometrics literature on causality around 1990. See Heckman (1990, American Economic Review, Papers and Proceedings, “Varieties of Selection Bias,” 313-318) and Manski (1990, American Economic Review, Papers and Proceedings, “Nonparametric Bounds on Treatment Effects,” 319-323). Both those papers read very differently from the classic paper in the econometric literature on program evaluation and causality, published five years earlier (Heckman and Robb, 1985, “Alternative Methods for Evaluating the Impact of Interventions,” in Heckman and Singer (eds.), Longitudinal Analysis of Labor Market Data, Cambridge, Cambridge University Press), which did not use the potential outcome framework. When the potential outcome framework became popular, there was little credit given to Rubin’s work, but there were also no references to Neyman (1923), Roy (1951) or Quandt (1958) in the Heckman and Manski papers. It appears that at the time the notational shift was not viewed as sufficiently important to attribute to anyone.

Heckman’s later work has attempted to place the potential outcome framework in a historical perspective. Here are two quotes somewhat clarifying his views on the relation to Rubin’s work. In 1996 he wrote:

“The “Rubin Model” is a version of the widely used econometric switching regression model (Maddala 1983; Quandt, 1958, 1972, 1988). The Rubin model shares many features in common with the Roy model (Heckman and Honore, 1990, Roy 1951) and the model of competing risks (Cox, 1962). It is a tribute to the value of the framework that it has been independently invented by different disciplines and subfields within statistics at different times.” p. 459

(Heckman (1996), Comment on “Identification of Causal Effects Using Instrumental Variables,” Journal of the American Statistical Association.)

More recently, in 2008, he wrote:

“4.3 The Econometric Model vs. the Neyman-Rubin Model

Many statisticians and social scientists use a model of counterfactuals and causality attributed to Donald Rubin by Paul Holland (1986). The framework was developed in statistics by Neyman (1923), Cox (1958) and others. Parallel frameworks were independently developed in psychometrics (Thurstone, 1927) and economics (Haavelmo, 1943; Quandt, 1958, 1972; Roy, 1951). The statistical treatment effect literature originates in the statistical literature on the design of experiments. It draws on hypothetical experiments to define causality and thereby creates the impression in the minds of many of its users that random assignment is the most convincing way to identify causal models.” p. 19

(“Econometric Causality,” Heckman, International Economic Review, 2008, 1-27.)

(I include the last sentence of the quote mainly because it is an interesting thought, although it is not really germane to the current discussion.)

In the end I agree with Andrew’s blog post that the attribution to Roy or Quandt is tenuous, and I would caution the readers of this blog not to interpret Heckman’s views on this as reflecting a consensus in the economics profession. The Haavelmo reference is interesting. Haavelmo is certainly thinking of potential outcomes in his 1943 paper, and I view Haavelmo’s paper (and a related paper by Tinbergen) as the closest to a precursor of the Rubin Causal Model in economics. However, Haavelmo’s notation did not catch on, and soon econometricians wrote their models in terms of realized, not potential, outcomes, not returning to the explicit potential outcome notation till 1990.

Relatedly, I recently met Paul Holland at a conference, and I asked him about the reasons for attaching the label “Rubin Causal Model” to the potential outcome framework in 1986. (Now you often see the phrase “called the Rubin Causal Model by Paul Holland.”) Paul responded that he felt that Don’s work on this went so far beyond what was done before by, among others, Neyman (1923), by putting the potential outcomes front and center in a discussion on causality, as in the 1974 paper, that his contributions merited this label. Personally I agree with that.

A final comment on Judea Pearl’s comment about the “reluctance of leading researchers to acknowledge contributions from neighboring fields.” Judea Pearl may be right on this as a general matter, but the causality literature is actually one where there is a lot of cross-discipline referencing, and in fact a lot of cross-discipline collaboration among statisticians, econometricians, political scientists and computer scientists.

Guido,

Thanks for the extensive historical account on the origins of the potential outcome framework. It coincides with mine, to which I would only add an explanation of why economists did not follow Haavelmo’s notation and, instead, wrote their models in terms of realized, not potential, outcomes.

First, Haavelmo did not invent new notation. He computed the potential outcome by adding a correction term to a structural equation, but did not assign a special notation to the quantity that results from this correction.

Second, economists wrote their models in terms of realized outcomes because they did not need potential ones; the counterfactual information (read: assumptions) is already encapsulated in the structural equations (which were not available to statisticians), so there was no need to invent “non-realized” variables to carry those assumptions. (I discussed it here: http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf in the section on “What kept the Cowles Commission at Bay?”)

Speaking of the history of potential outcomes, does anyone know where the fundamental equation of potential outcomes first appears? I am referring to:

Y = (1-x)Y_0 + xY_1

which connects the realized outcome (Y) with the potential outcomes (Y_1 and Y_0). Heckman attributes it to Quandt (1958), but I could not find anything resembling the equation there. I have been searching a lot for the source, because I think it is perhaps even more pivotal than the notation Y_0 and Y_1 itself.
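To make the role of that equation concrete, here is a minimal Python sketch of it for a single unit. The function name `realized_outcome` and the numeric values are hypothetical, chosen only for illustration; the only thing taken from the comment is the formula itself.

```python
def realized_outcome(x, y0, y1):
    """Return Y = (1 - x) * Y_0 + x * Y_1 for a binary treatment indicator x."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (untreated) or 1 (treated)")
    return (1 - x) * y0 + x * y1


# Hypothetical potential outcomes for one unit.
y0, y1 = 3.0, 5.0

print(realized_outcome(0, y0, y1))  # untreated: the realized outcome is Y_0
print(realized_outcome(1, y0, y1))  # treated: the realized outcome is Y_1
```

Whichever value x takes, only one of the two potential outcomes is revealed, so the unit-level difference Y_1 - Y_0 is never observed directly.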

Judea

The parochial index (TM) is built from two functions, f(o) and f(c). f(o) measures the degree of overlap across social science disciplines in a given subject matter, say causal inference. f(c) measures the amount of cross-disciplinary citation in that subject matter.

The P-index is (1 - f(c)) * f(o), appropriately scaled to [0,1]. It reaches its maximum when there is 100% overlap but 0% cross-citation.

My hypothesis is that the arrival of the internet and Google Scholar is causing a reduction in parochialism, which would show up as f(c) rising for a given budget of citations.
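As a small illustration of the index defined above, here is a sketch in Python. It assumes that f(o) and f(c) are already expressed as fractions between 0 and 1, so that no further rescaling is needed; the function name `p_index` and the example values are made up for the example.

```python
def p_index(f_o, f_c):
    """Parochial index (1 - f(c)) * f(o), assuming both inputs lie in [0, 1]."""
    for name, value in (("f_o", f_o), ("f_c", f_c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return (1.0 - f_c) * f_o


# Full overlap with no cross-citation gives the maximum of the index.
print(p_index(1.0, 0.0))

# Holding overlap fixed, more cross-citation lowers the index.
print(p_index(0.5, 0.25), p_index(0.5, 0.5))
```

With overlap held fixed, the index falls as the cross-citation share rises, which is the direction the hypothesis predicts.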

Wow, such a high caliber of commenters.

As an Economics Ph.D. student in the late 1990s, I took the series in labor economics, and we read (I think) the 1990 Heckman and Honore paper about the Roy model. I found it all a bit dense and confusing. I was always interested in issues of identification in general, and I was a big fan of Manski’s work, what I’d read of it anyway.

Then I came across the Angrist and Imbens papers in Econometrica and JASA about instrumental variables and local average treatment effects. Suddenly a lot of things were much clearer to me. I thought they were some of the best papers I’d ever read, and I wondered why we had not been introduced to those papers in our course work. (My advisor happened to be a student of Heckman’s. I didn’t know who Rubin was at the time.)

Then I read Heckman’s strangely cranky responses to Angrist and Imbens in JASA and Journal of Human Resources. I couldn’t understand what made him so annoyed. I suppose he was mad because he thought he had personally already understood all those things, so they weren’t new. Which might well be true, but he seemed to grossly undervalue the contribution of explaining clearly, with a clear notation. Since then I admit I’ve had pretty negative feelings about Heckman.


This is a mixed response to Pearl and Anonymous Ph.D., and perhaps an extension to Guido.

I believe Andrew has identified a possible learning opportunity in this seeming _disagreement_ between Rubin and Heckman.

As Pearl points out, one could consider it just a “reluctance of leading researchers to acknowledge contributions from neighboring fields,” but that bypasses the _possible_ opportunity. Anonymous Ph.D.’s supposition is, to me, maybe more what’s going on: “he was mad because he thought he had personally already understood all those things.” But these understandings may just be subtly different.

An example – Rubin used to refuse to discuss the possibility (first raised by Stigler) that Fisher was not the first to use randomization. It came out that Rubin meant physically use it (not just as a mathematical construct). The location of an 1885 entry by CS Peirce on how to take a random sample immediately redressed this and Rubin published a _correction_.

The challenge sometimes is not to get in the middle of these misunderstandings or take sides?

Just one small comment to add; the intro to Heckman’s 2010 JEL article[1] spells out his views on the potential outcome & program evaluation literature in some detail. An extended quotation:

“The program evaluation approach replaces the traditional paradigm of economic policy evaluation with the paradigm of the randomized controlled trial. In place of economic models of counterfactuals, practitioners of this approach embrace a statistical model of experiments due to Jerzy Neyman (1923) and David R. Cox (1958) that was popularized by Donald B. Rubin (1974, 1978, 1986), and Paul W. Holland (1986). In this approach, the parameters of interest are defined as summaries of the outputs of experimental interventions. This is more than just a metaphorical usage. Rubin and Holland argue that causal effects are defined only if an experiment can be performed.

“This conflation of the separate tasks of defining causality and identifying causal parameters from data is a signature feature of the program evaluation approach. It is the consequence of the absence of clearly formulated economic models. The probability limits of estimators, and not the parameters of well-defined economic models, are often used to define causal effects or policy effects. The retreat to statistics in the program evaluation literature left a lot of economics behind. A big loss was the abandonment of economic choice theory. Important distinctions about ex ante and ex post outcomes and subjective and objective evaluations that are central to structural econometrics were forgotten.”

[1]: http://ideas.repec.org/a/aea/jeclit/v48y2010i2p356-98.html

This is a really good paper. Not only does it give a really clear picture of how economists think about the “Roy model” (regardless of who deserves attribution and whether or not you think the original Roy paper is being abused in the historiography), it is also a really great history of the popular methods of econometric evaluation – like a history of the thinking of econometric problems.

And then of course there is the point he is trying to make, which is to move beyond the structural/reduced-form divide by thinking very hard about the values of both methods and whether these values are actually mutually exclusive (spoiler alert: he thinks not).

I never know when I read Heckman whether I deeply agree with him or deeply disagree. For that matter, I feel similarly about our host. But in both cases I’m learning that the more time I spend listening, the more I realize precisely where I do and don’t agree with them, and I’m forced to enrich my thinking and/or extend it to new problems. So anyway, thanks for the link. Good paper. Maybe in 6 months I’ll realize that I actually hate it, but it will have pushed me 6 months forward.

Thanks gary.

I’ll speculate that Rubin sees “A big loss was the abandonment of economic choice theory” as a big gain, given a concern that people take their conceptual models too seriously – which is more dangerous when there are not going to be RCTs to rectify that …
