I just read the new book, “Mostly Harmless Econometrics: An Empiricist’s Companion,” by Joshua Angrist and Jörn-Steffen Pischke. It’s an excellent book and, I think, well worth your $35. I recommend that all of you buy it.
I also have a few comments.
The book is focused, which has got to be a good thing: it’s only 300 pages, and the pages themselves are pretty small. It’s written in a conversational style (except for the theorems; more on that below) and is pleasant to read and also informative.
Nothing on model building!
The book’s perspective is as follows: you want to do a causal inference; you already have an outcome measure and a treatment indicator (in their examples, the outcome is almost always continuous and the treatment is almost always binary); and you probably also have some pre-treatment measures. Then you run a regression. Angrist and Pischke explain how the direct regression works, and then they discuss how methods such as instrumental variables and discontinuity analysis can help you make use of any quasi-experimental structure in your data.
My main criticism of the book is that, in keeping their sharp focus, the authors spend almost no time discussing model building. They discuss the idea of E(y|x) right away, but they seem to imply that the model is pre-specified, or that the researcher simply takes all available variables and does nothing to them but throw them in. In practice, model building can involve a lot of transformation and combination of variables.
In particular, the book does not mention interactions at all (except in the special case of their use to create a saturated model with discrete predictors), a point to which I shall return below.
From my perspective, these omissions are not a problem, since Jennifer and I discuss model building in detail in chapters 3, 4, 5, and 6 of our book. But I’m a little worried that students and researchers in economics might be misled by Angrist and Pischke’s conversational yet authoritative tone into thinking that regression models come out of nowhere.
Also here are some things that are not in the book, or, at least, words that are entirely missing from the index:
Multilevel (or Hierarchical)
Again, this is no criticism, but I would’ve liked to see a few sentences near the beginning or the end of the book bounding their topics, so that students would have a better sense of what else is out there.
To defend Angrist and Pischke here, I might say that statisticians such as myself are all too concerned about modeling the data, enough so that they (we) shortchange the ultimately more important goal of causal inference. Angrist and Pischke are keeping the focus where it counts.
The Angrist and Pischke book is much closer to my sensibility than something like Wooldridge (I use Wooldridge as an example because it’s of very high quality but far from my perspective). And in many ways there’s something refreshing to me about the economists’ (and econometricians’) focus on estimating a single “beta” by any means necessary. On the other hand, I think there’s a big big gap in practice when there’s zero discussion of how to set up the model, and an implicit assumption that variables are just dumped raw into the regression.
Maybe these econ students could read a statistics text (that is, a book more focused on model building) as a supplement?
In keeping with the econometric and statistical literature on causal inference, Angrist and Pischke spend a bit of time discussing concepts such as the local average treatment effect. The idea is that in any experiment or observational study, the inference about the treatment effect only applies to the people who could have had the treatment or the control done to them, with various designs and estimation strategies corresponding to different estimands (the effect of the treatment on the treated, and so forth).
This stuff is important, but it always seems funny to me for it to be considered in isolation from the model being fitted. In particular, if the treatment effect were truly constant across all units, we could just speak of “the treatment effect” without having to specify which cases we’re averaging over. Any discussion of particular average treatment effects is relevant because treatment effects vary; that is, the treatment interacts with pre-treatment variables.
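To make the point concrete, here is a toy sketch with a made-up population of six units, each with a pre-treatment variable `x`, a treatment indicator `z`, and a known unit-level effect. The functions `tau_varying` and `tau_constant` are hypothetical illustrations, not anything from the book: when the effect interacts with `x` and treatment is more common at high `x`, the average effect over everyone (ATE) and over the treated (ATT) differ; when the effect is constant, they coincide.

```python
# Toy population of (x, z) pairs: x is a pre-treatment variable and
# z = 1 if the unit received the treatment. Treated units tend to have x = 1.
POPULATION = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]


def tau_varying(x):
    """Hypothetical unit-level effect that interacts with x."""
    return 1.0 + 2.0 * x


def tau_constant(x):
    """Hypothetical effect that is the same for every unit."""
    return 2.0


def ate(tau, pop):
    """Average treatment effect over all units."""
    return sum(tau(x) for x, _ in pop) / len(pop)


def att(tau, pop):
    """Average treatment effect over the treated units only."""
    treated = [x for x, z in pop if z == 1]
    return sum(tau(x) for x in treated) / len(treated)


print(ate(tau_varying, POPULATION), att(tau_varying, POPULATION))    # 2.0 2.3333333333333335
print(ate(tau_constant, POPULATION), att(tau_constant, POPULATION))  # 2.0 2.0
```

Only in the varying-effect case does the choice of estimand matter.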
Given that, I think it can be important to model such interactions. This is done in econometrics (see, for example, the work of Dehejia), so I’m not proposing any revolutionary idea here. What I’m saying is that in a 300-page book on econometrics, where there is much discussion of average treatment effects, I’d like to see some discussion and examples of models with treatment interactions.
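As a minimal sketch of what a model with a treatment interaction might look like (plain Python, no external libraries): regress y on x, z, and the product z·x, then average the fitted, x-specific effects over whichever units define the estimand. The data and the helper names `solve`, `ols`, and `effect_at` are made up for illustration; in practice one would use a standard regression routine.

```python
def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - tail) / m[r][r]
    return beta


def ols(rows, y):
    """Least-squares coefficients via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up, noise-free data: the true effect of z is 1 + 2x (treatment interacts with x).
x = [0, 1, 2, 3, 0, 1, 2, 3, 3, 2]
z = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y = [1 + 0.5 * xi + zi * (1 + 2 * xi) for xi, zi in zip(x, z)]

# Design matrix columns: intercept, x, z, and the z-by-x interaction.
design = [[1.0, xi, zi, zi * xi] for xi, zi in zip(x, z)]
b0, b_x, b_z, b_zx = ols(design, y)


def effect_at(xv):
    """Fitted treatment effect for a unit with pre-treatment value xv."""
    return b_z + b_zx * xv


ate_hat = sum(effect_at(xi) for xi in x) / len(x)
att_hat = sum(effect_at(xi) for xi, zi in zip(x, z) if zi == 1) / sum(z)
print(round(ate_hat, 6), round(att_hat, 6))  # 4.4 4.666667
```

Both estimands come from the same fitted model; they differ only in which units the x-specific effects are averaged over.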
Reading this enjoyable book provoked various thoughts:
Psychological experimentation: The authors discuss Milgram’s famous experiment on obedience to authority but then suggest that it would have been “better left on the drawing board.” Why do they say this? Because somebody said that the participants in that study might have been upset by it? My impression is that the Milgram experiment was a great contribution and that, as a society, we’re better off that it was done. I can see that some people might be skeptical about Milgram’s result, but as an empirical researcher, I appreciate people like Milgram who go to the trouble of gathering the data that the rest of us analyze.
Thinking in terms of interventions: In introducing the foundations of experimentation and causal inference, I wish Angrist and Pischke would discuss potential interventions as a way to understand causality. For example, they describe a “fundamentally unidentified question” that I would actually call a fundamentally _undefined_ question. In their example, why not just consider potential interventions such as sending a given kid to first grade at age 5, 6, or 7? The causal inference all flows from this.
Writing style: Throughout, the authors use too many abbreviations for my taste. It’s a well-written book, but the constant stream of acronyms disrupts the flow. One or two abbreviations (for example, OLS and IV) are OK, but it gets out of control when they start with the more obscure acronyms such as HIE, CEF, LDV, CIA, OVB, MD, CQF, DD, LIML, ACR, and so forth.
Presentation of results: I like their use of graphs. I would’ve liked some scatterplots of raw data, but I know that’s not economists’ style. One of the few places where I noticed them slipping up was on page 130, where they forgot to round and printed numbers such as “16,451” and “-435.8” and so forth. Also, their tables on pages 230-231 should’ve been graphs. Overall, though, the presentation is excellent.
Mathematical style: As a statistician, I trace my descent from R. A. Fisher, who wrote books (notably Statistical Methods for Research Workers) with methods and examples and discussion but no theorems. Statistics books such as Fisher’s (and, I hope, mine) have a logical flow but it’s not quite the theorem/proof style. In contrast, Angrist and Pischke, despite their conversational style, here and there slip in pages of theorems and formulas. It doesn’t make a lot of sense to me, but there you go. It’s how econometricians communicate.
Matching and regression: The authors provide a useful discussion of matching and regression and the essential unity of these methods. In a second edition, I recommend that they more clearly distinguish between two different (but related) goals of matching: balance and overlap (see chapter 10 of Gelman and Hill for some graphs that illustrate these concepts). They should also make clearer the two-step process: first do matching to get comparable groups, then do regression for further adjustment and also for modeling interactions. A casual reader of the book might be left with the unfortunate impression that matching is a competitor to regression rather than a tool for making regression more effective.
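A toy sketch of the two goals and the first step, using a single made-up pre-treatment variable: overlap asks whether the treated units fall within the range covered by the controls, and balance asks whether the two groups have similar distributions (summarized crudely here by a difference in means). The data, the range check, and the 1:1 nearest-neighbor rule are illustrative assumptions, not a recommended matching procedure.

```python
from statistics import mean

# Made-up one-dimensional pre-treatment variable for treated and control units.
x_treated = [2.0, 3.2, 4.2, 5.0, 9.0]
x_control = [0.0, 1.0, 2.5, 3.5, 4.5, 6.0]


def in_overlap(t, controls):
    """Overlap: does this treated unit lie inside the range covered by the controls?"""
    return min(controls) <= t <= max(controls)


def mean_difference(treated, controls):
    """Balance (crude summary): difference in means of the pre-treatment variable."""
    return mean(treated) - mean(controls)


# Step 1: drop treated units outside the region of overlap, then match each
# remaining treated unit to its nearest control (1:1, with replacement).
kept = [t for t in x_treated if in_overlap(t, x_control)]
matched = [min(x_control, key=lambda c: abs(c - t)) for t in kept]

overlap_share = len(kept) / len(x_treated)
before = mean_difference(x_treated, x_control)
after = mean_difference(kept, matched)
print(overlap_share, round(before, 3), round(after, 3))  # 0.8 1.763 -0.15

# Step 2 (not shown): fit the regression, including any treatment
# interactions, on the matched sample.
```

Pruning the unit at 9.0 addresses overlap; choosing nearby controls addresses balance. The two can fail separately.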
Weighting: When discussing weighted regression, it would’ve been good if Angrist and Pischke had pointed out that, if you include as regression predictors the variables that affect the treatment assignment, there’s no need to weight for them in the regression. Weighting is intended to correct for variables that have not been included in the model. Making this point would’ve unified the book’s presentation of weighting, instead of presenting it more as a matter of taste as they do here.
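A toy check of this point, with made-up data in which a binary pre-treatment variable `x` drives both treatment assignment and the outcome. The raw difference in means is confounded; an unweighted regression that includes `x`, and an inverse-propensity-weighted comparison that leaves `x` out, both recover the true effect. The data and helper names are hypothetical, the propensities are taken to be the known within-group treated shares, and the treatment effect is deliberately constant.

```python
# Made-up data: among x = 0 units, 2 of 8 are treated; among x = 1 units, 6 of 8.
# Outcome y = 1 + 3x + 2z, so the (constant) treatment effect is 2.
units = [(0, 1)] * 2 + [(0, 0)] * 6 + [(1, 1)] * 6 + [(1, 0)] * 2  # (x, z) pairs
data = [(x, z, 1 + 3 * x + 2 * z) for x, z in units]


def group_mean(values):
    return sum(values) / len(values)


def weighted_mean(pairs):
    """Mean of y weighted by w, for a list of (w, y) pairs."""
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)


# 1. Raw difference in means: confounded by x.
raw = (group_mean([y for x, z, y in data if z == 1])
       - group_mean([y for x, z, y in data if z == 0]))

# 2. Unweighted regression of y on z and x. With binary x, the OLS coefficient
#    on z equals the slope of y on z after subtracting the x-group mean of z
#    (Frisch-Waugh-Lovell).
p = {g: group_mean([z for x, z, _ in data if x == g]) for g in (0, 1)}
z_res = [z - p[x] for x, z, _ in data]
reg = sum(r * y for r, (_, _, y) in zip(z_res, data)) / sum(r * r for r in z_res)

# 3. Inverse-propensity-weighted difference in means, with x left out of the model.
ipw = (weighted_mean([(1 / p[x], y) for x, z, y in data if z == 1])
       - weighted_mean([(1 / (1 - p[x]), y) for x, z, y in data if z == 0]))

print(raw, reg, round(ipw, 10))  # 3.5 2.0 2.0
```

The agreement between the regression and weighting answers relies on the effect being constant here. If the effect varied with `x`, a regression without a z-by-x interaction would recover a variance-weighted average of the stratum effects rather than the ATE, which brings us back to modeling interactions.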
The book is wonderful and you should buy it. My comments above are not criticisms of the book but rather are intended to delineate what it does and does not cover. I recommend that everybody who has my book with Jennifer Hill read the Angrist and Pischke book (and, of course, I recommend the converse as well).