Seven years ago, Dawn Teele, then a graduate student and now a professor of political science, contacted me asking for comments on an edited volume she was preparing on social science experiments and their critics.
I responded as follows:
This is a great idea for a project. My impression is that Angus Deaton favors observational over experimental analysis; is this not so? If you want someone technical, you could ask Ed Vytlacil; he’s at Yale, isn’t he? I think the strongest arguments in favor of observational rather than experimental data are:
(a) Realism in causal inference. Experiments, even natural experiments, are necessarily artificial, and there are problems in generalizing beyond them to the real world. This is a point that James Heckman has made.
(b) Realism in research practice. Experimental data are relatively rare, and in the meantime we have to learn from the data we have, which are typically observational. This is the point made by Paul Rosenbaum, Don Rubin, and others who love experiments and see them as the gold standard but want to make the most of their observational data. You could perhaps get Paul Rosenbaum or Rajeev Dehejia to write a good paper making this point: not that observational data are better than experimental data, but that much that is useful can be learned from observational data.
(c) The “our brains can do causal inference, so why can’t social scientists?” argument. Sort of an analogy to the argument that the traveling salesman problem can’t be so hard as all that, given that thousands of traveling salesmen solve the problem every day (well enough for their purposes, if not optimally). The idea is that humans do (model-based) everyday causal inference all the time (every day, as it were), and we rarely use experimental data, certainly not double-blind randomized trials. I have some sympathy for this argument but also some skepticism (see attached article), but if you wanted someone who could make it, you could ask Niall Bolger or David Kenny or some other social psychologist or sociologist who is familiar with path analysis. Again, I doubt they’d say that observational data are better than the equivalent experiment, but they might point out that, realistically, “the equivalent experiment” isn’t always out there, and the observational data are.
(d) This issue also arises in evidence-based medicine. As far as I can tell, there are three main strands of evidence-based medicine: (i) using randomized controlled trials to compare treatments, (ii) data-based cost-benefit analyses (QALYs and the like), and (iii) systematic collection and analysis of what’s actually done (i.e., observational data), thus moving medicine into a total-quality-control environment. You could perhaps get someone like Chris Schmid (a statistician at New England Medical Center who’s a big name in this field) to write an article about this (you could give him my sentence above to convey what you’re looking for).
(e) An argument from a completely different direction is that _experimentation_ is great, but formal _randomized trials_ are overrated. The idea is that these formal experiments (in the style of NIH or, more recently, the MIT poverty lab) would be fine in and of themselves except that they (i) suck up resources and, even more importantly, (ii) dissuade people from doing everyday experimentation that they might learn from. The #1 proponent of this view is Seth Roberts, an experimental psychologist who’s written on self-experimentation.
I’d be happy to write something expanding (briefly) on the above points. I don’t feel competent enough in the area to take any strong positions, but I’d be glad to lay out what I consider to be some important issues that often get lost in the debate.
A few months later I sent in my chapter, which begins:
As a statistician, I was trained to think of randomized experimentation as representing the gold standard of knowledge in the social sciences, and, despite having seen occasional arguments to the contrary, I still hold that view, expressed pithily by Box, Hunter, and Hunter (1978): “To find out what happens when you change something, it is necessary to change it.”
At the same time, in my capacity as a social scientist, I’ve published many applied research papers, almost none of which have used experimental data.
In the present article, I’ll address the following questions:
1. Why do I agree with the consensus characterization of randomized experimentation as a gold standard?
2. Given point 1 above, why does almost all my research use observational data?
In confronting these issues, we must consider some general questions about the strategy of social science research. We also take from the psychology methods literature a more nuanced perspective that considers several different aspects of research design and goes beyond the simple division into randomized experiments, observational studies, and formal theory.
A few years later the book came out.
I’ve blogged on all this before, but the journal Perspectives on Politics recently published a symposium with several reviews of the book (from Henry Brady, Yanna Krupnikov, Jessica Robinson Preece, Peregrine Schwartz-Shea, and Betsy Sinclair), and I thought it might interest some of you.
In her review, Sinclair writes, “The arguments in the book are slightly dated . . . Seven years later, there is more consensus within the experimental community about the role experiments play in addressing a research question.” I don’t quite agree with that; I think the issues under discussion remain highly relevant. I hope that soon we shall reach a point of consensus, but we’re not there yet.
I certainly would not want to join in any consensus that includes some of the more controversial TED-talk-style experimental claims involving all the supposedly irrational influences on voting, for example. The key role of experimentation in such work is, I think, not scientific so much as meta-scientific: when a study is encased in an experimental or quasi-experimental framework, it can seem more like science, and then the people at PPNAS, NPR, etc., can take it more seriously. My recommendation is for experimentation, quasi-experimentation, and identification strategies more generally to be subsumed within larger issues of statistical measurement.