Luis Guirola writes:
I’m a poli sci student currently working on methods. I’ve seen you sometimes address questions in your blog, so here is one in case you wanted.
I recently read some of Chuck Manski’s book “Identification for Prediction and Decision”. I take his main message to be “The only way to get identification is by using assumptions which are untestable”. This makes a lot of sense to me. In fact, most of the applied causal design literature in the Rubin identification tradition that is now popular in poli sci proceeds that way: first consider a research design (IV, quasi-experiment, RDD, whatever), then (a) justify that the conditions for identification are met and (b) proceed to run the design conditional on the assumptions being true. My problem here is that the decision about (a) is totally binary, and the uncertainty that I feel is associated with it is taken out of the final result.
Chuck Manski’s idea here is something like “let’s see how far we can get without making any assumptions” (or as few as possible), which takes him to set identification. But as someone educated in the Bayesian tradition, I tend to feel that there must be a way of quantifying, if only subjectively or a priori, how plausible the identification assumptions are, by putting a probability distribution on them. Intuitively, that’s how I assess the state of knowledge in a certain area: if it relies on strong or implausible identification assumptions, I give its results less credit; if I feel the assumptions are generalizable and hard to dispute, I give them more credit. But obviously, this is a very sloppy way of assessing it… I feel I must be missing something here, for otherwise I should have found more stuff on this.
Yes, I think it would be a good idea to quantify uncertainty in identification assumptions. The basic idea would be to express your model with an additional parameter, call it phi, which equals 0 if the identification assumption holds and is positive or negative if the assumption fails, with the magnitude of phi indexing how far off the assumption is from reality. For example, if you have a model of ignorable treatment assignment, phi could be the coefficient on an unobserved latent characteristic U in a logistic regression predicting treatment assignment; for example, Pr(T=1) = invlogit(X*beta + U*phi), where X represents observed pre-treatment predictors. The coefficient phi could never actually be estimated from the data, as you don’t know U, but one could put a model on U and a prior on phi based on some idea of how selection could occur. One could then look at the sensitivity of inferences to assumed values of phi.
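To make this concrete, here is a simulation sketch in Python (not a definitive implementation; the outcome model, coefficient values, and sample size are all made up for illustration). It generates data where assignment follows Pr(T=1) = invlogit(X*beta + U*phi) and the unobserved U also raises the outcome, then shows how the regression estimate of the treatment effect, adjusting only for the observed X, drifts away from the truth as phi moves away from 0:

```python
import numpy as np

rng = np.random.default_rng(0)

def adjusted_estimate(phi, n=100_000, beta=0.5, tau=1.0):
    """Regression estimate of the treatment effect tau when assignment
    follows Pr(T=1) = invlogit(X*beta + U*phi) and the unobserved
    confounder U also enters the outcome.

    Illustrative values only: tau, beta, and the outcome model below
    are assumptions made up for this sketch.
    """
    X = rng.normal(size=n)                     # observed predictor
    U = rng.normal(size=n)                     # unobserved confounder
    p = 1.0 / (1.0 + np.exp(-(X * beta + U * phi)))
    T = rng.binomial(1, p)
    Y = tau * T + X + U + rng.normal(size=n)   # U affects Y too
    # Adjust for the observed X only (U is unavailable in practice).
    design = np.column_stack([np.ones(n), T, X])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return coef[1]                             # coefficient on T

# phi = 0: ignorability holds, so the estimate recovers tau.
# phi far from 0: selection on U biases the estimate.
for phi in (0.0, 0.5, 1.0, 2.0):
    print(phi, round(adjusted_estimate(phi), 2))
```

Sweeping phi over a grid like this is the sensitivity analysis described above; the phi = 0 row corresponds exactly to the usual ignorability assumption.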
I’m sure a lot of work has been done on such models—I assume they’re related to the selection models of James Heckman from the 1970s—and I think they’re worthy of more attention. My impression is that people don’t work with such models because they make life more complicated and require additional assumptions.
It’s funny: Putting a model on U and a prior on phi is a lot less restrictive—a lot less of an “assumption”—than simply setting phi to 0, which is what we always do. But the model on U and phi is explicit, whereas the phi=0 assumption is hidden so somehow it doesn’t seem so bad.
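As a toy illustration of the contrast (every number here is hypothetical): setting phi to 0 is a point mass, while a prior on phi that is centered at 0 but admits some doubt widens the uncertainty in the effect estimate without moving its center, assuming for simplicity that the bias is linear in phi:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical inputs: a naive estimate of 2.0, and an assumed-linear
# bias model in which a violation of size phi shifts the estimate by
# bias_per_unit * phi. Both numbers are made up for this sketch.
naive_estimate = 2.0
bias_per_unit = 1.2

# The usual hidden assumption: phi = 0 exactly (a point mass).
point_estimate = naive_estimate - bias_per_unit * 0.0

# The explicit alternative: a prior on phi centered at 0 but with
# some spread, propagated through to the effect estimate.
phi_draws = rng.normal(loc=0.0, scale=0.3, size=10_000)
adjusted = naive_estimate - bias_per_unit * phi_draws

# adjusted has roughly the same center as point_estimate (≈ 2.0)
# but nonzero spread (≈ 1.2 * 0.3 = 0.36), making the identification
# uncertainty visible in the final interval.
```

The point mass and the prior agree on the most plausible value of phi; they differ only in whether the doubt about it survives into the reported uncertainty.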
Regression models with latent variables and measurement error can be difficult to fit using standard statistical software, but they’re easy to fit in Stan: you just add each new equation and distribution to the model, no problem at all. So I’m hoping that, now that Stan is widely available, people will start fitting these sorts of models. And maybe at some point this will be routine for causal inference.
At the time of this writing, I haven’t worked through any such example myself, but I think it’s potentially a very useful idea in many application areas.