Here’s some material on causal inference from a regression perspective. It’s from our recent book, and I hope you find it useful.

Chapter 9: Causal inference using regression on the treatment variable

Chapter 10: Causal inference using more advanced models

Chapter 23: Causal inference using multilevel models

Here are some pretty pictures, from the low-birth-weight example:

and from the Electric Company example:

My favorite examples:

The more firemen show up at a fire, the more damage is done

Students with tutors get worse grades than students without tutors and….

There is a relationship between astrological sign and IQ, but this relationship is strongest in children in kindergarten and 1st grade, and weakens with age. Also, it's different in different states and different countries.
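The firemen example is a textbook case of confounding, and a small simulation can make it concrete. What follows is only a sketch under made-up assumptions: the variable names, the coefficients, and the data-generating process (bigger fires draw more firemen and cause more damage, while each extra fireman reduces damage) are all hypothetical. It compares the raw regression slope of damage on firemen with the slope after adjusting for fire size.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


rng = random.Random(0)
n = 5000

# Hypothetical data-generating process; every number here is an assumption.
fire_size = [rng.uniform(0, 10) for _ in range(n)]
# Bigger fires draw more firemen.
firemen = [2 * s + rng.gauss(0, 1) for s in fire_size]
# Bigger fires cause more damage; each extra fireman reduces damage by 1 unit.
damage = [10 * s - 1 * f + rng.gauss(0, 1) for s, f in zip(fire_size, firemen)]

# Ignoring fire size: more firemen appear to go with more damage.
naive_slope = slope(firemen, damage)

# Adjusting for fire size by partialling it out of both variables, which
# gives the same coefficient as a regression of damage on firemen and size.
adjusted_slope = slope(residuals(fire_size, firemen),
                       residuals(fire_size, damage))

print(f"naive slope: {naive_slope:.2f}, adjusted slope: {adjusted_slope:.2f}")
```

In this simulated world the naive slope is positive even though the effect built into the data is negative, and adjusting for the confounder recovers the sign. Real data rarely come with the confounder measured this cleanly.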

Jean-Luc,

As a former physics student, I remain interested in these sorts of things. I can't comment on Shalizi's quantum mechanical arguments, beyond noting that the statistical framework of causal inference, and also standard ("Kolmogorov") probability theory itself, seem to fall apart in quantum settings such as the two-slit experiment:

All the statistical models I've ever seen (excepting those models specifically used for quantum mechanics) assume "Kolmogorov" (or, in physics lingo, "Boltzmann") probability, in which there's a joint distribution over some space, events are divided into a mutually exclusive and exhaustive set, each event is given a probability between 0 and 1, and these probabilities sum to 1. But this model does not fit reality, at least at the quantum level. In the two-slit experiment, you just can't assign a joint probability distribution to the events corresponding to "which slit the photon went through" and "where on the screen the photon landed." It's Heisenberg's uncertainty principle: if you measure the slit, it alters the distribution of where the photon landed. You could incorporate the measurement process in the model, but you still lose the idea of an underlying joint distribution, which is the basis of statistical modeling.
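To make the two-slit point concrete, here is a rough numerical sketch. It is a deliberately simplified far-field model, not a full treatment of the physics: the wavelength, slit separation, and screen distance are hypothetical values, and each slit on its own is assumed to give a flat pattern. The sketch compares what the law of total probability would predict if "which slit" were an ordinary random variable with what adding complex amplitudes predicts.

```python
import cmath
import math

# Hypothetical setup values, chosen only for illustration.
WAVELENGTH = 500e-9       # metres
SLIT_SEPARATION = 50e-6   # metres
SCREEN_DISTANCE = 1.0     # metres


def phase_difference(x):
    """Far-field phase difference between the two paths at screen position x."""
    return 2 * math.pi * SLIT_SEPARATION * x / (WAVELENGTH * SCREEN_DISTANCE)


def mixture_intensity(x):
    """Classical prediction: sum over slits of P(slit) * P(x | slit).

    Intensities are relative to a single open slit, assumed flat in x.
    """
    p_slit = 0.5
    p_x_given_slit = 1.0  # simplifying assumption: no single-slit envelope
    return p_slit * p_x_given_slit + p_slit * p_x_given_slit


def interference_intensity(x):
    """Quantum prediction: add the amplitudes first, then square."""
    phi = phase_difference(x)
    a1 = cmath.exp(1j * phi / 2) / math.sqrt(2)
    a2 = cmath.exp(-1j * phi / 2) / math.sqrt(2)
    return abs(a1 + a2) ** 2


# Position of the first dark fringe in this simplified model.
first_dark_fringe = WAVELENGTH * SCREEN_DISTANCE / (2 * SLIT_SEPARATION)

for x in (0.0, first_dark_fringe):
    print(f"x = {x:.4f} m: mixture = {mixture_intensity(x):.3f}, "
          f"interference = {interference_intensity(x):.3f}")
```

The mixture calculation gives the same relative intensity everywhere, while the amplitude calculation oscillates between bright and dark fringes. No assignment of ordinary probabilities to "which slit" reproduces the dark fringes in this model, which is the failure of the joint distribution described above.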

I also have a minor comment, which is that I use Rubin's term "potential outcome" in preference to "counterfactual" because the method can be applied to an experiment that has not been conducted yet, in which case none of the potential outcomes are "counterfactual" yet.

Also, I think the potential-outcome framework is helpful, but it's not generally the whole story. In practice, I'm almost always interested in causal inference for cases outside of the sample. For example, if I do a study on 100 people and 50 get the treatment and 50 get the control, I don't really care so much about these 100 except inasmuch as they represent a larger population of interest.
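The distinction between the 100 people in the study and the larger population can be sketched in a short simulation. Everything here is an assumption made for illustration: the population size, the outcome distributions, and the names `pate` (population average treatment effect) and `sate` (sample average treatment effect) do not come from the book chapters. Each unit has two potential outcomes, but the experiment reveals only one of them.

```python
import random

rng = random.Random(1)

# Hypothetical population: every unit has two potential outcomes,
# y0 (under control) and y1 (under treatment). All numbers are made up.
POPULATION_SIZE = 100_000
population = []
for _ in range(POPULATION_SIZE):
    y0 = rng.gauss(50, 10)
    y1 = y0 + rng.gauss(5, 2)  # unit-level effect varies around 5
    population.append((y0, y1))


def average_effect(units):
    """Average of y1 - y0; computable only because this is a simulation."""
    return sum(y1 - y0 for y0, y1 in units) / len(units)


# Effect in the population of interest.
pate = average_effect(population)

# The study: 100 people drawn from the population.
sample = rng.sample(population, 100)
sate = average_effect(sample)

# Randomize 50 to treatment and 50 to control.
order = list(range(len(sample)))
rng.shuffle(order)
treated, control = order[:50], order[50:]

# Only one potential outcome is observed per person.
mean_treated = sum(sample[i][1] for i in treated) / len(treated)
mean_control = sum(sample[i][0] for i in control) / len(control)
estimate = mean_treated - mean_control

print(f"population effect: {pate:.2f}")
print(f"sample effect:     {sate:.2f}")
print(f"estimate:          {estimate:.2f}")
```

The difference in means estimates the sample effect, and the sample effect is useful mainly as a stand-in for the population effect. How good a stand-in it is depends on how the 100 people were selected, which the simulation takes for granted by sampling at random.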

Finally, see here for more thoughts on causality, decision analysis, and quantum mechanics.