
The usual model for before-after data is wrong

A common design in an experiment or observational study is to have two groups, treated and control units, and to take “before” and “after” measurements on each unit. (The basis of any experimental or observational study is to compare treated to control units; the control group is needed because, for example, there might be improvement from before to after whether or not a treatment was applied.)

The usual model

The usual statistical model for such data is a regression of “after” on “before” with parallel lines for treatment and control groups, with the difference between the lines representing the treatment effect. The implication is that the treatment has a constant effect, with the only difference between the two groups being an additive shift.
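To make the usual model concrete, here is a minimal sketch of the parallel-lines regression, after = a + b*before + theta*treated, fit by ordinary least squares. The function name `fit_parallel_lines`, the simulated data, and all numeric values are illustrative assumptions, not taken from the studies discussed here.

```python
import numpy as np

def fit_parallel_lines(before, after, treated):
    """Least-squares fit of after = a + b*before + theta*treated.

    The two groups share the slope b; theta is the constant additive
    shift that the usual model interprets as the treatment effect.
    """
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    treated = np.asarray(treated, dtype=float)
    # Design matrix: intercept, "before" score, treatment indicator.
    X = np.column_stack([np.ones_like(before), before, treated])
    coef, *_ = np.linalg.lstsq(X, after, rcond=None)
    intercept, slope, effect = coef
    return intercept, slope, effect

# Hypothetical data generated so that the usual model is exactly true.
rng = np.random.default_rng(0)
n = 2000
before = rng.normal(50, 10, n)
treated = rng.integers(0, 2, n)
after = 5 + 0.8 * before + 3.0 * treated + rng.normal(0, 5, n)

intercept, slope, effect = fit_parallel_lines(before, after, treated)
print(f"slope = {slope:.2f}, estimated treatment effect = {effect:.2f}")
```

Because the simulated data follow the model by construction, the fit should recover a slope near 0.8 and an effect near 3. The question raised in this post is whether real before-after data look like this.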

We went back and looked at some before-after data that we had kicking around (two observational studies from political science and an educational experiment) and found that this standard model was not true. In fact, treatment and control groups looked systematically different, in consistent ways.


In the examples we studied, the correlation between “before” and “after” measurements was higher in the control group than in the treatment group. When you think about this, it makes sense: applying the “treatment” induces changes in the units, and so it is reasonable to expect a lower correlation with the “before” measurement.

Another way of saying this is: if the treatment effect varies (instead of being a constant), it will reduce the before-after correlation. So our finding can be interpreted as evidence that treatment effects generally vary, which of course makes sense.
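A small simulation can illustrate the point. This is only a sketch under stated assumptions: the data are simulated, not from the studies above, and the varying treatment effect is assumed to be independent of the “before” measurement. (If effects were strongly tied to the “before” score, the pattern could differ.) All variable names and numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

before = rng.normal(0.0, 1.0, n)
noise = rng.normal(0.0, 0.5, n)

# Controls: "after" is "before" plus noise.
after_control = before + noise

# Treated, constant effect: every unit shifts by the same amount.
after_constant = before + noise + 1.0

# Treated, varying effect: same average shift, but it differs by unit
# and is drawn independently of "before" (an assumption of this sketch).
unit_effect = rng.normal(1.0, 1.0, n)
after_varying = before + noise + unit_effect

def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

r_control = corr(before, after_control)
r_constant = corr(before, after_constant)
r_varying = corr(before, after_varying)

print(f"control:         {r_control:.3f}")
print(f"constant effect: {r_constant:.3f}")
print(f"varying effect:  {r_varying:.3f}")
```

A constant shift leaves the correlation unchanged, since correlation is invariant to adding a constant. The varying effect adds variance to “after” that is unrelated to “before,” so the correlation drops. With these particular variances, theory gives about 0.89 for controls and about 0.67 for the varying-effect group.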

In fact, the only settings we found where the controls did not have a higher before-after correlation than the treated units were those where treatment effects were essentially zero.