The other day, in the context of a discussion of an article from 1972, I remarked that the great statistician William Cochran, when writing on observational studies, wrote almost nothing about causality, nor did he mention selection or meta-analysis.

It was interesting that these topics, which are central to any modern discussion of observational studies, were not considered important by a leader in the field, and this suggests that our thinking has changed since 1972.

Today I’d like to make a similar argument, this time regarding the topic of *measurement*. This time I’ll consider Donald Rubin’s 2008 article, “For objective causal inference, design trumps analysis.”

All of Rubin’s article is worth reading—it’s all about the ways in which we can structure the design of observational studies to make inferences more believable—and the general point is important and, I think, underrated.

When people do experiments, they think about *design*, but when they do observational studies, they think about *identification strategies*, which is related to design but is different in that it’s all about finding and analyzing data and checking assumptions, not so much about about systematic data collection. So Rubin makes valuable points in his article.

But today I want to focus on something that Rubin doesn’t really mention in his article: *measurement*, which is a topic we’ve been talking a lot about here lately.

Rubin talks about randomization, or the approximate equivalent in observational studies (the “assignment mechanism”), and about sample size (“traditional power calculations,” as his article was written before Type S and Type M errors were well known), and about the information available to the decision makers, and about balance between treatment and control groups.

Rubin does briefly mention the importance of measurement, but only in the context of being able to match or adjust for pre-treatment differences between treatment and control groups.

That’s fine, but here I’m concerned with something even more basic: the *validity* and *reliability* of the measurements of outcomes and treatments (or, more generally, comparison groups). I’m assuming Rubin was taking validity for granted—assuming that the x and y variables being measured were the treatment and outcome of interest—and, in a sense, the reliability question is included in the question about sample size. In practice, though, studies are often using sloppy measurements (days of peak fertility, fat arms, beauty, etc.), and if the measurements are bad enough, the problems go beyond sample size, partly because in such studies the sample size would have to creep into the zillions for anything to be detectable, and partly because the biases in measurement can easily be larger than the effects being studied.

So, I’d just like to take Rubin’s excellent article and append a brief discussion of the importance of measurement.

**P.S.** I sent the above to Rubin, who replied:

In that article I was focusing on the design of observational studies, which I thought had been badly neglected by everyone in past years, including Cochran and me. Issues of good measurement, I think I did mention briefly (I’ll have to check—I do in my lectures, but maybe I skipped that point here), but having good measurements had been discussed by Cochran in his 1965 JRSS paper, so were an already emphasized point.

And I wanted to focus on the neglected point, not all relevant points for observational studies.