
Continuing puzzlement over “Why” questions

Tyler Cowen links to a blog post by Paul Kedrosky that asks why winning times in the Boston marathon have been more variable, in recent years, than winning times in New York. This particular question isn't so interesting (when I saw the title of the post, my first thought was "the weather," and, in fact, that and "the wind" are the most common responses of the blog commenters), but it reminded me of a more general question that we discussed the other day: how to think about Why questions.

Many years ago, Don Rubin convinced me that it’s a lot easier to think about “effects of causes” than “causes of effects.” For example, why did my cat die? Because she ran into the street, because a car was going too fast, because the driver wasn’t paying attention, because a bird distracted the cat, because the rain stopped so the cat went outside, etc. When you look at it this way, the question of “why” is pretty meaningless.

Similarly, if you ask a question such as "What caused World War 1?", the best answers can take the form of potential-outcomes analyses. I don't think it makes sense to expect any sort of true causal answer here.
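For readers who haven't met the potential-outcomes framing, here is a minimal sketch of what an "effects of causes" question looks like in practice. Everything in it is an illustrative assumption: the data are synthetic, the constant additive effect is made up, and the function names (`simulate_units`, `true_average_effect`, `randomized_estimate`) are hypothetical. Each unit carries two potential outcomes, one with the cause and one without. The question "what is the effect of the cause?" is well defined as a comparison of those two, even though real data reveal only one of them per unit.

```python
import random


def simulate_units(n, effect, seed=0):
    """Synthetic population (hypothetical): each unit carries BOTH potential
    outcomes. y0 is the outcome without the cause, y1 the outcome with it.
    The constant additive effect is an illustrative assumption."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)
        units.append((y0, y0 + effect))
    return units


def true_average_effect(units):
    """Average of y1 - y0; computable only because the data are simulated."""
    return sum(y1 - y0 for y0, y1 in units) / len(units)


def randomized_estimate(units, seed=1):
    """Randomly assign the cause and keep only ONE outcome per unit, as in
    real data, then compare the two group means."""
    rng = random.Random(seed)
    treated, control = [], []
    for y0, y1 in units:
        if rng.random() < 0.5:
            treated.append(y1)
        else:
            control.append(y0)
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    units = simulate_units(n=10_000, effect=2.0)
    print("true average effect:", round(true_average_effect(units), 3))
    print("randomized estimate:", round(randomized_estimate(units), 3))
```

The point of the sketch is the direction of the question: it starts from a specified cause and asks what difference it makes, rather than starting from an outcome (a dead cat, a war) and searching backward for "the" cause.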

But, now let’s get back to the “volatility of the Boston marathon” problem. Unlike the question of “why did my cat die” or “why did World War 1 start,” the question, “Why have the winning times in the Boston marathon been so variable” does seem answerable.

What happens if we try to apply some statistical principles here?

Principle #1: Compared to what? We can't answer "why" without knowing what we are comparing to. This principle seems to work in the marathon-times example. The only way to talk about the Boston times as being unexpectedly variable is to know what "expectedly variable" is. Or, conversely, the New York times are unexpectedly stable compared to what was happening in Boston those same years. Either way, the principle holds: we are comparing to some model or another.

Principle #2: Look at effects of causes, rather than causes of effects. This principle seems to break down in the marathon example, where it seems very natural to try to understand why an observed phenomenon is occurring.

What’s going on? Perhaps we can understand in the context of another example, something that came up a couple years ago in some of my consulting work. The New York City Department of Health had a survey of rodent infestation, and they found that African Americans and Latinos were more likely than whites to have rodents in their apartments. This difference persisted (albeit at a lesser magnitude) after controlling for some individual and neighborhood-level predictors. Why does this gap remain? What other average differences are there among the dwellings of different ethnic groups?

OK, so now maybe we're getting somewhere. The question on deck now is: how do the "Boston vs. NY marathon" and "too many rodents" problems differ from the "dead cat" problem?

One difference is that we have data on lots of marathons and lots of rodents in apartments, but only one dead cat. But that doesn’t quite work as a demarcation criterion (sorry, forgive me for working under the influence of Popper): even if there were only one running of each marathon, we could still quite reasonably answer questions such as, “Why was the winning time so much lower in NY than in Boston?” And, conversely, if we had lots of dead cats, we could start asking questions about attributable risks, but it still wouldn’t quite make sense to ask why the cats are dying.

Another difference is that the marathon question and the rodent question are comparisons (NY vs. Boston, and African Americans and Latinos vs. whites), while the dead cat stands alone (or swings alone, I guess I should say). Maybe this is closer to the demarcation we're looking for, the idea being that a "cause" (in this sense) is something that takes you away from some default model. In these examples, it's a model of zero differences between groups, but more generally it could be any model that gives predictions for data.

In this model-checking sense, the search for a cause is motivated by an itch, a disagreement with a default model, which has to be scratched and scratched until the discomfort goes away, by constructing a model that fits the data. Said model can then be interpreted causally in a Rubin-like, "effects of causes," forward-thinking way.
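To make the "itch" concrete for the marathon example, here is a minimal sketch of checking data against a default model of zero difference in variability between two groups. It is one simple option (a permutation-style check on mean-centered values) chosen for illustration, not the author's method; the function names are hypothetical and the "winning times" are synthetic numbers, not real race results. In keeping with the spirit of the post, the output is meant to show how far the data depart from the default model (the observed ratio of standard deviations), with the simulated share as a gauge of how surprising that departure is, rather than as a reject/accept verdict.

```python
import random
import statistics


def centered(xs):
    """Subtract the group mean so that only spread, not level, is compared."""
    m = statistics.mean(xs)
    return [x - m for x in xs]


def sd_ratio(a, b):
    return statistics.stdev(a) / statistics.stdev(b)


def check_default_model(a, b, n_sims=2000, seed=0):
    """Compare the observed SD ratio of a to b with ratios generated under a
    default model in which the centered values of both groups are
    exchangeable (same spread). Returns the observed ratio and the share of
    simulated ratios at least that large."""
    rng = random.Random(seed)
    ca, cb = centered(a), centered(b)
    observed = sd_ratio(ca, cb)
    pooled = ca + cb
    n_a = len(ca)
    at_least_as_large = 0
    for _ in range(n_sims):
        rng.shuffle(pooled)  # relabel values at random under the default model
        if sd_ratio(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_sims


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-ins for winning times in minutes, NOT real race results.
    more_variable = [rng.gauss(130.0, 3.0) for _ in range(30)]
    less_variable = [rng.gauss(128.0, 1.0) for _ in range(30)]
    ratio, share = check_default_model(more_variable, less_variable)
    print(f"observed SD ratio {ratio:.2f}; share of simulations >= it: {share:.3f}")
```

If the check flags a discrepancy, the next step in the post's sense is not to stop but to build a model (weather, wind, field size, and so on) that accounts for it.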

Is this the resolution I’m seeking? I’m not sure. But I need to figure this out, because I’m planning on basing my new intro stat course (and book) on the idea of statistics as comparisons.

P.S. I remain completely uninterested in questions such as, What is the cause? Is it A or is it B? (For example, what caused the differences in marathon-time variations in Boston and New York: is it the temperature, the precipitation, the wind, or something else?) Of course if it can be any of these factors, it can be all of them. I remain firm in my belief that any statistical method that claims to distinguish between hypotheses in this way is really just using sampling variation as a way to draw artificial distinctions, fundamentally in a way no different from the notorious comparisons of statistical significance to non-significance.

This last point has nothing to do with causal inference and everything to do with my preference for continuous over discrete models in applications in which I’ve worked in social science, environmental science, and public health.


  1. Manoel Galdino says:

    Great text.

    But, just to clarify, I would like to ask you about the slide from your talk at the London School about Red States, Blue States…

    It seems to me that, according to you, polarization was caused by parties, not voters. So it seems that you were thinking in terms of causes of effects (parties caused polarization). Besides, it seems that this is in the format you criticized in your P.S.: what is the cause of polarization: voters, parties, whatever?

    Thanks in advance,


  2. RogerH says:

    Thanks for an interesting post.

    "…some default model. In these examples, it's a model of zero differences between groups."

    Hmm, sounds very like a null hypothesis to me.

    You can, of course, make use of such frequentist concepts while entirely avoiding (I nearly said 'rejecting') "notorious comparisons of statistical significance to non-significance".

  3. Andrew Gelman says:

    Roger: I have nothing against null hypotheses. See, for example, chapter 2 of ARM or chapter 6 of BDA. The key, though, is to remember that the goal is not to try to "reject" or disprove the null hypothesis. The null hypothesis is always false. The point of the statistical check is to understand the ways in which the data are inconsistent with the null hypothesis.

  4. jonathan says:

    Are you speaking of why as a multi-level model in which the truth of result is encapsulated in a manner that may be true or false in a large encapsulation? Contextual modeling, containers, sets, etc. It certainly isn't the mechanistic null hypothesis disproving which makes why.

  5. RogerH says:

    Sounds like we're in complete agreement about that then. The phrases "statistically significant" and "reject the null hypothesis" are effectively banned in my dept. It's a shame we still have to spend time persuading the first-year undergrads who've learnt some stats at school to stop using them. In fact it's often easier to teach the ones who've done no stats at all previously.

    I just thought you seemed to be avoiding the phrase "null hypothesis" in your original post, but I'm happy to accept your comment to the contrary. I've been following your blog for over a year now (though I skip most of the posts about US political science), but I admit I've not read your books.

  6. noumignon says:

    Is this a good example of an "effects of causes" question? "How much of my cat's death was attributable to speeding drivers, and how much to his not wearing a leash?" Since both those things cause cats to get run over?

  7. Andrew Gelman says:

    Cats don't wear leashes. That's crazy talk.

  8. Bruce McCullough says:


    You write:

    "I remain firm in my belief that any statistical method that claims to distinguish between hypotheses in this way is really just using sampling variation as a way to draw artificial distinctions,"

    Do you mean that this is "probably true" or "always true"?

    If the former, I can see it. If the latter, I can't.



  9. noumignon says:

    What I meant to say was that you used the "effects of causes" and the cat example in this post and another one two weeks ago and I don't get it. You explain what "causes of effects" thinking looks like and why it's not helpful, but not what it means to do it the other way around.

  10. Andrew Gelman says:

    "Effects of causes" is statistics/econometrics jargon for thinking about potential outcomes under different possible treatments: instead of asking, "What causes Y?" or "Does X cause Y?", you ask, "What are the effects of X?" It's generally agreed that "What are the effects of X?" is a cleaner question than "What causes Y?" But sometimes we want to know what causes Y.

    For more details, see chapters 9 and 10 of my book with Hill.

  11. Wayne Folta says:

    Two thoughts:

    Could you change your cat question into: "Why does my cat have an excess of deaths compared to other cats in the neighborhood?" Or perhaps, "Why does my cat, in hour H_d, have an excess of deaths compared to my cat at hours H_n (n<d)?"

    Also, I think the question "Why did my cat die?" is shortened from: "What factors made my cat more likely to die at hour H_d rather than in the preceding H_n?" That restatement feels like it begins to reveal what makes you uneasy, in particular the "more likely" part.

  12. Tom Moertel says:

    What makes one "why" question interesting and another boring is whether we have reason to believe that investigating the question will lead to the discovery of new causal mechanisms, particularly those mechanisms that will remain in effect when the system that contains them is perturbed. In the case of your cat example, we don't particularly care why the cat died; learning that it was because of the cat, the car, the bird, or some combination of them doesn't suggest much new to us about the way the universe works.

    Further, even if it did, we wouldn't expect to have much use for that knowledge. Under the great probability distribution that describes the possible states of the universe that we care about, the sum of those probabilities corresponding to states where we could expect to apply what we have learned about the cause of the cat's death is, well, small — not only because we're less interested in cats but also because we suspect that the states in which we could apply that knowledge to do something useful are rare.

    Thus, what makes a "why" question interesting is its potential to help us discover mechanisms that work in lots of states, especially those states we find most important and most likely to find ourselves in.