The above title is my response to a discussion that began with this email sent to me by Steve Roth:
Noah Smith had a great tweet recently, a real keeper for me [Roth].
Causation is correlated with correlation.
I would reword it:
Correlation correlates with causation. (Just not very much.)
And I wonder if the following corollaries are safe:
Non-correlation correlates (more strongly) with non-causation.
Negative correlation correlates (much more strongly) with non-causation.
This in response to the old nostrum/saw that correlation does not imply causation.
Which has always seemed wrong to me. Of course it does! (Weakly.)
The problem is that “imply” is a very slippery word, so it’s a pretty useless nostrum.
Would be delighted to see a post poking at this.
I will post something on this (at some point; we’re on a 1-2 month delay so most things don’t appear right away) but my quick response is: Selection bias. If people start sending you random pairs of variables that happen to be highly correlated, sure, there might well be a connection between them; for example, kids’ scores on math tests and language tests are correlated, and this tells us something. But if someone is looking for a particular pattern, and then selects two variables that are correlated, that’s another story. The great thing about causal identification is that it’s valid even if you’re looking to find a pattern. (Not completely: there’s p-hacking, and you can also run 100 experiments and only report the best one, etc., but that’s still less of an issue than the fact that pure correlation does not logically tell you anything about causation.) To put it another way, returning to Noah’s tweet: Correlation is surely correlated with causation in an aggregate sense, but if you take the subset of correlations that a particular motivated researcher is looking for, then maybe not.
You could also see the above paragraph as a bit of common-sense reasoning. The expression “correlation does not imply causation” is popular, and I think it’s popular for a reason: it does capture a truth about the world.
I cc-ed Smith on this exchange and also Dan Kahan, who wrote:
For what it’s worth, my two variants would be:
1. Nothing other than correlation implies causation.
2. Correlation implies causation — except when it doesn’t.
Credit to D. Hume for #1 (at least for noticing that there’s no other visible indicator of causation).
#2 is just what Andrew said: causation = correlation plus valid causal inference.
Again, the elephant in the room here is selection. People see enough random correlations that they can pick them out and interpret them how they like.
So if I had to put something on a bumper sticker (or a tweet), it would be:
Correlation does not even imply correlation
That is, correlation in the data you happen to have (even if it happens to be “statistically significant”) does not necessarily imply correlation in the population of interest.
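To make the slogan concrete, here is a minimal simulation sketch in Python (standard library only). Everything in it is an illustrative assumption rather than something from the email exchange: the function names `pearson` and `simulate`, the sample size of 30, the 200 candidate variables, and the choice of independent Gaussian noise. All variables are generated independently, so every population correlation is zero. The sketch picks the candidate with the largest sample correlation, as a motivated researcher might, and then computes the correlation again on a fresh draw of the same two variables.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_obs=30, n_candidates=200, seed=0):
    """Return (selected sample correlation, correlation in fresh data).

    Hypothetical setup: the outcome and all candidate variables are
    independent standard normal noise, so the true correlation is 0.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    candidates = [
        [rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_candidates)
    ]

    # Selection step: keep the candidate whose sample correlation with the
    # outcome is largest in absolute value.
    sample_r = [pearson(c, outcome) for c in candidates]
    best = max(range(n_candidates), key=lambda j: abs(sample_r[j]))

    # Replication step: because every variable is independent noise, a new
    # sample of the selected pair is simply a fresh independent draw.
    fresh_candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
    fresh_outcome = [rng.gauss(0, 1) for _ in range(n_obs)]

    return sample_r[best], pearson(fresh_candidate, fresh_outcome)


if __name__ == "__main__":
    selected, fresh = simulate()
    print("selected sample correlation:", round(selected, 2))
    print("correlation in fresh data:  ", round(fresh, 2))
```

In a setup like this, the selected correlation tends to look substantial while the fresh-data correlation is typically much nearer zero, which is the sense in which correlation in the data at hand need not imply correlation in the population.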
P.S. I’ve shifted the emphasis in my slogan to make the point clearer.