Josh Miller (of Miller & Sanjurjo) writes:
On correlations, you know, the original Gilovich, Vallone, and Tversky paper found that the Cornell players’ “predictions” of their teammates’ shots correlated 0.04, on average. No evidence they can see the hot hand, right?
Here is an easy correlation question: suppose Bob shoots with probability ph=.55 when he is hot and pn=.45 when he is not hot. Suppose Lisa can perfectly detect when he is hot, and when he is not. If Lisa predicts based on her perfect ability to detect when Bob is hot, what correlation would you expect?
With that setup, I could only assume the correlation would be low.
I did the simulation:
    > n <- 10000
    > bob_probability <- rep(c(.55,.45), c(.13,.87)*n)
    > lisa_guess <- round(bob_probability)
    > bob_outcome <- rbinom(n, 1, bob_probability)
    > cor(lisa_guess, bob_outcome)
    [1] 0.06
Of course, in this case I didn’t even need to compute lisa_guess as it’s 100% correlated with bob_probability.
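The simulated value can also be checked on paper. Here is a rough sketch of the calculation, under the same assumptions as the simulation: H is an indicator for Bob being hot, Y is the 0/1 shot outcome, and q is the fraction of shots taken while hot (q = .13 is just the value I plugged into the code, not something from Miller's question; the symbols H, Y, q, and p-bar are my own notation).

```latex
\begin{align*}
\bar{p} &= q\,p_h + (1-q)\,p_n = 0.13(0.55) + 0.87(0.45) = 0.463 \\
\operatorname{Cov}(H, Y) &= q(1-q)\,(p_h - p_n) \\
\operatorname{Var}(H) &= q(1-q), \qquad \operatorname{Var}(Y) = \bar{p}(1-\bar{p}) \\
\operatorname{Corr}(H, Y) &= (p_h - p_n)\sqrt{\frac{q(1-q)}{\bar{p}(1-\bar{p})}}
  = 0.10\sqrt{\frac{0.1131}{0.2486}} \approx 0.067
\end{align*}
```

So the expected correlation is about 0.07, and the 0.06 from the simulation is within sampling noise of that. Under this formula, even if Bob were hot exactly half the time (q = .5), the correlation would only rise to 0.10.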
This is a great story, somewhat reminiscent of the famous R-squared = .01 example.
P.S. This happens to be closely related to the measurement error/attenuation bias issues that Miller told me about a couple years ago. And Jordan Ellenberg in comments points to a paper from Kevin Korb and Michael Stillwell, apparently from 2002, entitled “The Story of The Hot Hand: Powerful Myth or Powerless Critique,” that discusses related issues in more detail.
The point is counterintuitive (or, at least, counter to the intuitions of Gilovich, Vallone, Tversky, and a few zillion other people, including me before Josh Miller stepped into my office that day a couple years ago) and yet so simple to demonstrate. That’s cool.
Just to be clear, right here my point is not the small-sample bias of the lagged hot-hand estimate (the now-familiar point that there can be a real hot hand but it could appear as zero using Gilovich et al.’s procedure) but rather the attenuation of the estimate: the less-familiar point that even a large hot hand effect will show up as something tiny when estimated using 0/1 data. As Korb and Stillwell put it, “binomial data are relatively impoverished.”
This finding (which is mathematically obvious, once you see it, and can be demonstrated in 5 lines of code) is related to other obvious-but-not-so-well-known examples of discrete data being inherently noisy. One example is the R-squared=.01 problem linked to at the end of the above post, and yet another is the beauty-and-sex-ratio problem, where a researcher published paper after paper of what was essentially pure noise, in part because he did not seem to realize how little information was contained in binary data.
Again, none of this was a secret. The problem was sitting in open sight, and people have been writing about this statistical power issue forever. Here, for example, is a footnote from one of Miller and Sanjurjo’s papers:
Funny how it took this long for it to become common knowledge. Almost.
P.P.S. I just noticed another quote from Korb and Stillwell (2002):
Kahneman and Tversky themselves, the intellectual progenitors of the Hot Hand study, denounced the neglect of power in null hypothesis significance testing, as a manifestation of a superstitious belief in the “Law of Small Numbers”. Notwithstanding all of that, Gilovich et al. base their conclusion that the hot hand phenomenon is illusory squarely upon a battery of significance tests, having conducted no power analysis whatsoever! This is perhaps the ultimate illustration of the intellectual grip of the significance test over the practice of experimental psychology.
I agree with the general sense of this rant, but I’d add that, at least informally, I think Gilovich et al., and their followers, came to their conclusion not just based on non-rejection of significance tests but also based on the low value of their point estimates. Hence the relevance of the issue discussed in my post above, regarding attenuation of estimates. It’s not just that Gilovich et al. found no statistically significant differences, it’s also that their estimates were biased in a negative direction (that was the key point of Miller and Sanjurjo) and pulled toward zero (the point being made above). Put all that together and it looked to Gilovich et al. like strong evidence for a null, or essentially null, effect.