X points me to this news article by George Johnson regarding the hot hand in basketball. Nothing new since the previous hot hand report (also Johnson follows the usual newspaper convention of not citing the earlier article in the Wall Street Journal, instead simply linking back to the Miller and Sanjurjo article as if it had not been reported before), but he did get in an interview with Thomas Gilovich, one of the authors of the original hot hand paper from thirty years ago:

Dr. Gilovich is withholding judgment. “The larger the sample of data for a given player, the less of an issue this is,” he wrote in an email. “Because our samples were fairly large, I don’t believe this changes the original conclusions about the hot hand.”

It’s really too bad to hear Gilovich write this. Isn’t it always the way, though: people don’t like to admit that they made a mistake, even an honest mistake.

Anyway, I think what happens is that if you reanalyze the Gilovich et al. data carefully you will find some evidence for a hot hand. The difficulty is not that their samples are so large but that they are so small. The hot hand effect is subtle to detect, and binary data are weak, so you need lots of data to estimate it.

Here’s a quick calculation, just to give you an idea.

Suppose Pr(success following a miss) is 0.5 and Pr(success following a success) is 0.55. I chose this as an effect size that would be nontrivial but could still be difficult to detect.

Now suppose you have data from 1000 shots. 1000 is a lot, right? And let’s for a moment forget about the now well-known measurement bias issue.

So you have something like 500 successes and 500 misses, and you can take the difference in success rates and estimate Pr(success following a success) – Pr(success following a miss). The standard error of your estimate is sqrt(.5^2/500 + .5^2/500) = 0.03. And you can’t reliably detect a 5 percentage point difference if your measuring instrument has a standard error of 0.03.
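The arithmetic in this calculation can be checked with a short Python sketch. The helper names, the normal approximation, and the 1.96 cutoff (a two-sided 5% test) are assumptions added for illustration; they are not part of the original calculation.

```python
import math

def se_diff(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Numbers from the text: about 500 shots after a make and 500 after a miss,
# with both rates treated as roughly 0.5 when computing the variance.
se = se_diff(0.5, 500, 0.5, 500)

# Assumed true difference of 5 percentage points (0.55 vs. 0.50).
effect = 0.05
z = effect / se

# Approximate probability that a two-sided 5% test detects the effect.
power = (1 - normal_cdf(1.96 - z)) + normal_cdf(-1.96 - z)

print(f"standard error = {se:.4f}")
print(f"effect / se    = {z:.2f}")
print(f"approx. power  = {power:.2f}")
```

Under these assumptions the true effect is less than two standard errors from zero, so the approximate power comes out well below one half.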

And then you bring in the issue of the bias that Miller and Sanjurjo discovered, which depends not on your total sample size but on your sample size per player. You can see that even a seemingly small bias such as 0.02 can completely destroy any hope of discovering anything here. And this doesn’t even get into the issue that just looking at the previous shot tells only part of the story. In the case of Gilovich et al.’s data, if you just look at certain comparisons and don’t adjust for the bias, you won’t see any evidence for the hot hand—that’s what happened in that 1985 paper, as Miller and Sanjurjo explained.

Now let’s bring it back to Gilovich’s statement that this doesn’t change their original conclusions. I think he’s half correct on this—or maybe more than half correct.

My reasoning goes as follows. Gilovich et al. reported three things in their paper. First, that there’s no evidence for any hot hand in basketball shooting. I think they were wrong on this one; it does seem that, if you look at basketball data carefully, you do see evidence for a hot hand, it’s just that the original analyses were hampered by bias and variance issues. Second, Gilovich et al. report that basketball fans view the hot hand effect as huge, much larger than any such effect in reality. I find their results convincing on that point. It does seem that once a person focuses on any particular effect, once he or she believes it to be nonzero, there’s a tendency to overrate its importance. I guess that’s related to the “availability heuristic” studied by Amos Tversky, another author of that hot hand paper. Gilovich et al.’s third finding is that people perceive a hot hand even if you give them completely random sequences. That appears to be true too, even if not newsworthy on its own.

So we can say that two out of the three findings of Gilovich et al. (1985) remain valid at some level. Not bad for a 30-year-old paper and no cause for embarrassment on Gilovich’s part. I think he’d be better off being a bit more gracious and saying to the next reporter something like, “Yeah, we didn’t catch that bias, and Miller and Sanjurjo made me realize that we were wrong to claim there was no hot hand. We’re glad that our paper continues to get attention 30 years later, and I hope people won’t forget the other points we made, which have withstood the test of time.”

“My reasoning goes as follows. Gilovich et al. reported three things in their paper. First, that there’s no evidence for any hot hand in basketball shooting. I think they were wrong on this one; it does seem that, if you look at basketball data carefully, you do see evidence for a hot hand, it’s just that the original analyses were hampered by bias and variance issues. Second, Gilovich et al. report that basketball fans view the hot hand effect as huge, much larger than any such effect in reality. I find their results convincing on that point. It does seem that once a person focuses on any particular effect, once he or she believes it to be nonzero, there’s a tendency to overrate its importance. I guess that’s related to the “availability heuristic” studied by Amos Tversky, another author of that hot hand paper. Gilovich et al.’s third finding is that people perceive a hot hand even if you give them completely random sequences. That appears to be true too, even if not newsworthy on its own.”

This is just so reasonable that I don’t understand how this is still a point of debate. Fundamentally, if you’ve ever actually played basketball (and I have), you realize that the act of shooting a basketball is not of the same type as flipping a coin, drawing a card, spinning a roulette wheel, or pulling the lever of a slot machine. It’s a completely different category of activity. In one set of activities, the probability of success on each draw equals the average probability of success across a large number of draws. In the other, the probability of success varies widely from draw to draw, so on any particular draw it generally does not equal the average across a large number of draws.

Shooting a basketball is a complicated physiological motion that you don’t do the same way each time. For a basketball shot, the probability of the shot going in on any one instance is not the probability of the shot going in on average. Why? Some days your legs are tired and you don’t jump, other days you jump and shoot better. Some days you are super nervous (perhaps it’s a tense, close game situation), other days you are calm. Some days you are rushing your motion, other days you aren’t. Some days there’s a man right on you, some days there isn’t. All of these things affect the probability of the shot going in, creating a high variance in probability across shots. None of this applies to rolling dice or flipping a coin. There the per-draw probability of the event is stable, and always equal to the average probability across many draws.

Given that this is painfully obvious to anyone who has ever shot a basketball, what Andrew writes above just makes so much sense that it remains stunning to me that this is still a thing that people debate. Yes, the hot hand almost surely exists. But, yes, it could be pretty small in magnitude even if it’s real. And yes, people make perceptual errors and claim to see the hot hand even when it doesn’t exist. But that they make those perceptual errors, and that it’s not an enormous effect, doesn’t mean it isn’t a real thing.

I think

“hot hand almost surely exists” is meaningless unless combined with what magnitude of effect we are talking about. If it is a tiny, subliminal effect it becomes akin to asking if a tree falling makes a sound when no one is around to listen.

My hunch is that the magnitude in existing studies appears smaller than it is because there’s so much noise in the mostly binary shot-count data that existing studies have used to estimate it. Given data limitations, you’re forced to compare fundamentally dissimilar shots (a contested 4th-quarter 3 from the top of the key vs. an open, spot-up 1st-quarter corner 3) as if they were draws with the same underlying probability, when they aren’t.

But now the NBA releases amazing data that uses stop-gap photography of every game to create a geo-coded dataset that accounts for the locations of all players, the game situation, etc, when each shot is taken. This allows researchers to actually account for key factors that change the underlying difficulty of the shots. If you use this data, there appears to be a large magnitude hot hand:

http://www.sloansportsconference.com/wp-content/uploads/2014/02/2014_SSAC_The-Hot-Hand-A-New-Approach.pdf

“We then turn to the Hot Hand itself and show that players who are outperforming will continue to do so, conditional on the difficulty of their present shot. Our estimates of the Hot Hand effect range from 1.2 to 2.4 percentage points in increased likelihood of making a shot.”

I don’t know why this isn’t the paper Andrew is highlighting in his posts. It actually accounts for the in-game variation in situational difficulty of shots in a way that the older papers he’s discussing cannot.

“… if a tree falling makes a sound when no one is around to listen.” I thought you were going to say “if a leaf falling makes a sound in a hurricane.”

Much better! :)

The last time this came up in Andrew’s blog, there was some discussion as to the difficulty of finding a ‘cold hand’, even in papers where a hot hand is found. I think your description is clearly the top-choice explanation of why there should be a hot hand, but it also predicts a cold hand. Why do you think the cold hand is so much tougher to find, and what does that imply about the hot hand?

That’s really interesting. I didn’t know that.

Here’s my relevant comment on the prior post, which Josh Miller followed with some speculation. http://andrewgelman.com/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/#comment-225184

Cold hand vs hot hand is a selection effect (coaching decision).

If a player is stinking up the joint, he gets subbed out.

The audience, boxscore, sportsvu only observes non cold players

Tiger Woods has had a cold hand for two years now.

A better way of putting this is that even though Kyle Korver is on average a 52% 3-pt shooter, his probability fluctuates widely depending on the situation. It’s not a 52% weighted coin flip each time he shoots. I don’t have all the data in front of me, but he presumably shoots a lower percentage in the 4th quarter when he’s tired, a higher percentage from the corners (where the shot is closer) than from the top of the key, a higher percentage on assisted spot-up shots than on shots off the dribble, a higher percentage when he’s open than when there’s a man in his face, etc. None of the analyses of the hot hand I’ve seen seriously attempt to account for this kind of variability. But the instantaneous probability of Kyle Korver’s shot going in is almost never 52%, even if the average probability over 1000 shot attempts is 52%. Given that, it’s entirely possible that in some games he strings together 4-6 attempts in a row where his probability is way higher than 52%, which would be a “hot hand” phenomenon.

+1.

Many years ago I used to play basketball, badly. I was a 1-2-baskets-in-60-minutes-if-I-was-lucky player. No rhythm, no accuracy, always got a bit panicky shooting.

Except one day. That day, for some reason, for a while, I had the rhythm and accuracy. I shot and got a basket, then another. I tried not to think about it, as I was convinced that it’d never work. I got another. By the fifth basket, my team didn’t even bother joining me in the opponents’ half, and I still scored. At that point I did think about it, pissed myself laughing, and didn’t score for the rest of the day.

I’m sure if you can control for all the physical and mental variation you’d disprove hot hand, but I am also sure they influence shooting ability.

It would seem like we should distinguish between free throw shooting, which is quite standardized, and field goal shooting.

I disagree. There is still a great deal of variance in the within-player instantaneous probability of making a free throw shot. It’s still not very standardized. Are you tired? How many minutes have you been in the game? Are you nervous? Is the game close? Are you in an away arena with fans screaming at you and waving those plastic noise sticks in your face? Are the other players at the line talking trash about your mother while you’re trying to shoot? Each draw ISN’T from a Bernoulli with a constant p. The p for each individual draw is fluctuating based on all of these things above, plus others, and isn’t just the average p for some large-N of foul shots. It’s still the same fundamental problem. When we treat the foul shots as binary data assumed to come from the same underlying distribution within players, we’re leaving a lot of noise in our analysis that can drown out potential patterns.

“Each draw ISN’T from a Bernoulli with a constant p. The p for each individual draw is fluctuating based on all of these things above, plus others, and isn’t just the average p for some large-N of foul shots”

How much greater is the individual game-to-game variance than predicted by the binomial model (take the overall average percentage as p)? This data surely exists, is it collected for easy access anywhere? In fact, if the variance is close to np(1-p) that would indicate some process is cancelling out those other sources of variation.
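The comparison asked about here can be sketched in Python. The data below are simulated, not real game logs, and the function names, the 20 shots per game, and the uniform(0.3, 0.7) spread of per-game probabilities are all hypothetical choices made for illustration.

```python
import random
from statistics import pvariance

def dispersion_ratio(makes, shots_per_game):
    """Observed game-to-game variance of makes divided by the
    binomial variance n*p*(1-p), with p set to the overall average."""
    p_hat = sum(makes) / (len(makes) * shots_per_game)
    binomial_var = shots_per_game * p_hat * (1 - p_hat)
    return pvariance(makes) / binomial_var

def simulate_games(n_games, shots_per_game, draw_p, rng):
    """Simulate makes per game; draw_p(rng) returns that game's probability."""
    makes = []
    for _ in range(n_games):
        p = draw_p(rng)
        makes.append(sum(rng.random() < p for _ in range(shots_per_game)))
    return makes

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical player A: the same p = 0.5 in every game.
constant = simulate_games(5000, 20, lambda r: 0.5, rng)
# Hypothetical player B: p varies from game to game but averages 0.5.
varying = simulate_games(5000, 20, lambda r: r.uniform(0.3, 0.7), rng)

ratio_constant = dispersion_ratio(constant, 20)
ratio_varying = dispersion_ratio(varying, 20)
print(f"constant p: ratio = {ratio_constant:.2f}")
print(f"varying p:  ratio = {ratio_varying:.2f}")
```

A ratio near 1 is what the binomial model predicts, and a ratio well above 1 points to game-to-game variation in p. One caveat: if p varies from shot to shot within a game while the game-level average stays fixed, the variance of the game total is at most n p(1-p), so a ratio near 1 does not by itself rule out shot-level variation.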

Steve:

Gilovich et al. consider three scenarios: field goal shooting, free throw shooting, and shots in practice, which would seem to be in decreasing order of variance within players over time.

If it takes more than 1000 shots to detect a hot hand when the difference between hot and not is 0.05, then does such a hot hand have any meaning?

Besides all the variety caused by different shots taken during a game i.e. some difficult, some easy, enforced by the opposition, it could also be that players who believe they have hot hands try more difficult shots because they think they have a greater chance of success while hot. For example a player may attempt 3 point shots when hot but never attempt when not. The overall rate of success is the same but more risky shots may be made.

“it could also be that players who believe they have hot hands try more difficult shots because they think they have a greater chance of success while hot.”

That is EXACTLY the finding in this paper: http://www.sloansportsconference.com/wp-content/uploads/2014/02/2014_SSAC_The-Hot-Hand-A-New-Approach.pdf

Once accounting for this and adjusting for variation in shot difficulty — which the papers Andrew cites above do not — they find evidence of a substantively large hot hand.

There’s a culture of this in the NBA. They call it the “heat check” shot. After making a series of shots in a row, a player will often take a wild, high-difficulty shot early in the shot clock that he wouldn’t otherwise have taken (because it’s such a low-probability shot) to see if he’s “hot.” But it’s a lower-probability shot. The classic example of this is JR Smith, of the Cleveland Cavs.

“I think he’d be better off being a bit more gracious and saying to the next reporter something like, ‘Yeah, we didn’t catch that bias, and Miller and Sanjurjo made me realize that we were wrong to claim there was no hot hand. We’re glad that our paper continues to get attention 30 years later, and I hope people won’t forget the other points we made, which have withstood the test of time.’”

Nicely put. I think this gets back to other discussions about how we view science and what a published paper really means. Some people seem to take printed results as absolute truth and feel as though they must defend them. Others are more forthcoming about the uncertainty associated with their findings (e.g., evidence based upon a single sample, small sample, noisy samples, coding mistakes, etc.). When viewed from the latter perspective, new or contradictory results are not threatening but simply updating our understanding of the world. Scholars are going to make mistakes, particularly when studying complex things.

I’m new to this discussion, so sorry if this is an old question, but… Has anybody checked for a “hot hand” in a sport like bowling, or archery or riflery? It seems like these sports could provide a much less noisy data set. No defense trying to thwart you, no shots from varying points on the court, no shot clock etc. Whatever the effect size in these other sports, the effect in basketball, after averaging over all the stuff you need to average over, is almost certainly smaller. Would you agree?

Perhaps not bowling, doesn’t everyone score pretty close to 400? There’s not much discernible difference between a hot hand game and a typical game.

For AG:

“It does seem that once a person focuses on any particular effect, once he or she believes it to be nonzero, there’s a tendency to overrate its importance…”

This is right, and the specific concept you’re looking for is the Focusing Illusion. Here’s Kahneman’s overview of it:

https://edge.org/response-detail/11984

Linsanity of the 2012 NBA season was an example of a hot hand.

George Johnson’s News Analysis in the Sunday October 18, 2015 Edition of The New York Times draws incorrect conclusions and is riddled with math errors:

As Mr. Johnson lays out in his table, there are 48 possibilities for the first three flips of four coins. Of these flips, 24 are heads and 24 are tails. Heads are followed by tails 12 times and by heads 12 times, or 50%, not the 40.5% that Mr. Johnson suggests. The mistake that Johnson made in his analysis is by averaging the percentages across each of the 14 trials without weighting them for the number of heads in each trial. Johnson weights each of the trials equally, regardless of the number of heads and thereby calculates an average of heads of only 40.5%; he then incorrectly concludes that tails follow heads 60% of the time. This conclusion defies both logic and the mathematics of probability as this would mean that heads would also follow tails 60% of the time and the sum would be 120% (rather than 100%), which is clearly impossible.

The Miller Sanjurjo “study” will be quickly dismissed and discredited.

Ross:

Pro tip: referring to a study as a “study” doesn’t enhance your credibility.

Ross-

Thanks very much for pointing this out! I just got around to reading the George Johnson article today and was baffled by why he thought it made sense to average across the 16 trials. It does not make sense, and I do not think this was the basis for the previously published analyses. Has anyone pointed this out to the New York Times and George Johnson?

Thanks,

-Walt

Dear Walt

While it is true that GVT 1985 do not average across sequences in their analysis, if you want to compute the bias, this is what you need to do, i.e. average across all the possible sequences in the sample space.

To make more clear the selection bias in GVT 1985, here is a copy-paste of a probability puzzle that we have been sharing, which illustrates the bias:

Jack flips a coin 4 times, and then selects one of the heads flips at random and asks Jill: “please predict the next flip.”

Jill asks: what if you selected the final flip, how can I predict the next flip?

Jack responds: I didn’t select the final flip, I selected randomly among the heads flips that were not the final flip.

Jill asks: okay, but it’s possible there aren’t any heads flips for you to select, right?

Jack responds: I just told you I selected a heads flip!

Jill asks: fine, but what if there weren’t any heads flips for you to select? what would you have done in that case?

Jack responds: I wouldn’t be bothering you with this question.

Jill says: okay then, I predict tails on the next flip.

Jill has made the correct choice.

For most of us, our intuition says that it shouldn’t matter what Jill predicts, there is a 50% chance of a heads on the next flip.

This intuition is wrong. Given the information that she has, Jill knows there is a 40.5% chance of heads on the next flip.

In the long run, if Jill predicts tails every time that Jack asks her to predict, Jill will be right 59.5% of the time.

This is the selection bias: When analyzing a finite sequence of coin flips, if you select for analysis a flip that follows a heads, you are more likely to be selecting a tails.

But why is the intuition wrong – where does 40.5% come from?

Here is a start. Each sequence has a 1/14 chance in this game (Jack doesn’t ask for TTTH, TTTT).

Suppose Jack chose flip 1, there are 8 possible sequences with H???.

What is the probability it came from sequence HTTT? Is it 1/8?

No, Pr(HTTT|Jack chose flip 1)= Pr(Jack chooses flip 1| HTTT)Pr(HTTT)/Pr(Jack chooses flip 1)=1*(1/14)/(1/3)=3/14>1/8.

It is easy to show Pr(HTTT|Jack chose flip 1)>Pr(HHHT|Jack chose flip 1).

And so it is easy to show Pr(flip 2 is H | Jack chose flip 1)= Pr(HHTT|Jack chose flip 1)+Pr(HHHT|Jack chose flip 1)+Pr(HHTH|Jack chose flip 1)+ Pr(HHHH|Jack chose flip 1)<1/2.

Probably a simpler way to state this, without all the back and forth between Jack and Jill, is just say:

“Suppose Jack randomly selects a flip that is immediately preceded by a heads, what is the chance the selected flip is a heads?”

When you select flips for analysis based on outcomes of previous flips, you generate a bias.
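The numbers in this puzzle can be checked by brute-force enumeration. Here is a minimal Python sketch using exact fractions; the function name and dictionary keys are made up for illustration, and the code assumes the rules as stated in the puzzle: a fair coin, four flips, and Jack picking uniformly at random among the heads in the first three flips.

```python
from fractions import Fraction
from itertools import product

def jack_and_jill(n_flips=4):
    sequences = ["".join(s) for s in product("HT", repeat=n_flips)]
    # Jack only asks when at least one non-final flip is heads.
    eligible = [s for s in sequences if "H" in s[:-1]]
    weight = Fraction(1, len(eligible))

    next_heads = Fraction(0)           # Pr(flip after the chosen one is H)
    choose_first = Fraction(0)         # Pr(Jack chose flip 1)
    first_then_heads = Fraction(0)     # Pr(Jack chose flip 1 and flip 2 is H)
    for s in eligible:
        heads = [i for i in range(n_flips - 1) if s[i] == "H"]
        for i in heads:
            p_pick = weight / len(heads)   # uniform choice within the sequence
            if s[i + 1] == "H":
                next_heads += p_pick
            if i == 0:
                choose_first += p_pick
                if s[1] == "H":
                    first_then_heads += p_pick

    # Pooled count over all sequences: every non-final heads weighted equally.
    pooled_total = sum(s[i] == "H" for s in sequences for i in range(n_flips - 1))
    pooled_hh = sum(s[i] == "H" and s[i + 1] == "H"
                    for s in sequences for i in range(n_flips - 1))
    return {
        "n_eligible": len(eligible),
        "next_heads": next_heads,
        "choose_first": choose_first,
        "flip2_heads_given_first": first_then_heads / choose_first,
        "pooled": Fraction(pooled_hh, pooled_total),
    }

result = jack_and_jill()
for name, value in result.items():
    print(name, value)
```

The enumeration gives 17/42 (about 40.5%) for the chance of heads on the flip Jill is asked about, 1/3 for the probability that Jack chose flip 1, and 5/14 for Pr(flip 2 is H | Jack chose flip 1). The pooled count, which weights every heads flip equally across all 16 sequences, gives exactly 1/2. The two answers differ only in how the sequences are weighted.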

The hot hand shouldn’t be the only effect considered. If a player has made several baskets in a row, a few other things are likely to happen in basketball: 1) they’ll get more attention from the defense, and 2) they’ll be more likely to take difficult shots because they feel like they’re more likely to make them.

So the hot hand effect could be larger than observed in the data and partially offset by a shot selection effect where players select lower probability shots because the players themselves believe the hot hand makes them more likely to make those difficult shots.

I disagree that Miller & Sanjurjo have much to say about Gilovich et al. (1985). I’m looking at Gilovich’s Table 1, which shows the raw empirical p(hit|miss) and p(hit|hit) as being similar. The sample size for each player is greater than 248. Based on my Monte Carlo evidence, the Miller & Sanjurjo bias is on the order of 0.002, not 0.02, when the sample size is 248. Am I missing something?
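A Monte Carlo check of this kind might look like the Python sketch below. It is not the commenter’s actual code: the function name, the simulation counts, and the choice of a fair-coin shooter (p = 0.5) are assumptions. The streak_len parameter sets how many consecutive makes must precede a shot for it to count, so streak_len=1 corresponds to the simple p(hit|hit) comparison.

```python
import random

def mean_prop_after_streak(n_shots, streak_len, p, n_sims, rng):
    """Average, over simulated sequences, of the within-sequence proportion
    of makes on shots that immediately follow streak_len straight makes.
    Sequences with no such shots are dropped."""
    total = 0.0
    used = 0
    for _ in range(n_sims):
        run = 0            # current run of consecutive makes
        opportunities = 0  # shots preceded by at least streak_len makes
        hits = 0
        for _ in range(n_shots):
            made = rng.random() < p
            if run >= streak_len:
                opportunities += 1
                hits += made
            run = run + 1 if made else 0
        if opportunities:
            total += hits / opportunities
            used += 1
    return total / used

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Sanity check against the four-flip coin example (exact value 17/42).
est_4 = mean_prop_after_streak(4, 1, 0.5, 100_000, rng)
# A per-player sample of 248 shots, conditioning on one previous make.
est_248 = mean_prop_after_streak(248, 1, 0.5, 5_000, rng)

print(f"n=4,   k=1: {est_4:.4f}  (bias {est_4 - 0.5:+.4f})")
print(f"n=248, k=1: {est_248:.4f}  (bias {est_248 - 0.5:+.4f})")
```

With only 5,000 simulated sequences the Monte Carlo error at n=248 is of roughly the same size as a bias of a few thousandths, so many more simulations would be needed to pin that number down. Raising streak_len, which conditions on longer streaks, leaves fewer qualifying shots per sequence, and the result can be compared by rerunning with streak_len=3.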

Dear Lucas

Good question.

In their seminal 1985 paper, Gilovich, Vallone and Tversky (GVT) conducted a controlled shooting study (experiment) with the Cornell University men’s and women’s basketball teams as a “…method for eliminating the effects of shot selection and defensive pressure,” effects (confounds) which hampered their interpretation of the Philadelphia 76ers game data that they had discussed in an earlier section of the paper. On page 22 of our paper here: http://ssrn.com/abstract=2627354, we focus on Table 4, which pertains to the Cornell data (Study 4), and not Table 1, which pertains to the 76ers data (Study 2).

In short, Study 2 of the original paper has a severe endogeneity problem, which was pointed out quite early; e.g., on the first page of their book Thinking Strategically, Avinash Dixit and Barry Nalebuff clearly explain the problem of strategic adjustment (see it here: http://bit.ly/1eXxdI3 ). Scientifically speaking, this is why Study 4 is so important: it does not suffer from these issues. If you can show that there is no evidence of hot hand shooting in Study 4, it is reasonable to infer it doesn’t exist. This is also why great hay has been made about the no-effect result in the NBA 3 point study of Koehler & Conley (2003). When correcting for the bias, and looking at a lot more NBA 3 point data, we have also come to the opposite conclusion; see here: http://ssrn.com/abstract=2611987.

This is probably TMI, but if you have more curiosities with regard to whether we should be looking at game data to infer the *magnitude* of the hot hand effect, please see this link to a previous comment on a Gelman post: http://andrewgelman.com/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/#comment-227641

best

-j