Basketball news: No, I don’t think that it’s better to be down by one point than up by one point at halftime. Or, to put it in statistical terms, 1.3% (with a standard error of 2%) is not the same as 7.7%

John Shonder pointed me to this discussion by Justin Wolfers of this article by Jonah Berger and Devin Pope, who write:

In general, the further individuals, groups, and teams are ahead of their opponents in competition, the more likely they are to win. However, we show that through increasing motivation, being slightly behind can actually increase success. Analysis of over 6,000 collegiate basketball games illustrates that being slightly behind increases a team’s chance of winning. Teams behind by a point at halftime, for example, actually win more often than teams ahead by one. This increase is between 5.5 and 7.7 percentage points . . .

This is an interesting thing to look at, but I think they’re wrong. To explain, I’ll start with their data, which are 6572 NCAA basketball games where the score differential at halftime is within 10 points. Of the subset of these games with one-point gaps at halftime, the team that’s behind won 51.3% of the time. To get a standard error on this, I need to know the number of such games; let me approximate this by 6572/10=657. The s.e. is then .5/sqrt(657)=0.02. So the simple empirical estimate with +/- 1 standard error bounds is [.513 +/- .02], or [.49, .53]. Hardly conclusive evidence!
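The back-of-the-envelope calculation above can be checked in a few lines. This is just a sketch of the post's own approximations (the 657 one-point-gap games is an estimate, not a count from Berger and Pope's data):

```python
import math

# Approximate count of one-point-gap games, per the post: 6572 games with
# halftime margins within 10 points, roughly a tenth with a one-point gap.
n_games = 6572 // 10            # ~657
p_hat = 0.513                   # trailing team's observed win rate

# Conservative standard error for a proportion near 0.5: 0.5/sqrt(n).
se = 0.5 / math.sqrt(n_games)

lower, upper = p_hat - se, p_hat + se
print(f"s.e. = {se:.3f}; interval = [{lower:.2f}, {upper:.2f}]")
```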

Given this tiny difference of less than 1 standard error, how could they claim that “being slightly behind increases a team’s chance of winning . . . by between 5.5 and 7.7 percentage points”?? The point estimate looks too large (6.6 percentage points rather than 1.3) and the standard error looks too small.

What went wrong? A clue is provided by this picture:

[Figure (Halfscore.jpg): win percentage vs. halftime score differential, with the fitted curve]

As some of Wolfers’s commenters pointed out, this graph is slightly misleading because all the data points on the right side are reflected on the left. The real problem, though, is that what Berger and Pope did is to fit a curve to the points on the right half of the graph, extend this curve to 0, and then count that as the effect of being slightly behind.

This is wrong for a couple of reasons.

First, scores are discrete, so even if their curve were correct, it would be misleading to say that being behind increases your chance of winning by 6.6 percentage points. Being behind by one takes you from a differential of 0 (a 50% chance of winning, the way they set up the data) to 51% (+/- 2%). Even taking the numbers at face value, you're talking about 1 percentage point, not their claimed 5 or more.

Second, their analysis is extremely sensitive to their model. Looking at the picture above–again, focusing on the right half of the graph–I would think it would make more sense to draw the regression line a bit above the point at 1. That would be natural but it doesn’t happen here because (a) their model doesn’t even try to be consistent with the point at 0, and (b) they do some ridiculous overfitting with a 5th-degree polynomial. Don’t even get me started on this sort of thing.

What would I do?

I’d probably start with a plot similar to their graph above, but coding the score differential consistently as “home team score minus visiting team score.” Then the points on the left and right halves of the graph would represent different games, and they could fit a line and see what they get. And I’d fit linear functions (on the logit scale), not 5th-degree polynomials. And I’d get more data! The big issue, though, is that we’re talking about maybe a 1% effect, not a 7% effect, which makes the whole thing a bit less exciting.
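As a toy sketch of that suggested analysis: below, the data, the true slope, and the fitting loop are all made up for illustration (nothing comes from Berger and Pope's dataset). The model is logit-linear with no intercept, so a zero halftime differential gives exactly a 50% win probability.

```python
import math
import random

def invlogit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Simulated games: halftime differential (home minus visitor) and outcome,
# generated from a true logit-linear relationship with slope 0.15.
random.seed(1)
true_b = 0.15
games = []
for _ in range(2000):
    diff = random.randint(-10, 10)
    win = 1 if random.random() < invlogit(true_b * diff) else 0
    games.append((diff, win))

# Fit logit(P(win)) = b * diff by gradient ascent on the log-likelihood.
# No intercept: a tied game at halftime should give a 50% win probability.
b = 0.0
learning_rate = 0.01
for _ in range(300):
    grad = sum((win - invlogit(b * diff)) * diff for diff, win in games)
    b += learning_rate * grad / len(games)

print(f"estimated slope: {b:.3f} (true value {true_b})")
```

With the real data one would use a standard logistic regression routine instead of this hand-rolled loop, and could compare fits with and without a jump at zero to test for a discontinuity.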

P.S. It’s cool that Berger and Pope tried to do this analysis. I also appreciate that they attempted to combine sports data with a psychological experiment, in the spirit of the (justly) celebrated hot-hand paper. I like that they cited Hal Stern. And, even discounting their exaggerated inferences, it’s perhaps interesting that teams up by 1 point at halftime don’t do better. This is just what happens when studies get publicized before peer review. Or, to put it another way, the peer review is happening right now! I’ve put enough first-draft mistakes on my own blogs that I can’t hold it against others when they do the same.

P.P.S. Update here.

20 Comments

  1. Andy M says:

    Thank you for going over this and looking at the actual data. I had skimmed the Freakonomics article over and accepted the conclusions, so I much appreciate your fact checking and insistence upon the numbers!

  2. LemmusLemmus says:

    Have you considered writing the authors an email? This post reads like something they'd like to know.

    Your third-to-last sentence suggests that the peer review process magically erases all of a paper's weaknesses. Certainly not! Much better to publicize a paper, have an unstructured, open peer review process, and get feedback like this.

  3. Richard D. Morey says:

    It seems to me that whatever model you choose, it HAS to go through the point (0,.5). That's the one point you know with absolute precision. I don't see any reason to treat that point any differently than the others. Would they say that being behind by one has a qualitatively different effect than being tied at the half?

  4. Andrew Gelman says:

    Andy: I didn't actually look at their raw data but I did carefully read the original article. Luckily for me, it was well written.

    Lemmus: I emailed Justin, who I'm sure will pass it on to the authors. I expect I'll get more of a response than when I emailed S. Kanazawa once. And, yes, the real problem here was not the release of a draft article; the problem was presenting its claims as fact in the New York Times.

    Richard: The authors do give a reason for why things would be different at 0. Their story is plausible and is consistent with (but not exactly implied by) the data.

  5. ZBicyclist says:

    Interesting question: if a student in my basic stats class, armed with 1 week of lectures on multiple regression, turned in something like Berger and Pope's analysis, would they get a B?

    Even looking at the graph above with the "curve" drawn in, I don't see a "curve" on each side — looks like a straight line to me.
    The conclusion in the NYT article will kick around as common wisdom for years, unfortunately.

    It's hardly surprising that a 1 point lead, 1 point deficit, and being tied are indistinguishable. But if you reported that, it wouldn't be a surprising enough finding to make the NYT.

  6. jbn9 [stat.duke.edu] says:

    I feel like you would also want to adjust for team ability using, for example, Sagarin ratings. Perhaps you would find there is no difference among close games with teams of drastically different abilities, but in games with teams of similar ability perhaps the effect exists and is greater than 1%.

  7. ceolaf says:

    A victory of trusted methods over actual context!!! What could be more true to economics and political science than that!

    A linear regression? Really!? Is there any reason at all to assume that the underlying phenomenon is actually linear (even on a logit scale)? Any at all?

    Of course there isn't! But linear is easier to do and easier to conceptualize. Linear is the standard. Linear is what most research uses, for conferences, for journal articles, for talks. Linear, linear, linear. Oh, joy!

    ***************************

    1) Researchers should pick the class of models that best describes the phenomenon, and then fit the model to the data. Assuming that variations from linear are error makes for simpler papers and lousy results. I mean, they are lousy if understanding the phenomenon is more important to you than making your analysis easier.

    Berger and Pope are looking to see if it is NOT linear. So they are coming up with the best model they can, one that best fits their data without preconceived notions of linearity. They are looking to see if a team that is just barely behind does as poorly as extrapolating from all the other data would suggest. That's their question, so what kind of approach would answer THAT question? It looks to me like you are offering what you would do with this data, not what you would do with that precise question. How do you look for a discontinuity?

    *****************

    I agree with you that each game should be counted once, and that figure 1a can be drawn from that data, but I do not understand why you think that it has not. I don't know whether it has or not, based on the paper.

    I am not clear why they picked a 5th degree polynomial, and so it is difficult for me to criticize their decision. But I certainly can criticize the paper for that. Shame, paper. Shame on you.

  8. Doug says:

    So teams up by 1 pt at halftime do very slightly worse (in this dataset) than teams which are tied. By the same logic, it's better to be up by 3 pts than by 4 at halftime and by 6 pts rather than 7 pts.

    Too bad, it's all random variation that won't hold up in the next 7000 games.

  9. john says:

    Things can't be different at zero, since one team wins and one team loses, but both were playing a game with a score differential of zero at the half. Therefore the probability of winning given a score differential of zero must be 1/2. You don't even need to collect any data on that.

  10. Phil says:

    For the dataset presented, it was a tiny bit better to be behind by 1 than ahead by 1 at halftime. OK, sure, you might come up with a plausible story about why it's better to be down by one at halftime than to be ahead by one, but can you really make up a plausible argument for why it's better to be ahead by six than to be ahead by seven, as the data also show? (For what it's worth, which isn't much, it was also better, in those games, to be ahead by three than to be ahead by four.) This sure looks like stochastic variability.

    How about a bootstrap approach to estimating the uncertainty in win probability as a function of halftime point difference, wouldn't that be an easy way to get error bounds?
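    Phil's bootstrap suggestion might look something like the sketch below; the win/loss vector is reconstructed from the post's rounded figures (about 657 one-point-gap games, 51.3% won by the trailing team), not from the actual dataset.

```python
import random

random.seed(0)

# Hypothetical data: 1 if the trailing team won a one-point-gap game.
# Counts chosen to match the post's ~51.3% of ~657 games.
n = 657
wins = [1] * 337 + [0] * (n - 337)   # 337/657 ~ 0.513

# Nonparametric bootstrap: resample games with replacement and recompute
# the trailing team's win rate each time.
boot = []
for _ in range(2000):
    sample = [random.choice(wins) for _ in range(n)]
    boot.append(sum(sample) / n)

boot.sort()
lo, hi = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot))]
print(f"95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
```

    The same resampling could be repeated at each halftime differential to put error bars on every point in the graph.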

  11. Brian Tung says:

    Linear is not a bad bet here, because it has only one degree of freedom, assuming you want the curve to go through the point (0, 1/2). A fifth-degree polynomial (presuming that's what the fitted curve is) has five degrees of freedom and is therefore enormously easier to fit to any data, whether it's warranted or not.

    However, I wouldn't try to fit linear here, either. For one thing, it violates reality: It suggests that the probability of winning when the halftime lead or deficit is large is outside the interval [0, 1]. For another, it doesn't stem from any underlying dynamic model. I would first go for something like erf(kx), where x is the point differential at halftime, and k is a constant of proportionality, and the only degree of freedom. It's consistent with reality, and it's close to what I would expect based on treating the lead/deficit as a random walk. It isn't exactly that because the random walk may be biased (one team is better than the other), and the distribution of team talent isn't generally known a priori. But it's probably a good guess.

  12. Brian Tung says:

    Oops, I mean [1+erf(kx)]/2. But you get the idea.
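    A sketch of fitting that one-parameter curve, using made-up aggregate win rates purely for illustration: since P(win) = [1 + erf(kx)]/2 has a single free parameter k, a coarse grid search is enough.

```python
import math

def win_prob(k, x):
    # Random-walk-style model: P(win) = [1 + erf(k * x)] / 2.
    # Passes through (0, 1/2) automatically.
    return 0.5 * (1.0 + math.erf(k * x))

# Hypothetical aggregate data: (halftime differential, observed win rate).
data = [(-6, 0.22), (-4, 0.30), (-2, 0.40), (0, 0.50),
        (2, 0.60), (4, 0.70), (6, 0.78)]

def sse(k):
    # Sum of squared errors between the model and the observed rates.
    return sum((win_prob(k, x) - p) ** 2 for x, p in data)

# One free parameter, so a grid search over k is sufficient.
best_k = min((k / 1000 for k in range(1, 501)), key=sse)

print(f"best-fit k: {best_k:.3f}")
```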

  13. Frank says:

    Under what context would you think that being behind at half-time would lead you to win? This seems like trusted methods confirming context.

    Did you see the graph? It certainly looks linear. If you believe in Occam's razor, then it seems like the burden is on the researcher to prove something is not linear. Again, based on the graph, assuming a linear fit certainly seems like a better approach than a 5th-degree polynomial.

    Remember that 'linear' only applies to the coefficients; you can transform the regressors however you like, so it's not that restrictive. Finally, recall that many 'linear' models are simply linear transformations of other, more complicated theoretical relationships (e.g., taking the natural log of a multiplicative function).

  14. ZBicyclist says:

    @ceolaf: Take a shave with Occam's razor.

    Models should be no more complicated than they need to be. Linear is the simplest model.

    Andrew mentioned a 5th degree polynomial. Are there social science phenomena that fit a 5th degree polynomial? Even if so, are basketball scores likely to be among them?

  15. distantobserver says:

    Andrew –

    Your calculation is based on a slight misreading of the study. The authors state, on p. 5:

    'We focus on all games where the point differential at halftime was +/-10, which resulted in 6,572 games.'

    So the correct s.e. is .006 and the bounds are calculated as

    > round(.513+c(+.5/sqrt(6572),-.5/sqrt(6572)),3)
    [1] 0.519 0.507

    which is somewhat more conclusive evidence.

    I agree that the study is still in the review stage. To properly review it, however, the authors should provide replication data. Since they don't, we can only speculate about the details of how they treated their data to make ends meet.

  16. ceolaf says:

    ZBicyclist: Occam's razor is about theory, not models.

    I know this is a difficult distinction for a lot of people. But it is a very important one.

    The point is not to choose the simplest model, or even to choose the simplest model that fits the data. The point is to come up with the simplest explanation of the actual phenomenon.

    Then we come up with a model of that theory/explanation, and test it against data that measure the phenomenon.

    Your misapplication of Occam's razor would have linear (not even logit) models for everything, regardless of the quality of their fit. It would preclude transforming the data in any way (e.g., taking the log of income). You make no room for fit! Heck, you would have linear models with a single IV. It's simpler, right?

    There are two approaches to using models, and this is not just a quantitative issue; we see it in linguistics as well, for example. One approach focuses on coming up with a workable model, and the other focuses on an accurate model of the underlying phenomenon. With the former, you treat the data as the phenomenon; with the latter, you treat the data as just a representation of the phenomenon. With the latter, the model informs theory. With the former, the model IS theory.

    Unfortunately, limiting yourself to linear functions — even on a logit scale — does neither of those things.

  17. Brian Tung says:

    @Frank: You're absolutely right–I missed Andrew's qualifier about the logit scale. I'm not sure logit would be my first guess here, but it's reasonable.

  18. distantobserver says:

    Sorry Andrew, I was wrong about the s.e. The misreading was mine…

  19. LemmusLemmus says:

    ceolaf,

    "the actual phenomenon" is represented by the actual data, not a fitted curve.

  20. Arthur Huang says:

    Well, a 5th-order polynomial will collapse to a linear fit if that's what the data show, so that is beside the point. The problem here is that the authors ignored the point where the teams are tied. If it had been included, the 5th-order polynomial would have looked a lot like a line.