A basketball fan of my close acquaintance woke up Wednesday morning and, upon learning the outcome of the first games of the NBA season, announced that “The Warriors suck.”
Can we answer this question? To put it more precisely, how much information is supplied by that first-game-of-season blowout? Speaking Bayesianly, how much should we adjust our expectation that the Splashies will dominate this year?
This is an interesting question in its own right but also is an example of something I’ve been thinking about regarding base-rate fallacy and the rate of integration of information over time, and it relates to some of our favorite topics such as odds for presidential vote, Brexit, Leicester City, etc. My feeling is that the judgment-and-decision-making literature has a lot about base rate and weighting of prior and data, but not much on the time evolution of assessments: the process by which we move from base rate to data-based estimates. As has been discussed a few times on our blog, the failures of pundits and betting markets on Brexit, Leicester, and Trump-in-primaries came down to a too-slow incorporation of data as they were coming in.
So how to think about Warriors? I can think of two ways to go about this:
1. Fully model-based:
a. Fit a model to estimate team abilities from a season’s worth of data, basically the same model as the World Cup model that I fit in Stan a couple years ago, using as a team-level predictor some prior ranking of teams (from Sports Illustrated or whatever). Fit this model to last season’s data. Or, better still, fit it separately to each of the past several seasons (the “secret weapon”), for each season using that season’s prior rankings as a team-level predictor. Our model will be very similar to the World Cup model (again, we’ll predict score differential, not wins) but we’d also want to include home-court advantage.
b. From this fitted model, estimate the hyperparameters (or, we have estimates of the hyperparameters from each past season, and I guess take some sort of average). From these, we get a prior estimate and uncertainty of team ability, given its preseason ranking.
c. Then fit the model to the NBA data after the first round of games this year (which includes the Warriors-Spurs debacle from the other day). Now look at the new estimate for the Warriors and see how much it has changed.
2. Hacking something together:
The idea will be to estimate the information from this one game, to express this in the form of a likelihood function. It goes like this. The 30 teams in the NBA have some range of team qualities (“abilities,” in psychometric terminology). The prior rankings had the Warriors at #1 (according to Sports Illustrated) or #2 (ESPN), with the Spurs ranked #3. So, the question is, do the Warriors really “suck”? What’s the likelihood of the data?
Home teams win about 60% of basketball games, and in googling I see one source saying this is a 2.3-point advantage and another calling it 3.6 points. Let’s split the diff and call it 3 points. So if the Warriors and Spurs are approximately equal (that’s what you’d say for teams ranked 1 and 3, or 2 and 3, playing each other), you’d expect the Warriors to win by about 3 points when playing at home, hence a 29-point loss is 32 points worse than expected.
How bad is 32 points worse than expected? According to Hal Stern, the sd of NBA score differentials relative to point spread was 11.6; let’s call it 12 points. So 32 points is -2.67 sd’s compared to the expectation.
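The arithmetic in the last two paragraphs can be checked in a few lines. This is a minimal sketch in Python (standing in for R), using only the numbers quoted above; the variable names are mine. As a sanity check it also converts the 3-point home advantage into a win probability, assuming the margin between evenly matched teams is normally distributed with sd 12. That normality assumption belongs to this sketch, not to the sources quoted above.

```python
from statistics import NormalDist

home_advantage = 3.0   # points; splitting the 2.3 and 3.6 estimates
sd = 12.0              # Stern's 11.6, rounded up
margin = -29.0         # Warriors' actual margin: lost by 29 at home

# If the two teams are equal, the home team is expected to win by 3.
expected_if_equal = home_advantage
shortfall = margin - expected_if_equal   # points worse than expected
z = shortfall / sd                       # in sd units

# Sanity check (assumes a normal margin): P(home team wins) when teams are equal.
p_home_win = NormalDist().cdf(home_advantage / sd)

print(f"shortfall = {shortfall:.0f} points, z = {z:.2f}")
print(f"implied home win probability = {p_home_win:.2f}")
```

Under these assumptions the implied home win probability comes out close to the 60% figure, so the 3-point advantage and the 12-point sd are at least consistent with each other.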
Now what about if the Warriors “sucked,” i.e. were an average team? By how much would we expect them to lose to an elite team such as the Spurs? Hmm…. I looked up last season’s per-game point differentials. The Warriors were the leaders at +10.8 and the Spurs were second at +10.5. Anyway, that suggests that an average team would get beaten by 10 points by a top team. Or get beaten by 7 at home. So, losing by 29 is a -22, which is -22/12 = -1.83 sd’s compared to the expectation.
Then, the likelihood ratio for this game, mediocre relative to elite, is, in R, dnorm(1.83)/dnorm(2.67) = 6.6. If, for example, our prior belief was 95% that the Warriors are a top team and a 5% chance that they are mediocre (or, as Jakey would put it, that they “suck”), then our prior odds in favor of elite were 19 and our posterior odds are 19/6.6 = 2.9, so now we think the chance they are mediocre is 1/(1+2.9) = .26 or 26%, and there’s a 74% chance they are elite and just got bad luck.
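For anyone who wants to replay the two-hypothesis update, here is a small Python sketch of the same calculation. The helper `dnorm` is just my stand-in for the R function of that name, and I use the rounded z-scores 1.83 and 2.67 from the text so the intermediate numbers match.

```python
from statistics import NormalDist

def dnorm(x):
    """Standard normal density, mirroring R's dnorm(x)."""
    return NormalDist().pdf(x)

z_elite = 2.67      # sd's below expectation if the Warriors are elite
z_mediocre = 1.83   # sd's below expectation if they are an average team

# How much more likely the 29-point loss is under "mediocre" than under "elite".
likelihood_ratio = dnorm(z_mediocre) / dnorm(z_elite)

prior_odds_elite = 0.95 / 0.05                        # 19 to 1 in favor of elite
posterior_odds_elite = prior_odds_elite / likelihood_ratio
p_mediocre = 1 / (1 + posterior_odds_elite)

print(f"likelihood ratio     = {likelihood_ratio:.1f}")
print(f"posterior odds elite = {posterior_odds_elite:.1f}")
print(f"P(mediocre | game)   = {p_mediocre:.2f}")
```

Using the unrounded z-scores (22/12 and 32/12) nudges the likelihood ratio down slightly but leaves the posterior probability at about 26%.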
But maybe that’s too extreme. Let’s try a different one. Suppose the options are elite, strong, or mediocre, where “elite” = top team, “mediocre” = avg team, and “strong” is in the middle, i.e. should have no problem making the playoffs but will not be expected to make the Finals. Now, I don’t know what the best prior probabilities should be here. The way to check this would be to look at past teams ranked 1 or 2 in preseason in previous seasons, and see how well they did. So let me just guess on this one, let’s say, based on their preseason ranking there’s a 75% chance they’re a top team, a 20% chance they are strong, and a 5% chance they’re mediocre, in terms of how they’ll actually perform in the 2016-2017 season. Now the likelihoods: If Warriors are a top team, they performed 32 points worse than expected; if they’re mediocre they performed 22 worse than expected (see calculation above, based on the idea that a top team should beat a mediocre team by 10 points on avg), and if they’re strong, let’s say they did 27 worse than expected. Then the likelihoods for top, strong, mediocre are given by dnorm(c(32,27,22)/12) = (0.0114, 0.0317, 0.0743), and when we multiply these by our assumed prior probabilities (0.75, 0.20, 0.05) and renormalize, we get (0.46, 0.34, 0.20), implying there’s roughly a 50% chance the Warriors are a top team, roughly a 1/3 chance they’re merely strong, and a 1/5 chance they’re mediocre.
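The three-way version is the same multiply-and-renormalize step. Here is a rough Python sketch; the function name `posterior` is mine, and the priors are the guessed values from the paragraph above, not estimates from data.

```python
from statistics import NormalDist

def posterior(priors, shortfalls, sd):
    """Discrete Bayes update: prior times normal likelihood, renormalized."""
    likelihoods = [NormalDist().pdf(s / sd) for s in shortfalls]
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

labels = ["top", "strong", "mediocre"]
priors = [0.75, 0.20, 0.05]   # guessed, per the text
shortfalls = [32, 27, 22]     # points worse than expected under each hypothesis

post = posterior(priors, shortfalls, sd=12)
for label, p in zip(labels, post):
    print(f"{label:9s} {p:.2f}")
```

Swapping in a different set of priors is a one-line change, which makes it easy to see how much the conclusion leans on that guess.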
The next step would be to make this into a continuous model where the underlying parameter is the team’s expected point differential during the season (could be anywhere from -10 which is last season’s Sixers to +10 which is last season’s Spurs or Warriors). Again, key step is assigning a prior based on their preseason ranking, and to do this right we’d want to look at the performance of top preseason-ranked teams in previous years.
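To give a feel for what the continuous version might look like, here is a toy normal-normal update in Python. Everything marked ASSUMPTION is a placeholder I made up for illustration: the prior mean and sd are not estimated from past preseason rankings, and treating the Spurs as a known +10 team ignores the uncertainty in their ability.

```python
from math import sqrt

# Numbers taken from the discussion above.
margin = -29.0          # Warriors' margin in the opener
home_advantage = 3.0    # points
game_sd = 12.0          # sd of a single game around its expectation

# ASSUMPTION: the Spurs' ability is known and equal to +10.
spurs_ability = 10.0
# ASSUMPTION: hypothetical prior for the Warriors' season point differential.
prior_mean = 7.0
prior_sd = 4.0

# Expected margin = warriors - spurs + home advantage, so one game is a
# noisy measurement of the Warriors' ability with sd game_sd.
implied_ability = margin - home_advantage + spurs_ability

# Standard conjugate update: precisions add, means are precision-weighted.
prior_prec = 1 / prior_sd**2
game_prec = 1 / game_sd**2
post_var = 1 / (prior_prec + game_prec)
post_mean = post_var * (prior_prec * prior_mean + game_prec * implied_ability)
post_sd = sqrt(post_var)

print(f"implied ability from one game: {implied_ability:+.0f}")
print(f"posterior: {post_mean:+.1f} (sd {post_sd:.1f})")
```

With these made-up inputs the single game carries one-ninth the weight of the prior, so the estimate moves but does not collapse. A wider prior sd, which the variability in past preseason rankings might justify, would let the game move it further.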
Daniel Lee looked up the ESPN preseason power rankings for the past few seasons, and then we plotted what actually happened (for each team, its regular season per-game point differential, which I think should be a less noisy measure than simple won-loss record):
Just to clarify, by “preseason power ranking,” I mean rankings constructed before the season begins; I don’t mean that these are rankings based on preseason play (although I suppose that preseason play might contribute in some way to the rankings).
Assuming these numbers have been transcribed correctly, they suggest that preseason rankings can be way off: there’s a lot of variability here. On the other hand, maybe the Warriors’ #1 or #2 rating this year is stronger than, say, the Heat’s #1 rating before the 2014-2015 season.
One could argue the first game of the season is not so informative because the teams are still getting their acts together. One way to handle this would be to increase the sd of that predictive distribution. Suppose, for example, we ramp it up from 12 points to 15 points. Then the likelihoods for “top,” “strong,” and “mediocre” become dnorm(c(32,27,22)/15) = (0.0410, 0.0790, 0.1361), and when we multiply by our assumed prior probabilities and renormalize, we get (0.58, 0.30, 0.13), thus roughly a 60% chance the Warriors are a top team etc.
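The sensitivity to the sd is easy to tabulate. This Python sketch reruns the three-hypothesis update for sd = 12 and sd = 15; the helper name is mine, the priors are the guessed (0.75, 0.20, 0.05), and other sd values can be added to the loop to explore further.

```python
from statistics import NormalDist

def update(priors, shortfalls, sd):
    """Posterior over hypotheses given shortfalls (points worse than expected)."""
    weights = [p * NormalDist().pdf(s / sd) for p, s in zip(priors, shortfalls)]
    total = sum(weights)
    return [w / total for w in weights]

priors = [0.75, 0.20, 0.05]   # top, strong, mediocre (guessed)
shortfalls = [32, 27, 22]

results = {}
for sd in (12, 15):
    results[sd] = update(priors, shortfalls, sd)
    top, strong, mediocre = results[sd]
    print(f"sd={sd}: top {top:.2f}, strong {strong:.2f}, mediocre {mediocre:.2f}")
```

A larger sd flattens the likelihood, so the posterior stays closer to the prior, which is the sense in which a noisier first game is less informative.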
Also I looked up the point spread for that Spurs/Warriors game on Tues. It looks like the spread was 8.5, which seems just wrong if the two teams are close to evenly matched and home-court is only worth 3 points. So either I’m missing something important, or it’s just that bettors were seduced by the Kevin Durant mystique and overbet on the Warriors.
Finally, to loop back to the larger question: Look how hard it is to update beliefs based on information. Even in this highly controlled, nearly laboratory setting, there’s no way I could do this in my head. So you can see how pundits and data journalists can trip up and be too fast or too slow to incorporate new information on elections.
P.S. Feel free to argue with any and all of my assumptions and reasoning. That’s the whole point of this sort of analysis: it’s transparent, all the assumptions and data are out there so you can go back and forth between assumptions and conclusions. I just had to get this post out before the Warriors played their second game.