I read Malcolm Gladwell’s article in the New Yorker about the book, “The Wages of Wins,” by David J. Berri, Martin B. Schmidt, and Stacey L. Brook. Here’s Gladwell:
Weighing the relative value of fouls, rebounds, shots taken, turnovers, and the like, they’ve created an algorithm that, they argue, comes closer than any previous statistical measure to capturing the true value of a basketball player. The algorithm yields what they call a Win Score, because it expresses a player’s worth as the number of wins that his contributions bring to his team. . . .
In one clever piece of research, they analyze the relationship between the statistics of rookies and the number of votes they receive in the All-Rookie Team balloting. If a rookie increases his scoring by ten per cent—regardless of how efficiently he scores those points—the number of votes he’ll get will increase by twenty-three per cent. If he increases his rebounds by ten per cent, the number of votes he’ll get will increase by six per cent. . . . Every other factor, like turnovers, steals, assists, blocked shots, and personal fouls—factors that can have a significant influence on the outcome of a game—seemed to bear no statistical relationship to judgments of merit at all. Basketball’s decision-makers, it seems, are simply irrational.
I have a few questions about this, which I’m hoping that Berri et al. can help out with. (A quick search found this blog that they are maintaining.) I should also take a look at their book, but first some questions:
1. Reading Gladwell’s article, I assume that Berri et al. are doing regression analysis, i.e., estimating player abilities as a linear combination of individual statistics. I have the same question that Bill James asked in the context of baseball statistics: why restrict to linear functions? A function of the form A*B/C (that’s what James used in his runs created formula, or more fully, something like (A1 + A2 +…)*(B1 + B2 +…)/C) could make more sense.
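To make the linear-versus-product distinction concrete, here is a minimal sketch. The weights and stat lines are made up for illustration and the function names are hypothetical; nothing here comes from the book or from James's actual formulas beyond the A*B/C shape mentioned above. The point is only that in a linear rating one more unit of A is worth the same no matter what B is, while in the product form its value scales with B.

```python
# Hypothetical illustration: a linear rating versus a product form A * B / C.
# All weights and inputs are invented for the example.

def linear_rating(a, b, w_a=0.5, w_b=0.25):
    """Linear form: each unit of a is worth w_a regardless of b."""
    return w_a * a + w_b * b

def product_rating(a, b, c):
    """Product form A * B / C: the value of a depends on the level of b."""
    return a * b / c

# Marginal value of one more unit of A at two different levels of B.
for b in (100, 200):
    lin_gain = linear_rating(151, b) - linear_rating(150, b)
    prod_gain = product_rating(151, b, 500) - product_rating(150, b, 500)
    print(f"B={b}: linear gain={lin_gain:.3f}, product gain={prod_gain:.3f}")
```

In the product form the gain doubles when B doubles, which is the kind of interaction a purely linear combination of statistics cannot express.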
2. Have Berri et al. looked at the plus-minus statistic, which is “the difference in how the team plays with the player on court versus performance with the player off court”? (See here for some references to this, also here and here.) When I started reading Gladwell’s article, I thought he was going to talk about the plus-minus statistic, actually.
3. I’m concerned about Gladwell’s causal interpretation of regression coefficients. I don’t know what was in the analysis of All-Rookie voting, but if you run a regression including points scored and also rebounds, turnovers, etc., then the coefficient for “points scored” is implicitly comparing two players with different points scored but identical numbers of rebounds, assists, etc.–i.e., “holding all else constant.” But that is not the same as answering what happens “if a rookie increases his scoring by ten per cent.” If a rookie increases his scoring by 10%, I’d guess he’d get more playing time (maybe I’m wrong on this, I’m just guessing here), thus more opportunities for rebounds, steals, etc.
Just to be clear here: I’m not knocking the descriptive regression. In particular, you can play with it to model what might happen if players are switched in and out of teams (as long as you think carefully about issues such as playing time, I suppose). I’m just sensitive to mistakenly causal interpretations of regression coefficients–the idea that you can change one variable while holding all else constant.
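Here is a small simulation of the distinction, as a sketch under invented assumptions: the data-generating process below (scoring earns minutes, minutes create rebound chances, votes depend on points and rebounds) is made up for illustration and is not the model in the book. It compares the coefficient on points when rebounds are held constant with the coefficient when rebounds are left out and allowed to move along with points.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Made-up data-generating process (an assumption, not from the book):
# scoring earns minutes, and minutes create rebound opportunities.
points = rng.normal(10.0, 3.0, n)
minutes = 10.0 + 1.5 * points + rng.normal(0.0, 2.0, n)
rebounds = 0.2 * minutes + rng.normal(0.0, 1.0, n)
votes = 2.0 * points + 1.0 * rebounds + rng.normal(0.0, 1.0, n)

def ols(y, *xs):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

partial = ols(votes, points, rebounds)[1]  # compares players with equal rebounds
total = ols(votes, points)[1]              # lets rebounds move with points

print(f"coefficient holding rebounds constant: {partial:.2f}")
print(f"coefficient letting rebounds follow:   {total:.2f}")
```

In this toy setup the first coefficient recovers the direct weight on points (2.0 by construction), while the second is larger (about 2.3) because extra scoring also brings extra rebounds through playing time. Both are fine as descriptions; they just answer different questions.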
4. Gladwell’s article is subtitled, “When it comes to athletic prowess, don’t believe your eyes,” and he writes, “We see Allen Iverson, over and over again, charge toward the basket, twisting and turning and writhing through a thicket of arms and legs of much taller and heavier men—and all we learn is to appreciate twisting and turning and writhing. We become dance critics, blind to Iverson’s dismal shooting percentage and his excessive turnovers, blind to the reality that the Philadelphia 76ers would be better off without him.” But it seems here that the problem is not that people are ignoring the statistics, but that they’re using the wrong (or overly simplified) statistics. After all, he points out in the first paragraph of his article that Iverson has led the league in scoring and steals, and his team has done well. Even if he didn’t look cool flying to the basket, Iverson might have gotten recognition from these statistics, right? This is a point that Bill James made (with regard to batting average in Fenway Park, ERA in Dodger Stadium, etc.): people can overinterpret statistics in isolation.