## Is a steal really worth 9 points?

Theodore Vasiloudis writes:

I’d like to bring your attention to this article by Benjamin Morris discussing the value of steals for the NBA. The author argues that a steal should be a highly sought after statistic as it equates to higher chances of victory and is very hard to replace when a player is injured.

I would argue that the reason behind the correlations showing this data is the fact that steals are much more rare in an NBA game than any of the other stats examined so their contribution is exaggerated.

I looked at Morris’s article and it looks like he’s running a regression of players’ plus-minus statistics on points, rebounds, assists, blocks, steals and turnovers. He writes, “A marginal steal is weighted nine times more heavily when predicting a player’s impact than a marginal point. For example, a player who averages 16 points and two steals per game is predicted (assuming all else is equal) to have a similar impact on his team’s success as one who averages 25 points but only one steal. If these players were on different teams and were both injured at the same time, we would expect their teams to have similar decreases in performance (on average).”
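The arithmetic behind Morris's two-player example can be checked with a short sketch. It assumes a purely linear impact score in which a point has weight 1 and a steal has weight 9 (the ratio quoted above). The function name `predicted_impact` is hypothetical, and leaving out the other box-score stats is a simplification. This is not Morris's actual model.

```python
# Relative weights taken from the quote: a marginal steal counts nine times
# as much as a marginal point. All other stats are held equal and omitted.
POINT_WEIGHT = 1.0
STEAL_WEIGHT = 9.0


def predicted_impact(points_per_game, steals_per_game):
    """Hypothetical linear impact score, in units of one marginal point."""
    return POINT_WEIGHT * points_per_game + STEAL_WEIGHT * steals_per_game


player_a = predicted_impact(16, 2)  # 16 + 9 * 2
player_b = predicted_impact(25, 1)  # 25 + 9 * 1
print(player_a, player_b)  # both come out to 34.0
```

Under these assumed weights the two players tie, which is all the quoted example claims: one extra steal offsets nine fewer points in the prediction.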

I’m not quite sure what is being done, though, because plus-minus statistics have been well known for a while (see here, for example, from 2007), but Morris does not use the term “plus-minus” nor does he connect to the literature on it, instead writing, “I [Morris] used this technique quite a bit throughout my treatise on Dennis Rodman, though it is actually better suited to broader analysis such as this.” This suggests to me that either I’m missing something or Morris is. I expect I’m the one who’s out of the loop here, so maybe some of you readers can help out.

From a statistical point of view, we have the usual challenge of distinguishing correlation from causation. I don’t think Morris is literally saying that the marginal benefit of a steal is 9 points. To calibrate, one might try to figure out the marginal benefit of a 2-point shot, which I’d think would be on average a bit less than 2 points because of the possibility of an offensive rebound. Or, for that matter, the marginal benefit of a rebound. But, for Morris’s goal of player evaluation, the causal effect is not what’s important. Rather, his finding is that teams have a better point differential when players with more steals are in the game. This seems like a reasonable enough regression to me, but at the same time I agree with Vasiloudis that the conclusions look odd. There’s not enough of a trail of bread crumbs for me to have confidence in the result. Which is not to say that I think it’s wrong, I just don’t quite know what to think of it.

P.S. Lots of good stuff from commenters who know a lot more about basketball than I do.

1. Protagoras says:

I thought the original discussion made it relatively clear; it’s not that a team would do best fielding all players who steal frequently and score few points, but that usually when you add someone who steals more and scores less, the rest of the team will pick up the slack on scoring. What makes players low or high scoring is often how likely they are to go for shots themselves as opposed to passing off to teammates. Thus, adding a low scoring (or removing a high scoring) player won’t actually have a huge impact on total scoring. On the other hand, if you remove the player who steals, the other players will not pick up the slack on stealing. That seems to be the point of the discussion of replaceability in the original article.

2. P says:

Critical to the 538 article’s point is the idea that ‘9 points’ isn’t actually 9 points on the scoreboard, it is nine points scored by a particular player. And 9 points scored by a particular player is less than 9 points on the scoreboard because (according to the article, and it makes sense), many of those points would be scored anyway by someone else on the court.

The core idea is that a scorer who puts up 25 points a game is heavily taking advantage of opportunities the team creates and only adding some value of his own (worth much less than 9 points) while someone stealing the ball is (according to the article) creating an opportunity largely out of nothing and it is (according to the article) worth a roughly 2-3 point swing (1 point for the stolen opportunity for the other team and 1-2 points for the points scored off the steal, where (according to the article) steal-started possessions lead to points more often than other possessions).

Not sure the evidence backs all of this up, but the statement ‘a steal is worth nine points’ is actually a little subtler and more reasonable than the surface translation of ‘9 points vs a steal’ would suggest.

• Alex D says:

Yes, as you’ve stated here the claim is actually subtler than the statement “a steal is worth 9 points” seems at face value. This is made explicit in footnote 5:

“Because we’re particularly interested in how each stat compares with points scored, I’ve set the predictive value of a single marginal point as our unit of measure (that is, the predictive value of one point equals one, and something five times more predictive than a point is five, etc.).”

The actual result is that the regression coefficient for a steal has 9 times the magnitude of the regression coefficient for a point scored in a regression predicting the final margin of victory/defeat. It’s disappointing that this wasn’t in the main body of the article.
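The rescaling described in footnote 5 can be sketched as follows. The raw coefficients here are made-up numbers chosen only so that the ratio comes out to 9; they are not estimates from the article, and `rescale_to_points` is a hypothetical helper name.

```python
def rescale_to_points(raw_coefs):
    """Express each regression coefficient in units of the points coefficient."""
    point_coef = raw_coefs["points"]
    return {stat: coef / point_coef for stat, coef in raw_coefs.items()}


# Made-up raw coefficients (NOT from the article), for illustration only.
raw = {"points": 0.5, "steals": 4.5}

relative = rescale_to_points(raw)
print(relative)  # {'points': 1.0, 'steals': 9.0}
```

Read this way, “9” is a ratio of two coefficients, not nine points on the scoreboard. The raw size of either coefficient is lost in the rescaling.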

Indeed, the most worrisome part of this article is not the actual analysis (although an actual assessment of, say, model fit and leverage would have been really nice) but the shoddy presentation of that analysis. In addition to burying the critical definition of the reported quantity of interest in a footnote, there’s no discussion of how this quantity would be used to actually generate a prediction (which would have helped mightily with interpreting the result), or how one should actually validate such a model by predicting held-out data so we wouldn’t have to take the author’s word for it. These notions of grounding your analysis by using prediction are supposedly key aspects of Silver’s data analysis philosophy, at least according to his book.

The presentation in the original 538 was so compelling because it focused so much on the process of generating and validating predictions for elections. It seems the new 538 is more concerned with headlines that incorporate a particular result as the “stat to remember” without as much thoughtful discussion of estimation or interpretation. The resulting “data journalism” isn’t strong on either the data analysis or journalistic fronts. Hopefully their editorial process will tighten up going forward.

3. Nate says:

2-point shot made: if the player does not score, someone else on the team may take the shot(s) that generated the score (still contributes to +/-).

Steal: generates a shot that would not otherwise be available to the team. Impact could even be understated, because the player himself may take that marginal shot after a steal (slight confound with own-points scored). Also consider that failed steal attempts could lead to shots for the other team…

4. jonathan says:

I started the article with hope (yesterday, when it came out) and grew sad as I realized it was mostly an elevation of a single case, the one year so far performance of Ricky Rubio and/or Kevin Love (over a 3 year (?) period where the team has lost more games than it won, with this year being closest to a winning record). We don’t, for example, see the effect of Rubio or some equivalent on a number of teams, a la Robert Horry or more famously perhaps Shane Battier.

The gist seemed to be the argument that a steal is less likely to happen so if you have a guy who does steals more often he’s harder to replace. I get that: fewer steals guys, harder to replace. I didn’t see a clear relationship that generated the points attributed to steals. That is this quote: “For steals, the picture is much different. If a player averages one more steal than another player (say 2.5 steals per game instead of 1.5) his team is likely to average .96 more steals than it would without him (if all else stayed equal). That’s why, as an individual player action, steals are much more irreplaceable than points.” OK, so if there is a SAR – steals above replacement – then Rubio would rank in that.
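The quoted replaceability claim amounts to a simple multiplication, sketched below. The 0.96 figure is from the quoted passage. The second rate is a hypothetical value included only for contrast and does not come from the article; the function and parameter names are likewise assumptions.

```python
def team_change(individual_margin, irreplaceability):
    """Expected change in the team's per-game total when a player averages
    `individual_margin` more of a stat than the player he replaces.

    `irreplaceability` near 1 means teammates do not make up the difference;
    near 0 means teammates make up almost all of it.
    """
    return individual_margin * irreplaceability


STEAL_RATE = 0.96        # from the quoted passage
HYPOTHETICAL_RATE = 0.5  # NOT from the article; for contrast only

steals_gain = team_change(2.5 - 1.5, STEAL_RATE)
contrast_gain = team_change(1.0, HYPOTHETICAL_RATE)
print(f"team steals gained: {steals_gain:.2f}")
print(f"team gain for a more replaceable stat: {contrast_gain:.2f}")
```

This only restates the quote. It does not show how the per-steal point value is derived, which is the gap this comment points out.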

I can’t see how this makes the larger case, especially when the rest of the article is mostly a comparison of basic aggregates like number of wins with or without Kevin Love. And he doesn’t analyze Kevin Love’s game, which would tend to show why the team can win games without him – except of course the team is sub-.500 so that data isn’t all that meaningful anyway.

5. Eric Loken says:

I’d be curious about the correlation matrix for the predictors and the pairwise correlations to the score differentials. Possible conditions here for a suppression effect if, for instance, two predictors correlate negatively with each other, but both positively with the outcome or some other odd combination.
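A suppression effect of the kind described here can be illustrated with the textbook formulas for standardized coefficients in a two-predictor regression. The correlations below are hypothetical and are not estimated from any basketball data. They describe two predictors that correlate negatively with each other and positively with the outcome.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized OLS coefficients for two predictors, computed from the
    pairwise correlations (standard two-predictor formulas)."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_12 * r_y2) / denom
    beta2 = (r_y2 - r_12 * r_y1) / denom
    return beta1, beta2


# Hypothetical correlations (NOT from the article's data).
r_y1 = 0.3   # predictor 1 vs. outcome
r_y2 = 0.3   # predictor 2 vs. outcome
r_12 = -0.5  # predictor 1 vs. predictor 2

beta1, beta2 = standardized_betas(r_y1, r_y2, r_12)
print(f"pairwise correlations: {r_y1:.2f}, {r_y2:.2f}")
print(f"standardized betas:    {beta1:.2f}, {beta2:.2f}")
```

With these assumed inputs each standardized coefficient is about 0.6, twice the corresponding pairwise correlation of 0.3. So a large regression weight need not reflect a strong bivariate relationship, which is why the correlation matrix is worth seeing.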

• B Mills says:

I haven’t read the full article, but if OLS is used for estimation then this is likely the case to some extent. This point has been made with using regression to measure values of certain events in baseball (Linear Weights stuff). There’s not much excuse to not just simulate a baseball game given its discrete nature, but I imagine it becomes much more difficult to do so in basketball with so many dynamic things going on.

P above has a nice defense of the idea though, which makes it seem reasonable in the given context.

6. Jonathan (another one) says:

The other obvious way that a marginal steal could be very valuable is if the (futile) attempt to avoid the steal leads to the other team being unable to run its normal offense, leading to more turnovers and quicker, less comfortable shots. Beware partial equilibrium thinking in a regression context.

7. Phil says:

Even if taken at face value, the results in this article wouldn’t suggest that a steal is literally worth 9 points. If a player or team gets a lot of steals, he/they are also good at other aspects of defense. I think “steals” is a proxy for overall quality of the defense, and that it’s the latter that really matters.

• nobody says:

Thought the same and only wanted to go a little further with it. While there are other defensive stats (blocked shots, defensive rebounds), it’s reasonable to think steals reflect other untracked aspects of defense… attempted steals, discouraged passes, etc. I think the nature of defense more broadly implies this sort of indirect causation.

For example, cornerbacks in the NFL — while the NFL tracks INTs, blocked passes, disrupted passes, and so on, for the top cornerbacks, the offense will simply never or almost never choose to pass to the covered receiver, even if it is their best receiver. This can be much more difficult to track as some teams will throw to preferred receivers anyway and assignments might change during the game. Similarly, basketball players can choose to curtail their passes to or from their star players when covered by players with high stealing abilities. Seen this way, # of steals tracking an outsized impact on the game becomes a lot more comprehensible.

• Ian says:

It’s not that simple in basketball. “Going for a steal” in basketball is a gamble, because if you don’t get the steal, you’ve moved in so close that the player you’re guarding has new offensive opportunities (driving by you, passing to a player who can no longer be double-teamed, etc).

I believe the conventional wisdom is that players who aren’t strategic about when they choose to lunge in for steals actually end up weakening team defense.

8. Benjamin Morris says:

Hi Prof. Gelman,

My regression isn’t to +/- or adjusted +/-, rather it’s to the game-by-game WOWY version of “+/-” that I developed for my Rodman series. It measures effect on team performance when players miss games for substantial periods. It’s generally too noisy for evaluating individual players (except in extreme cases), but I think it’s more accurate than play-by-play adjusted +/- when aggregated. It’s not the cleanest or most elegant dataset, but it works for testing a lot of basic hypotheses. Some of it is available on my old website, and as I redo the back-end research with ESPN data it should get more transparent.

However, contrary to much of the public reaction, the result about the predictive value of steals is not very controversial. Most other empirical metrics (like ASPM, which is based on play-by-play) have produced similar results. Really the only “new” ground covered by the article was the “Irreplaceability” metric, which seeks to explain *why* steals are so much more valuable predictively than their direct impact would suggest (without resorting to defense, intangibles, etc).

• Andrew says:

Benjamin:

Thanks for the background, that’s helpful. It’s funny because there’s a classic Bill James result that steals in baseball are less valuable than is conventionally thought. But James also wrote that even if the steals themselves are often close to useless or even counterproductive (when you account for “caught stealing”), players who can steal bases have speed, which is useful in various aspects of offensive and defensive play. Of course there’s no reason why the effects of steals should be similar in these two different sports, but it’s the comparison that first came to mind.

• Kevin says:

Is there any way to adjudicate between your “irreplaceability” metric and the alternative hypothesis that losing high-steals players hurts team performance more because of the presumably irreplaceable defensive talents of those players? Is there any way to know/test how much your findings capture the fact that players who grab a lot of steals also tend to add tremendous value by pressuring opposing ball-handlers, shutting down passing lanes, and preventing the other teams from running their offenses effectively?

9. bxg says:

Off-topic (sorry), but about the 538 blog (which I can’t log into to comment directly). Mr Silver is now defending himself against his critics, notably Krugman, by looking at favorable vs unfavorable mentions and observing that since the site was relaunched under the ESPN banner rather than the NYT, Krugman has been notably more negative.

“While it can be easy to extrapolate a spurious trend from a limited number of data points, the differences are highly statistically significant.”

Since Krugman is very open about not liking the change (I guess that’s therefore significant, and not needing any data), it seems pretty scary to me that this blog becomes one of the most public faces of statistical thought. Make a huge change, your critic doesn’t openly like it (the _huge_ change), but then think that a test of statistical significance is interesting? A defense? An aspersion? Good grief. This is our most public data nerd? :-(

• Andrew says:

Bxg:

Nate is clearly joking here. I think the best take on this is from Joseph Delaney, who expresses the hope that Nate’s new site is in the process of experimentation, and that Nate is trying a mix of sorts of items to see what works.

For all the criticism of the new site, I still think it’s a lot more data-based than what you’ll typically see in the popular press. For example, the post discussed above is data-based. Even if you dispute Morris’s interpretation of his analysis, he seems to be arguing based on data rather than from a rigid position (as we sometimes see on Freakonomics, for example).

10. Anonymous says:

I think this is a nice look that supports the obvious, for two reasons:

1) Scoring in the NBA is the easy part of professional basketball. Individually (1 on 1), it’s easier to score than it is to guard someone.

2) As pointed out, successful steals are very likely to turn into a scored possession rather quickly (think fast break). Not only do you score, but you score quickly, force the team to inbound the ball, and you get to set up your defensive scheme again.

There’s a lot of utility in gambling for steals.

Allen Iverson would be a cautionary tale on steal outliers and “victory.”

• Martin says:

I agree, a steal might be the least visible output of great defensive work.

• Kellen Byrnes says:

Martin, I couldn’t agree more with your assessment that a steal in basketball, especially for a guard, is the manifestation of superior movement on the defensive end of the court. In the game of basketball, players will conduct their offense differently around a player who is excellent at stealing the basketball. What’s more, steals are typically the result of unique athletic qualities, superior positioning on the defensive end, and effort; these qualities would influence the offense’s productivity on every possession, even if steals only occur on a small fraction. In a similar vein, the threat of a blocked shot can have as great an impact as a blocked shot itself, if the offensive player negatively adapts their approach to scoring the goal.

Basketball games are the synthesis of so many movements and dynamics; I believe steals are a highly useful indicator of contributions and abilities that are not easily replaced (i.e., scarce) in the NBA and highly advantageous towards winning games. Chris Paul may be a far superior example: though his wizardry on the offensive end tends to overshadow his defensive prowess, his current team is certainly in contention for a championship.