Someone just stopped by and dropped off a copy of the book Wizardry: Baseball’s All-time Greatest Fielders Revealed, by Michael Humphreys. I don’t have much to say about the topic–I did see Brooks Robinson play, but I don’t remember any fancy plays. I must have seen Mark Belanger but I don’t really recall. Ozzie Smith was cool but I saw him only on TV. The most impressive thing I ever saw live was Rickey Henderson stealing a base. The best thing about that was that everyone was expecting him to steal the base, and he was still able to do it. But that wasn’t fielding either.
Anyway, Humphreys was nice enough to give me a copy of his book, and since I can’t say much (I didn’t have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away.
(Note: Humphreys replies to some of these questions in a comment.)
1. Near the beginning, Humphreys says that 10 runs are worth about 1 win. I’ve always been curious about Bill James’s Pythagorean projection, so let me try it out here. Take an average team that scores 700 runs and allows 700 runs in 162 games, so that it projects to 81-81. Then an extra 10 runs is 710, and Bill James’s prediction is Games.Won/Games.Lost = (710/700)^2 = 1.029. Winning 1 extra game gives you an 82-80 record, for a ratio of 82/80=1.025. So that basically lines up.
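Here is the same check as a small Python sketch. It assumes the team also allows 700 runs (so it starts at 81-81) and uses the exponent of 2 from James's formula; the function names are made up for illustration and do not come from Humphreys's book.

```python
def pythagorean_win_fraction(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean projection, written as a winning fraction.

    Equivalent to Won/Lost = (runs_scored / runs_allowed) ** exponent.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def projected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Projected wins over a season of `games` games."""
    return games * pythagorean_win_fraction(runs_scored, runs_allowed, exponent)


# Assumption: an average team that scores and allows 700 runs.
baseline = projected_wins(700, 700)   # 81 wins by construction
improved = projected_wins(710, 700)   # same team with 10 extra runs scored
extra_wins = improved - baseline

print(f"Won/Lost ratio with 710 runs: {(710 / 700) ** 2:.3f}")
print(f"Projected wins: {baseline:.1f} -> {improved:.2f} (extra: {extra_wins:.2f})")
```

Under these assumptions the extra 10 runs come out to a bit more than one extra win, which is in the same ballpark as the 10:1 rule.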
There must be some more fundamental derivation, though. I don’t see where the square comes from in James’s model, and I don’t see where the 10 comes from in Humphreys. I mean, I can see where it can arise empirically–and the idea that 10 runs = 1 extra win is a good thing to know, partly because it seems like a surprise at first (my intuition would’ve been that 10 extra runs will win you a few extra games), but I feel like there’s some more fundamental relationship from which the 10:1 or Pythagorean relationship can be derived.
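One partial connection, as a sketch: if you take the Pythagorean formula as given (this does nothing to explain the square), the runs-per-win figure follows from a first-order approximation around an average team. The symbols here are introduced only for illustration: RS and RA are runs scored and allowed, p is the winning fraction, W = Gp is wins over G games, and R is the common value of RS and RA for an average team.

```latex
\[
p = \frac{\mathit{RS}^2}{\mathit{RS}^2 + \mathit{RA}^2},
\qquad
\frac{\partial p}{\partial \mathit{RS}}
  = \frac{2\,\mathit{RS}\,\mathit{RA}^2}{\left(\mathit{RS}^2 + \mathit{RA}^2\right)^2}
  \;\Big|_{\mathit{RS}=\mathit{RA}=R}
  = \frac{2R^3}{4R^4} = \frac{1}{2R}.
\]
\[
\frac{\partial W}{\partial \mathit{RS}} = \frac{G}{2R}
  = \frac{162}{2 \times 700} \approx 0.116 \text{ wins per run}
\quad\Longrightarrow\quad
\frac{2R}{G} \approx 8.6 \text{ runs per win}.
\]
```

So with an exponent of 2, one win costs about twice the average number of runs a team scores per game. That gives roughly 9 runs at 700 runs per season, close to the 10:1 rule, though it only pushes the question back to where the exponent comes from.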
2. As I understand it, Humphreys is proposing two methods to evaluate fielders:
- The full approach, given knowledge of where all the balls are hit when a player is in the field.
- The approximate approach using available data.
What I’m wondering is: Are there some simpler statistics that capture much of the substance of Humphreys’s more elaborate analysis? For example, Bill James has his A*B/C formula for evaluating offensive effectiveness. But there’s also on-base percentage and slugging average, both of which give a pretty good sense of what’s going on and serve as a bridge between the basic statistics (1B, 2B, 3B, BB, etc) and the ultimate goal of runs scored. Similarly, I think Humphreys would make many a baseball fan happy if he could give a sense of the meaning of some basic fielding statistics–not just fielding average but also #assists, #double plays, etc. One of my continuing struggles as an applied statistician is to move smoothly between data, model, and underlying substance. In this case, I think Humphreys would be providing a richer picture if he connected some of these dots. (One might say, perversely, that Bill James had an advantage of learning in public, as it were: instead of presenting a fully-formed method, he tried out different ideas each year, thus giving us a thicker understanding of batting and pitching statistics, on top of our already-developed intuition about doubles, triples, wins and losses, etc.)
3. Humphreys makes the case that fielding is more important, as a contribution to winning, than we’ve thought. But perhaps his case could be made even stronger. Are there other aspects of strong (or weak) fielding not captured in the data? For example, suppose you have a team such as the ’80s Cardinals with a fast infield, a fast outfield, and a pitching staff that throws a lot of low pitches leading to ground balls. I might be getting some of these details wrong, but bear with me. In this case, the fielders are getting more chances because the manager trusts them enough to get ground-ball pitchers. Conversely, a team with bad fielders perhaps will adjust their pitching accordingly, taking more chances with the BB and HR. Is this captured in Humphreys’s model? I don’t know. If not, this is not meant as a criticism, just a thought of a way forward. Also, I didn’t read every word of the book so maybe he actually covers this selection issue at some point.
4. No big deal, but . . . I’d like to see some scatterplots. Perhaps start with something simple like some graphs of (estimated) offensive ability vs. (estimated) defensive ability, for all players and for various subsets. Then some time series of fielding statistics, both the raw data of putouts, chances, assists, etc. (see point 2 above) and then the derived statistics. It would be great to see individual career trajectories and also league averages by position.
5. Speaking of time series . . . Humphreys talks a lot about different eras of baseball and argues persuasively that players are much better now than in the old days. This motivates some adjustment for the years in which a player was active, just as with statistics for offense and pitching.
The one thing I’m worried about in the comparison of players from different eras is that I assume that fielding as a whole has been more important in some periods (e.g., the dead-ball era) than in others. If you’re fielding in an era where fielding matters more, you can actually save more runs and win more games through fielding. I don’t see how Humphreys’s method of adjustment can get around that. Basically, in comparing fielders in different eras, you have a choice between evaluating what they did or what they could do. This is a separate issue from expansion of the talent pool and general improvement in skills.
I enjoyed the book. I assume that is clear to most of you already, as I wouldn’t usually bother with a close engagement if I didn’t think there was something there worth engaging with. Now I’ll send it off to Kenny Shirley who might have something more helpful to say about it.