## Baseball’s greatest fielders

Someone just stopped by and dropped off a copy of the book Wizardry: Baseball’s All-time Greatest Fielders Revealed, by Michael Humphreys. I don’t have much to say about the topic–I did see Brooks Robinson play, but I don’t remember any fancy plays. I must have seen Mark Belanger but I don’t really recall. Ozzie Smith was cool but I saw him only on TV. The most impressive thing I ever saw live was Rickey Henderson stealing a base. The best thing about that was that everyone was expecting him to steal the base, and he still was able to do it. But that wasn’t fielding either.

Anyway, Humphreys was nice enough to give me a copy of his book, and since I can’t say much (I didn’t have it in me to study the formulas in detail, nor do I know enough to be able to evaluate them), I might as well say what I can say right away.

(Note: Humphreys replies to some of these questions in a comment.)

1. Near the beginning, Humphreys says that 10 runs are worth about 1 win. I’ve always been curious about Bill James’s Pythagorean projection, so let me try it out here. If a team scores 700 runs in 162 games, then an extra 10 runs is 710, and Bill James’s prediction is Games.Won/Games.Lost = (710/700)^2 = 1.029. Winning 1 extra game gives you an 82-80 record, for a ratio of 82/80=1.025. So that basically lines up.

There must be some more fundamental derivation, though. I don’t see where the square comes from in James’s model, and I don’t see where the 10 comes from in Humphreys. I mean, I can see where it can arise empirically–and the idea that 10 runs = 1 extra win is a good thing to know, partly because it seems like a surprise at first (my intuition would’ve been that 10 extra runs will win you a few extra games), but I feel like there’s some more fundamental relationship from which the 10:1 or Pythagorean relationship can be derived.
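The back-of-the-envelope check above takes only a few lines to script; this is just the same arithmetic, with nothing assumed beyond the numbers already in the text:

```python
# Compare the Pythagorean win ratio for 10 extra runs against the
# ratio implied by one extra win over a .500 season.
runs_scored = 700
runs_allowed = 700

# Bill James's Pythagorean projection: Wins/Losses = (RS/RA)^2
pythag_ratio = ((runs_scored + 10) / runs_allowed) ** 2
print(round(pythag_ratio, 3))  # -> 1.029

# One extra win over a .500 baseline in 162 games: an 82-80 record
one_win_ratio = 82 / 80
print(one_win_ratio)  # -> 1.025
```

The two ratios agree to within about half a percent, which is the "basically lines up" observation above.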

2. As I understand it, Humphreys is proposing two methods to evaluate fielders:
- The full approach, given knowledge of where all the balls are hit when a player is in the field.
- The approximate approach, using available data.

What I’m wondering is: Are there some simpler statistics that capture much of the substance of Humphreys’s more elaborate analysis? For example, Bill James has his A*B/C formula for evaluating offensive effectiveness. But there’s also on-base percentage and slugging average, both of which give a pretty good sense of what’s going on and serve as a bridge between the basic statistics (1B, 2B, 3B, BB, etc) and the ultimate goal of runs scored. Similarly, I think Humphreys would make many a baseball fan happy if he could give a sense of the meaning of some basic fielding statistics–not just fielding average but also #assists, #double plays, etc. One of my continuing struggles as an applied statistician is to move smoothly between data, model, and underlying substance. In this case, I think Humphreys would be providing a richer picture if he connected some of these dots. (One might say, perversely, that Bill James had an advantage of learning in public, as it were: instead of presenting a fully-formed method, he tried out different ideas each year, thus giving us a thicker understanding of batting and pitching statistics, on top of our already-developed intuition about doubles, triples, wins and losses, etc.)

3. Humphreys makes the case that fielding is more important, as a contribution to winning, than we’ve thought. But perhaps his case could be made even stronger. Are there other aspects of strong (or weak) fielding not captured in the data? For example, suppose you have a team such as the ’80s Cardinals with a fast infield, a fast outfield, and a pitching staff that throws a lot of low pitches leading to ground balls. I might be getting some of these details wrong, but bear with me. In this case, the fielders are getting more chances because the manager trusts them enough to get ground-ball pitchers. Conversely, a team with bad fielders perhaps will adjust their pitching accordingly, taking more chances with the BB and HR. Is this captured in Humphreys’s model? I don’t know. If not, this is not meant as a criticism, just a thought of a way forward. Also, I didn’t read every word of the book so maybe he actually covers this selection issue at some point.

4. No big deal, but . . . I’d like to see some scatterplots. Perhaps start with something simple like some graphs of (estimated) offensive ability vs. (estimated) defensive ability, for all players and for various subsets. Then some time series of fielding statistics, both the raw data of putouts, chances, assists, etc. (see point 2 above) and then the derived statistics. It would be great to see individual career trajectories and also league averages by position.

5. Speaking of time series . . . Humphreys talks a lot about different eras of baseball and argues persuasively that players are much better now than in the old days. This motivates some adjustment for the years in which a player was active, just as with statistics for offense and pitching.

The one thing I’m worried about in the comparison of players from different eras is that I assume that fielding as a whole has been more important in some periods (e.g., the dead-ball era) than in others. If you’re fielding in an era where fielding matters more, you can actually save more runs and win more games through fielding. I don’t see how Humphreys’s method of adjustment can get around that. Basically, in comparing fielders in different eras, you have a choice between evaluating what they did or what they could do. This is a separate issue from expansion of the talent pool and general improvement in skills.

### Summary

I enjoyed the book. I assume that is clear to most of you already, as I wouldn’t usually bother with a close engagement if I didn’t think there was something there worth engaging with. Now I’ll send it off to Kenny Shirley who might have something more helpful to say about it.

1. Daniel says:

I agree with your points. There is a big difference in considering fielding during the "Juiced Ball Era" (when the likes of Sosa, McGwire and Bonds were hitting 60+ home runs each year and, at the same time, were also getting walked at a higher rate) versus only 10-15 years earlier, when pitching was more dominant and strikeouts were more common. Also (and this might be getting a bit too involved), perhaps players who played on early AstroTurf (which gave ridiculous bounces) deserve some adjustment, though data on where each player's games were played might not be available.

2. Bill Nichols says:

This is as good an explanation as any for the Pythagorean expectation.

The exponent will depend on the distribution and the standard deviation of runs scored/allowed. Obviously, as SD → 0 (a delta function), a tiny difference in runs yields a higher fraction of wins.
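One way to see this is a toy simulation: give two teams run distributions with a small, fixed difference in means, and vary only the spread. The normal run distributions and the particular numbers below are illustrative assumptions, not from the comment:

```python
# As the spread of runs scored shrinks, a small edge in average runs
# translates into a larger edge in win fraction.
import random

def win_fraction(mean_a, mean_b, sd, n=200_000, seed=1):
    """Fraction of simulated games won by team A, ignoring exact ties."""
    rng = random.Random(seed)
    wins = 0
    decided = 0
    for _ in range(n):
        a = rng.gauss(mean_a, sd)
        b = rng.gauss(mean_b, sd)
        if a == b:
            continue  # exact float ties are essentially impossible anyway
        decided += 1
        if a > b:
            wins += 1
    return wins / decided

# Team A averages 0.2 runs/game more than Team B.
print(win_fraction(4.6, 4.4, sd=3.0))  # modest edge, a bit over .500
print(win_fraction(4.6, 4.4, sd=0.5))  # same run edge, much bigger win edge
```

With sd=3 the 0.2-run edge is worth only a couple of percentage points of winning percentage; with sd=0.5 the same edge is worth far more, which is the delta-function limit in miniature.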

3. Millsy says:

Regarding Pythagorean Expectation, there is some work that looks a little more into this by a physics professor, Kerry Whisnant. The manuscript is here:

4. Carl Weisman says:

Bill James's book "Win Shares" provides a systematic approach to measuring fielding quality. He estimates a team's "runs saved," then allocates them between fielding and pitching, and finally allocates runs saved by fielding to positions.

He admitted, however, that he was still probably underestimating fielding value. For example, Marty Marion was NL MVP in a year when his method says Enos Slaughter was more valuable to their team. And Yogi Berra was AL MVP 3 times, when Mantle was the more valuable Yankee as far as James was concerned. Ditto Roy Campanella and Duke Snider on the Dodgers.

Of course, the voting baseball writers could be wrong, but surely they couldn't be as far wrong as James's algorithm says they were (said James!).

5. Andy says:

Here is a nice little write-up on 'deriving' the Pythagorean Formula: http://www.eg.bucknell.edu/~bvollmay/baseball/pyt… The bottom line is that a linear form works just as well as a quadratic form.

Here's a paper that actually does derive the formula under the assumption of iid runs following a Weibull distribution: http://arxiv.org/abs/math/0509698v4. This is not too surprising given the similarity between the Pythagorean formula and logits. The Weibull assumption fits pretty well and the run independence, although clearly wrong, is not that terrible.

6. Andrew Gelman says:

Millsy:

I read Whisnant's paper and there was one thing that puzzled me. Whisnant finds two things:

1. Considering two teams that are the same in runs scored and the same in runs allowed, the team with the lower variance in runs scored will win more games. This makes sense to me. That extra run will help more if the game is close.

2. Considering two teams that are the same in runs scored and the same in runs allowed, the team with the higher slugging average wins more games and the team with higher on-base percentage wins fewer games. I would've expected the opposite: I'd think that a slugging team would show more variance in runs scored, which would cause the team to underperform in wins; see item 1 above.

Can you explain?

7. Michael Humphreys says:

Andrew,

Thank you for mentioning my book. Let me try to address the first two questions. I'll address the others either late tonight or tomorrow.

1. I don't know how people have arrived at the ten-runs-per-win idea, but it's been around a long time and helps the reader at the beginning of the book to get a sense of materiality. Happy to defer to the experts.

2. Fair points. I will have my first blog post up at the OUP website for the book. I hope it addresses the need for a clearer connection between the basic fielding stats, the "approximate" approach introduced by DRA, and the "full approach" sold by certain data companies to teams and (with licensing restrictions) to enthusiasts. The essence of the explanation would be as follows (focusing on first-order factors).

The primary job of a fielder is to prevent hits.

For outfielders, that means catching fly balls and line drives and recording putouts. For infielders, that means fielding ground balls, throwing out the batter, and recording assists. All else being equal, an outfielder (infielder) making more putouts (assists) is preventing hits.

Two simplifying assumptions proposed by DRA back in 2003: (i) ignore errors, because they are _already_ counted as one less putout or assist made, and have virtually the same effect as a 'clean' hit allowed, and (ii) ignore infielder putouts, because they are almost always pop-ups that are 'automatic outs' analogous to strikeouts more properly credited to pitchers, or tags or forceouts properly credited to the assisting player.

So what you want to do is find out (1) how many assists (putouts) each infielder (outfielder) made above or below expectation ("net plays"), and (2) how many runs, on average, are associated with each net play? If you multiply the two, you get runs saved (allowed), or "defensive runs." Divide that by ten and you have defensive wins.

The "approximate method" answers the first question per position with a little arithmetic and a forced zero intercept regression analysis upon regressors carefully chosen (a) not to be influenced by the fielders' skill at that position, and (b) to the greatest extent possible, not to be influenced by the team's fielding skill at the other positions. DRA can achieve the first goal better than all prior systems and the second goal as well as possible given whatever data is freely available.

The second question is answered with a second forced zero intercept regression analysis of actual team net runs allowed onto all the residuals from the first series of regressions as well as all the key pitching variables–net SO, BB, HR, etc.

That's the essence of the approximate approach.
I can give more details if you like.

The "full" approach has so far involved having people attending each game or watching the game on videotape record their estimates of the trajectory ("t") (G for grounder, L for line drive, F for flyout, P for popup), slice of the field ("s"), and depth ("d") of every batted ball. For example, F8D would be fly ball, straightaway center, deep; P78S would be a popup in left-center, shallow. The entire field is covered with a grid with (s,d) locations.

All you have to do is count how many balls were in each (t,s,d) 'bucket' when the player was on the field, and multiply by the league-average out conversion rate for each bucket, to derive expected total plays by the fielder. His actual plays, minus those expected plays, multiplied by about .75-to-.85 runs per play, yields his defensive runs.
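The bucket arithmetic just described can be sketched in a few lines. The bucket labels follow the (t,s,d) notation above, but the counts and conversion rates are invented for illustration:

```python
# Hypothetical example of the "full" method's bucket arithmetic.
league_out_rate = {    # league-average out conversion rate per (t,s,d) bucket
    "F8D": 0.60,       # deep fly ball, straightaway center
    "L8M": 0.25,       # medium-depth line drive to center
    "F78S": 0.90,      # shallow fly ball in left-center
}
balls_in_bucket = {"F8D": 50, "L8M": 40, "F78S": 30}  # with this fielder on the field
actual_plays = 85      # putouts the fielder actually recorded on those balls

expected_plays = sum(balls_in_bucket[b] * league_out_rate[b] for b in balls_in_bucket)
net_plays = actual_plays - expected_plays
defensive_runs = net_plays * 0.80      # ~.75-to-.85 runs per net play
defensive_wins = defensive_runs / 10   # ten runs per win, as at the start of the book

print(expected_plays)              # -> 67.0
print(round(defensive_runs, 1))    # -> 14.4
```

So this hypothetical fielder made 18 net plays, worth roughly 14 defensive runs, or about a win and a half.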

Wizardry tests results under the "approximate" method against the "full" method. Wizardry also highlights the data accuracy problems under the "full" method and the miscalculations analysts have made using the "full" method data.

8. Bill Nichols says:

Andrew,

You wrote 1 backwards: the team with higher variance loses more games. Given two teams with equal expectation, every game that a team wins by multiple runs implies some number of games that must be lost by 1 run. The extreme case is to score all your runs in one game and lose the rest. That still leaves the puzzle for 2.

The answer is that one cannot hit a 5+ run homer, but a homer will still win a lot of games. A high-scoring team must have a lot of runners on base, and a high on-base percentage to avoid outs. Really big innings require an extended sequence. Hence, for equal mean expectation, the low-on-base, high-power team wins more of the lower-scoring games than a high-on-base, low-power team.

9. Andrew Gelman says:

Bill:

I went in and fixed the typo in 1; thanks for pointing this out.

But I'm still puzzled by 2. I don't really understand your explanation at all. I'm sure you're right–at least, Humphreys has the data to back it up–I just don't see the logic.

10. Millsy says:

I was confused about #1; Bill is right that the conclusion was that teams with less variability win more games.

I was looking for the paper relating the Pythagorean formula to the Weibull, so I'm glad someone linked that one.

Reading the study a little closer, and thinking about #2, I agree with Bill that it's because an OBP-driven rally requires an extended sequence. Ultimately, you can use the negative binomial distribution to think about it:

Suppose there are two teams, Team 1 and Team 2. Team 1 only hits singles and takes walks. Team 2 only hits HR, with no walks. Also assume that base runners advance only a single base on a single or walk. Here, let's assume there's some number of walks that results in equal run averages, but that the probability of a hit for the two teams is equal:

Prob(t1=1B) = Prob(t2=HR)

This way, they both have the same batting average, but the OBP of Team 1 is higher.

Team 1 needs a combination of 4 hits and walks in 6 trials (i.e., before the 3 failures that end the inning) to score.

On the other hand, the slugging team only needs 1 success in 4 attempts to score, since a home run automatically scores.

Ultimately, Team 1 requires a longer sequence of successes than Team 2. How this plays out on the field, I guess, is described in Whisnant's article.
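A quick way to see the consequence of this setup is to compute, for each stylized team, the chance of scoring at least one run in an inning. The success probabilities below are illustrative choices (the on-base team reaching base half the time, the power team homering a quarter of the time), not numbers from the thread:

```python
# Exact negative-binomial computation for the two stylized teams above.
from math import comb

def p_score_station_to_station(p):
    """P(at least one run) for a team that only walks/singles, where runners
    advance one base at a time, so runs = max(0, baserunners - 3).
    P(k successes before 3 outs) = C(k+2, 2) * p^k * (1-p)^3."""
    q = 1 - p
    p_at_most_3 = sum(comb(k + 2, 2) * p**k * q**3 for k in range(4))
    return 1 - p_at_most_3

def p_score_homers(p):
    """P(at least one run) for a team whose every success is a solo homer:
    just the chance of at least one HR before 3 outs."""
    return 1 - (1 - p) ** 3

print(round(p_score_station_to_station(0.5), 3))  # -> 0.344
print(round(p_score_homers(0.25), 3))             # -> 0.578
```

Even though the on-base team's per-plate-appearance success rate is twice as high, it scores at least once in far fewer innings; its runs come in bunches, which is exactly the higher-variance, fewer-wins pattern in Whisnant's result.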

However, going back to #1, I've seen some sabermetric work claiming that higher variability in pitcher performance results in more wins for the team. But, as Whisnant describes, the reason that larger variability wins fewer games is that the distribution tends to be skewed, so that a team scores fewer runs than its average more often than more runs than its average. If you play a team that always scores its average (the same as yours), then I imagine you'd be at a disadvantage.

It sounds like this run-per-game distribution indicates runs come in bunches when they come. It's not so clear to me why this happens, but may be a result of the opposing team throwing crappy pitchers since they expect to lose anyway once they're already behind by X runs.

11. Bill Nichols says:

My intuition is that a big inning for a high on-base team is to bat around and score a lot of runs, while a big inning for a power team is to get someone on base and hit a 2-run homer.

I wrote a simple R program for a brute force example.

Take it to the extreme:
Team A: on-base just over .500. They either walk or make an out. They will average a run an inning with a standard deviation of about 2.

Team B: HR once every 4 plate appearances, strikeouts the rest of the time. They average about a run per inning with an SD of roughly 1.

The power team is more consistent because they have a lower ceiling.
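Bill mentions a simple R program; here is a rough Python equivalent of the same brute-force experiment. The success probabilities (.52 on-base for Team A, .25 HR rate for Team B) are my assumptions, tuned to give each team about a run per inning as described:

```python
# Brute-force simulation of the two extreme teams described above.
import random
from statistics import mean, stdev

def sim_innings(p_success, scores_on_every_success, n=100_000, seed=7):
    """Simulate n innings; return (mean, sd) of runs per inning."""
    rng = random.Random(seed)
    runs_per_inning = []
    for _ in range(n):
        successes, outs = 0, 0
        while outs < 3:
            if rng.random() < p_success:
                successes += 1
            else:
                outs += 1
        if scores_on_every_success:
            runs = successes                 # every HR scores exactly one run
        else:
            runs = max(0, successes - 3)     # walks score only once the bases load
        runs_per_inning.append(runs)
    return mean(runs_per_inning), stdev(runs_per_inning)

mean_a, sd_a = sim_innings(0.52, scores_on_every_success=False)  # walk-only team
mean_b, sd_b = sim_innings(0.25, scores_on_every_success=True)   # HR-only team
print(mean_a, sd_a)   # about a run per inning, larger SD
print(mean_b, sd_b)   # about a run per inning, smaller SD
```

Both teams come out around a run per inning, with the walk-only team roughly twice as variable, matching the "lower ceiling, more consistent" intuition.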

12. Peter says:

"You wrote 1 backwards, the team with higher variance loses more games. Given two teams with equal expectation, every game that a team wins by multiple runs is X number of games that must be lost by 1 run. The extreme case is score all your runs in one game, lose the rest. That still leaves the puzzle for 2."

Although this conjecture may be true in practice, it is not generally true. Consider a league with 2 teams, team A and team B. Team B always scores 2 runs, without any variance. Team A always scores 1 run. Team A loses every game.

Substitute in Team A*, which scores 3 runs in every 3rd game and scores zero runs in the other 2. Team A* has a 1 in 3 winning percentage. Note that teams A and A* have the same expectation over runs, and Team A* has a higher variance. Even if Team A* scored all of its runs in one game, it would still do better than Team A.

This is a silly counter-example but the simple intuition likely holds: teams with low expected run totals in relation to their opponents may sometimes benefit from variance in run scoring. In low-scoring sports like baseball (or soccer or hockey), this is probably a pedantic point. But I'd think that, somewhere in the history of baseball, there's been a sufficiently bad team (perhaps the 1962 Mets?) that could have benefitted from a little more variance.

13. Bill Nichols says:

The behavior of variance reducing win probability depends upon two conditions:
1) a skewed distribution (mean greater than the median), and
2) the two teams having similar expectation values.

Because there is a lower bound, but no ceiling, greater variance tends to increase the skew.

Your example of the 1962 Mets gets me thinking.

All things considered, the 1962 Mets probably couldn't win ANY games without variance. They were clearly inferior to the other major league teams. Where one team is inferior, variance is required for them to win. Variance is what makes the games interesting.

A counterpoint would be that the Mets' best chance was variance from the other team. For example, a Mets pitcher throws a shutout. (Spahn and Sain and pray for rain.)

The data may not have shown it, but variance is the friend of the weak and the enemy of the strong. Maybe this applies mostly to the other team's variance.

14. Michael Humphreys says:

Andrew,

Picking up on your points 3, 4, and 5 . . .

The way DRA works, if a team has ground ball pitchers and a great infield (the outfield is not relevant), the infielders will be rated higher.

That's because DRA (and almost all other systems) rate fielders compared to what an average fielder would have done if he had played for that team, that is, played behind those pitchers. With more opportunities, the better fielder will have more chances to exceed the number of expected plays by the average fielder.

Kenny Shirley worked with some batted ball data used for what you call the "full" approach to create a Bayesian model to predict how many runs each fielder would save if he played for the average team. Note the 'switch'. This approach does a better job of measuring the inherent 'ability' of the fielder given an average set of circumstances; the DRA approach measures the actual 'impact' of the fielder that year, given his circumstances.

Regarding point 4, it would be interesting to do some scatter plots, per position, of defensive runs and offensive runs per player. There would almost certainly be a slight negative correlation.

More plots are generally better than fewer, but I'm not so sure that plotting assists, putouts, and errors over time would reveal anything that hasn't been long known. But I hear you that, going back to point number 2, it might have helped readers to _start_ with the very basic fielding stats before showing how they are transformed into defensive runs under DRA.

Regarding point 5, the timeline adjustment for DRA, called "Talent Pool Adjusted Runs" or "TPAR," indirectly addresses the lower impact that fielders have now and discounts the higher impact they had in the Dead Ball Era. When fielding outcomes were more important, they were more _variable_. The best fielders of the Dead Ball Era could therefore rack up high defensive runs numbers. TPAR effectively discounts those higher totals.

15. Michael Humphreys says:

Daniel, please see my response to Andrew's fifth point.

16. Michael Humphreys says:

Carl, thanks for mentioning James' Win Shares. As I mention in my book, I got several ideas for DRA (the system I use in my book) from Win Shares. That said, Win Shares does not have a good correlation with the "full" batted ball data systems Andrew mentioned. And Win Shares tends to understate the fielding value of the all-time great fielders by about half.

17. Andrew Gelman says:

Michael:

I don't see how you can claim that, just because a trait (for example, a fielding statistic) becomes more variable, that it's necessarily becoming more important. I wouldn't be surprised if pitching statistics, hitting statistics, and fielding statistics are all becoming less variable over time, but I don't think that means that all three are less important!

18. Michael Humphreys says:

Andrew:

I should have been clearer. Fielding statistics have become less variable over time _relative to_ hitting and pitching statistics.