## A statistician rereads Bill James

Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him this item on the differences between baseball and politics but he said it was too political for them. I then sent him this review of a book on baseball’s greatest fielders but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James and he published it! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about.

Here’s how it begins:

I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. Besides giving me the opportunity to study with the best applied statistician of the late 20th century (Don Rubin) and the best theoretical statistician of the early 21st (Xiao-Li Meng), going to graduate school at Harvard in 1986 gave me the opportunity to sit in a basement room one evening that October with about 20 other students, screaming at the TV, “Put Stapleton in!” Unfortunately, John McNamara didn’t hear us, and the rest is history.

I’m much less of a sports fan than I used to be, but the lessons I’ve learned from reading the Baseball Abstracts have done much to form me as a statistician. James doesn’t write much about statistical methods in any general sense (he comes up with whatever he needs to solve each particular problem), but from his practice one can extract some general principles:

– Methodological pluralism: Rather than try to come up with a single number or a single approach to summarizing player abilities, team strategies, or any other topic, he tried out a bunch of different ideas. In statistics, I like to say that each substantive hypothesis deserves its own analysis: it’s generally hopeless to expect that you can run a single regression and read off the answers to each of your research questions, one coefficient at a time.

– Controlled comparisons: Instead of comparing simple aggregates, be more careful and make comparisons on pairs or groups of similar players or teams. As economists Rajeev Dehejia and Sadek Wahba demonstrated in a pair of influential articles (they have been cited over 2400 times since their publication a decade ago), these comparisons work only when you are controlling for appropriate characteristics. In the case of Bill James’s analysis, player age is typically a key comparison variable. From the standpoint of applied statistics, controlled comparisons combine the averaging that you get from having a moderate or large sample size with the insight that comes from understanding individual cases.

– Conceptual models used as guides to comparisons: James has written many times that he does not study statistical questions; he studies baseball questions. Each analysis is grounded in some goal. A conceptual model such as the defensive spectrum, or the narrowing of abilities, or the contribution of speed to both offense and defense, drives the direction of the study and motivates many of the details of the analysis. I have tried to follow these principles in my own work.
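The controlled-comparison principle can be made concrete with a small sketch. Everything in it is hypothetical: the (age, outcome) records are invented numbers, chosen so that the two groups differ in age, and the outcome stands in for whatever year-to-year change one might be studying. Pairing each player with the nearest-age player in the comparison group (with replacement) is just one simple way to control for age. It is much simpler than the methods in the Dehejia and Wahba articles and is not a reconstruction of anything James published.

```python
from statistics import mean

# Hypothetical (age, outcome) records; the numbers are invented for illustration.
group_of_interest = [(24, 0.030), (25, 0.025), (27, 0.010), (31, -0.010)]
comparison_group = [(24, 0.020), (25, 0.015), (27, 0.000), (31, -0.020),
                    (33, -0.030), (34, -0.035), (35, -0.040)]


def raw_difference(group, comparison):
    """Difference in simple averages, ignoring age entirely."""
    return mean(y for _, y in group) - mean(y for _, y in comparison)


def age_matched_difference(group, comparison):
    """Average within-pair difference after matching each player on age.

    Each player in `group` is paired with the comparison player closest in
    age (matching with replacement; ties go to the earlier list entry).
    """
    diffs = []
    for age, y in group:
        _, y_match = min(comparison, key=lambda rec: abs(rec[0] - age))
        diffs.append(y - y_match)
    return mean(diffs)


print(round(raw_difference(group_of_interest, comparison_group), 3))          # 0.027
print(round(age_matched_difference(group_of_interest, comparison_group), 3))  # 0.01
```

With these made-up numbers the raw comparison suggests a gap of about 0.027. After matching on age, the gap shrinks to 0.010, because the group of interest is younger and, in this toy data, younger players improve anyway.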

One central method of statistics that Bill James does not draw upon very often (if at all) is fitting parametric models. For example, James found that an exponent of two in his Pythagorean formula for predicting wins worked pretty well. He didn’t try to estimate the exponent from data, nor did he, for example, try to come up with a conclusion such as, “each additional run is worth 0.093 wins.” On the rare occasions that he did estimate a parameter (for example, the relative values of stolen bases and times caught stealing), he buried his methodology and had no interest in making a big deal about the estimation.
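To make the Pythagorean example concrete, here is a minimal sketch of the formula: predicted winning percentage = RS^k / (RS^k + RA^k), where RS is runs scored, RA is runs allowed, and k = 2 as James used it. The sketch also shows the two parametric steps James tended to skip. The first is differentiating the formula to get a marginal wins-per-run figure. The second is estimating k from data with a least-squares fit of log(W/L) on log(RS/RA) through the origin, which follows from rewriting the formula as W/L = (RS/RA)^k. The 162-game season, the 750-run team, and the team-seasons are illustrative assumptions. The "data" are synthetic, generated from an arbitrarily chosen exponent of 1.83, so the fit merely recovers that value.

```python
import math


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Predicted winning percentage RS^k / (RS^k + RA^k)."""
    rs_k = runs_scored ** exponent
    ra_k = runs_allowed ** exponent
    return rs_k / (rs_k + ra_k)


def marginal_wins_per_run(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Derivative of predicted wins G * RS^k / (RS^k + RA^k) with respect to RS.

    dW/dRS = G * k * RS^(k-1) * RA^k / (RS^k + RA^k)^2
    """
    k = exponent
    numerator = games * k * runs_scored ** (k - 1) * runs_allowed ** k
    denominator = (runs_scored ** k + runs_allowed ** k) ** 2
    return numerator / denominator


def estimate_exponent(teams):
    """Least-squares slope through the origin of log(W/L) on log(RS/RA).

    `teams` is a list of (runs_scored, runs_allowed, wins, losses) tuples.
    """
    xs = [math.log(rs / ra) for rs, ra, _, _ in teams]
    ys = [math.log(w / l) for _, _, w, l in teams]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Synthetic (not real) team-seasons built from an assumed exponent of 1.83;
# fractional win totals are fine for this illustration.
ASSUMED_K = 1.83
synthetic_teams = []
for rs, ra in [(800, 700), (650, 720), (710, 690), (760, 640)]:
    wins = 162 * pythagorean_win_pct(rs, ra, ASSUMED_K)
    synthetic_teams.append((rs, ra, wins, 162 - wins))

print(round(pythagorean_win_pct(800, 700), 3))       # 0.566
print(round(marginal_wins_per_run(750, 750), 3))     # 0.108
print(round(estimate_exponent(synthetic_teams), 2))  # 1.83
```

When RS = RA = R, the derivative simplifies to G·k / (4R). For a 750-run team over 162 games with k = 2, that gives 324 / 3000 = 0.108 wins per run, so any "wins per run" figure depends on the team's run environment.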

Fitting models is something that statisticians are trained to do and in fact do all the time. Why didn’t Bill James follow the example of Pete Palmer and others and try to estimate the relative values of walks, singles, doubles, and other outcomes? . . .

Go to Baseball Prospectus for the rest.

1. Thanks for this. I've never read a statistics-centered semi-biographical essay on James's methods and interests. I too would love to read a Jamesian examination of pitch selection. And I managed to miss his de-emphasizing of batting order in the late 1990s.

There was a good profile of him in the New Yorker a while back, viewable here: http://www.newyorker.com/archive/2003/07/14/03071

"Standing athwart history yelling 'Put Stapleton in!'" is an apt metaphor for the "no fact zone" that is our political moment. It's no accident that Nate Silver began studying baseball before moving to political polling. Political discourse is an assortment of half-wits, nincompoops, and Neanderthals like George Will and Richard Cohen [who] are not only allowed to pontificate on whatever strikes them, but are actually solicited and employed to do this.

2. Frank says:

Having never read Bill James (except quotes you've mentioned on this blog) and not being a baseball fan, I think he probably just changed his mind about the importance of the lineup order. That question perhaps properly belongs to the scope of “microdecisions” and tactics, which require a different sort of analysis from strategic decisions. Besides choosing the order to maximize the probability of winning that day, I can imagine other management issues going into the decision, like implicit or explicit contracts with players about the opportunities they will be given.

3. Phil says:

I learned at least one thing: I didn't know that "In statistics we like to say that God is in every leaf of every tree."

Also, there are some words missing in the discussion of batting order. I think "and an unreasonable order" should be in there somewhere. Too bad all of the web pages have already been printed so there's no way to fix it.

Good article.

4. Steve Sailer says:

Thanks.

I have a book somewhere of old Tom Boswell Washington Post columns. He wrote several right before that notorious Game Six in October 1986 on the question of whether Bill Buckner was too injured to play anything other than Designated Hitter. Sadly, Buckner has carried the blame all these years, rather than his manager, who didn't pull him for a defensive replacement.

5. Bill Nichols says:

"True ideas are those that we can assimilate, validate, corroborate and verify. False ideas are those we cannot."

A surprisingly similar point of view from that other William James.