Ben Lindbergh invited me to write an article for Baseball Prospectus. I first sent him an item on the differences between baseball and politics, but he said it was too political for them. I then sent him a review of a book on baseball’s greatest fielders, but he said they already had someone slotted to review that book. Then I sent him some reflections on the great Bill James, and he published them! If anybody out there knows Bill James, please send this on to him: I have some questions at the end that I’m curious about.
Here’s how it begins:
I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. Besides giving me the opportunity to study with the best applied statistician of the late 20th century (Don Rubin) and the best theoretical statistician of the early 21st (Xiao-Li Meng), going to graduate school at Harvard in 1986 gave me the opportunity to sit in a basement room one evening that October with about 20 other students, screaming at the TV, “Put Stapleton in!” Unfortunately, John McNamara didn’t hear us, and the rest was history.
I’m much less of a sports fan than I used to be, but the lessons I’ve learned from reading the Baseball Abstracts have done much to form me as a statistician. James doesn’t write much about statistical methods in any general sense (he comes up with what he needs to solve any particular problem), but from his practice one can extract some general principles:
- Methodological pluralism: Rather than try to come up with a single number or a single approach to summarizing player abilities, team strategies, or any other topic, he tried out a bunch of different ideas. In statistics, I like to say that each substantive hypothesis deserves its own analysis: it’s generally hopeless to expect that you can run a single regression and pull off the answers to each of your research questions, one coefficient at a time.
- Controlled comparisons: Instead of comparing simple aggregates, be more careful and make comparisons on pairs or groups of similar players or teams. As economists Rajeev Dehejia and Sadek Wahba demonstrated in a pair of influential articles (they have been cited over 2400 times since their publication a decade ago), these comparisons work only when you are controlling for appropriate characteristics. In the case of Bill James’s analysis, player age is typically a key comparison variable. From the standpoint of applied statistics, controlled comparisons combine the averaging that you get from having a moderate or large sample size with the insight that comes from understanding individual cases.
- Conceptual models used as guides to comparisons: James has written many times that he does not study statistical questions, he studies baseball questions. Each analysis is grounded in some goal. A conceptual model such as the defensive spectrum, or the narrowing of abilities, or the contribution of speed to both offense and defense, drives the direction of the study and motivates many of the details of the analysis. I have tried to follow these principles in my own work.
One central method of statistics that Bill James does not draw upon very often (if at all) is fitting parametric models. For example, James found that the power two in the Pythagorean prediction for wins worked pretty well. He didn’t try to estimate the power from data, nor did he, for example, try to come up with a conclusion such as, “each additional run is worth 0.093 wins.” On the rare occasions that he did estimate a parameter (for example, the relative values of stolen bases and times caught stealing), he buried his methodology and had no interest in making a big deal about the estimation.
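To make the Pythagorean example concrete, here is a minimal Python sketch. The first function is the standard Pythagorean formula, runs scored squared divided by the sum of runs scored squared and runs allowed squared, with the power left as a parameter. The second function shows one simple way a statistician might estimate that power from data: a least-squares fit through the origin of log(wins/losses) on log(runs scored/runs allowed). The function names and the choice of fitting method are assumptions made for illustration; this is not how James or anyone cited here did it.

```python
import math


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Predicted winning percentage from runs scored and runs allowed.

    With exponent=2.0 this is the classic Pythagorean prediction.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def estimate_exponent(team_seasons):
    """Estimate the exponent from (runs_scored, runs_allowed, wins, losses) tuples.

    Illustrative assumption: if wins/losses = (RS/RA) ** k, then
    log(wins/losses) = k * log(RS/RA), so k can be fit by least squares
    through the origin. All four numbers in each tuple must be positive.
    """
    numerator = 0.0
    denominator = 0.0
    for runs_scored, runs_allowed, wins, losses in team_seasons:
        x = math.log(runs_scored / runs_allowed)
        y = math.log(wins / losses)
        numerator += x * y
        denominator += x * x
    if denominator == 0.0:
        raise ValueError("need at least one season with runs_scored != runs_allowed")
    return numerator / denominator


# A team that scores 800 runs and allows 700 (made-up round numbers):
print(round(pythagorean_win_pct(800, 700), 3))  # 0.566
```

Fixing the exponent at two, as James did, amounts to skipping the second function entirely and trusting that the simple formula works well enough.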
Fitting models is something that statisticians are trained to do and in fact do all the time. Why didn’t Bill James follow the example of Pete Palmer and others and try to estimate the relative values of walks, singles, doubles, and other outcomes? . . .
Go to Baseball Prospectus for the rest.