## Beautiful Graphs for Baseball Strike-Count Performance

This post is by Bob. I have no idea what Andrew will make of these graphs; I’ve been hoping to gather enough comments from him to code up a ggplot theme. Shravan, you can move along, there’s nothing here but baseball.

Jim Albert created some great graphs for strike-count performance in a series of two blog posts. Here’s the first post:

* Jim Albert. Graphing pitch count effects (part 1)

Albert plots the pooled estimate of expected runs arising at various strike counts for all plate appearances for the 2011 season.

Using the x axis for count progression and the y axis for outcome yields a really nice visualization of strike count effects. I might have used arrows to really stress the state-space transitions. I might also have tried to label how often each of these transitions happened (including self-transitions at 0-2, 1-2, and 2-2 counts) and the total number of times each state came up for a batter. I don’t think we really need the double coding (color and vertical position), and indeed the coloring’s gone in the second plot.

I like the red line at the average effect for a plate appearance (corresponding to a 0-0 count), but I would’ve preferred the actual expected runs on the y axis rather than something standardized to zero. The average plate appearance is worth more than 0 runs.

In the second post, Albert goes on to plot estimates by batter and pitcher without any pooling:

* Jim Albert. Graphing pitch count effects (part 2)

The following are plots for the 2015 Cy Young Award-winning pitchers:

He also included a couple of great batters from the same year:

With multiple graphs, it’d be nice to have the same y axis range and the same ratio of x axis to y axis size (I can never remember how to do this in ggplot). And it’d be nice to set it up so that the little bubbles don’t get truncated.

It dawned on me that the transitions are not Markovian. If a pitcher intentionally walks a batter from a 0-0 count, then the transitions from 0-0 to 1-0 to 2-0 to 3-0 are correlated. So I’d bet there are more such straight-through transitions for many batters than would be expected from Markovian transitions. We could test how well the Markovian approximation works through simulation, as Albert does for many other topics in his book Curve Ball.
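Here is a minimal sketch of what such a simulation check might look like, written in Python. It is not Albert's method, and the per-pitch probabilities are made up for illustration; in a real check they would be estimated from pitch-by-pitch data at each count. The names `DEFAULT`, `PROBS`, `simulate_pa`, `markov_four_ball_prob`, and `four_ball_walk_rate` are all hypothetical. The idea is to simulate plate appearances under a first-order Markov model and see how often a four-pitch walk (0-0 to 1-0 to 2-0 to 3-0 to ball four) turns up.

```python
import random

# ASSUMPTION: illustrative, made-up per-pitch probabilities, not estimates.
# Order: (ball, called or swinging strike, foul, ball in play).
DEFAULT = (0.36, 0.28, 0.18, 0.18)
# Hypothetical count-specific override, keyed by (balls, strikes).
PROBS = {(3, 0): (0.34, 0.50, 0.10, 0.06)}


def simulate_pa(rng):
    """Simulate one plate appearance under the Markov model.

    Returns the list of counts visited and the outcome label.
    """
    balls, strikes = 0, 0
    path = [(balls, strikes)]
    while True:
        p_ball, p_strike, p_foul, _ = PROBS.get((balls, strikes), DEFAULT)
        u = rng.random()
        if u < p_ball:
            balls += 1
            if balls == 4:
                return path, "walk"
        elif u < p_ball + p_strike:
            strikes += 1
            if strikes == 3:
                return path, "strikeout"
        elif u < p_ball + p_strike + p_foul:
            # A foul with two strikes is a self-transition.
            strikes = min(strikes + 1, 2)
        else:
            return path, "in_play"
        path.append((balls, strikes))


def markov_four_ball_prob():
    """Exact probability of four straight balls under the Markov model."""
    p = 1.0
    for balls in range(4):
        p *= PROBS.get((balls, 0), DEFAULT)[0]
    return p


def four_ball_walk_rate(n, seed=0):
    """Share of n simulated plate appearances that are four-pitch walks."""
    rng = random.Random(seed)
    straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
    hits = 0
    for _ in range(n):
        path, outcome = simulate_pa(rng)
        if outcome == "walk" and path == straight:
            hits += 1
    return hits / n


if __name__ == "__main__":
    print("exact under Markov model:", markov_four_ball_prob())
    print("simulated:", four_ball_walk_rate(200_000))
```

The comparison of interest would be between this simulated share and the observed share of four-pitch walks in real data. If the observed share is much larger, that is evidence the Markovian approximation breaks down for those paths.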

Of course, the natural next step is to build a hierarchical model to partially pool the ball-strike count effects. An even more ambitious goal would be to model a particular batter vs. pitcher matchup.
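To give a rough sense of what partial pooling does, here is a small Python sketch of the shrinkage step in a normal-normal model. It assumes the within-player standard deviation `sigma` and the between-player standard deviation `tau` are known, which a full hierarchical model would instead estimate from the data. The function name `partial_pool` and all of its parameters are hypothetical.

```python
def partial_pool(player_mean, n, league_mean, sigma, tau):
    """Posterior mean of one player's effect at one count (normal-normal model).

    player_mean: the player's raw average run value at this count
    n:           number of plate appearances behind that average
    league_mean: the pooled (league-wide) average at this count
    sigma:       sd of a single plate appearance's run value (ASSUMED known)
    tau:         sd of true player effects around league_mean (ASSUMED known)
    """
    data_precision = n / sigma ** 2
    prior_precision = 1.0 / tau ** 2
    # Weight on the player's own data; it grows toward 1 as n grows.
    w = data_precision / (data_precision + prior_precision)
    return w * player_mean + (1.0 - w) * league_mean
```

With no data (`n = 0`) the estimate is just the league average, and with a lot of data it approaches the player's own unpooled average, which is the behavior we would want between the fully pooled plots in the first post and the unpooled plots in the second.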

I highly recommend clicking through to the original posts if you like baseball; there are many more players illustrated and much more in-depth baseball analysis.

1. Zach Shahn says:

Great graphs. Clicking through to the post, though, I do have to quibble with Albert’s causal interpretation of the graphs, or at least the causal language he uses to describe the findings. These ‘effects’ could easily be confounded. For example, pitchers tend to want to pitch around great hitters, and runs are more likely to be scored with great hitters at the plate. This would lead to an association between balls and run expectation that is not causal. Pitching balls to these great hitters could actually cause lower run expectation. I’m not saying I think that confounding by hitter quality actually explains the associations in the graph, just that these unadjusted associations are not the causal effects of balls and strikes on run expectation at various counts and it would be interesting to adjust for confounding.

There’s also the issue that all subjects do not receive the same version of treatment, violating part of SUTVA. The causal effect of a ball that is a high and inside fastball designed to set up a low and away changeup on the following pitch is surely not the same as the causal effect of a fastball that just happened to miss the outside corner. This complicates the problem of defining what we would even mean by ‘the causal effect of throwing a ball in a 1-1 count’.

• Andrew says:

Zach:

Interesting comment. But what I really want to say, after coaching a few games of Little League, is I just love the idea of pitchers who can intentionally throw low and away. We’re just thrilled when our guys can keep it below the neck and above the dirt, and not too far away from the plate.

• Phil says:

In his classic baseball diary “Ball Four,” 1960s pitcher Jim Bouton wrote that when he was a young pitcher he was disappointed because although he threw hard he didn’t have pinpoint control, but when he got to AAA he discovered that if you could pick low/medium/high and left/middle/right and get the pitch there most of the time, you were said to have “pinpoint control”, and he had it. (I forget how he said it, but that was roughly it).

And somehow this puts me in mind of this joke: When a mathematician says x = y, she means they are exactly equal. When a physicist says x = y, she means they’re within about 30% of each other. When a cosmologist says x = y, she means they’re in the same units.

2. Andrew says:

Bob:

OK, here are my comments on how the graph can be improved:

1. As you suggest, put little arrows at the end of each segment, and make the width of the segment proportional to the number of pitches that correspond to that path.

2. Make the labels (0-0, 1-0, etc) larger. I have to squint to read them. Similarly, increase the sizes of the numbers on the axes.

3. Bizarrely, there are vertical lines at the half-pitch marks (0, 0.5, 1, 1.5, etc). Pitches are discrete so these lines should only be at the integers. Actually I’d get rid of all those extra horizontal and vertical lines as I find them distracting.

4. Remove the color key on the right side of the graph. When I first saw the graph it seemed like so much was going on, with colors and little numbers and lines. If it was just the lines with arrows and big pitch counts, I think that would be much easier to read.

• Andrew says:

P.S. Just to be clear, I loved those graphs. Loved them. I think we can make them even better but they’re already great.

3. Turgid Jacobian says:

It would be interesting to see how the expected runs look in absolute pitch number as well as the count, I think.

4. Jonathan (another one) says:

On behalf of Shravan, I protest.

5. Joey says:

I think you want `coord_fixed`. Something like `p + coord_fixed(ylim = c(-0.1, 0.25), ratio = 0.05)`

6. justme says:

I also really like these graphs. The first thing I thought of when I saw them was that I really want someone to produce multiple versions of each over time, for different seasons, and create an animation to see if/how they have changed over time. Could probably whip something up using gganimate pretty quickly.