## Graphing the Kentucky Derby?

Speaking of racetrack charts, did anybody make a graph of the positions of the horses over time during the recent Kentucky Derby?

I’m thinking it could be done on a very long 2-d strip (imagining the racecourse laid out as a long strip from beginning to end, with width corresponding to the width of the track), with a different color for each horse–maybe using solid lines for the top 6 finishers and light lines for the others. Also, maybe the positions of the horses every 5 seconds (say) could be connected with light gray lines–then you’d be able to see who was ahead at any given point in time.

Does this graph already exist somewhere? Or are there better ideas?

1. I'm having trouble picturing what you want – position on the x axis and time on the y axis? Time on the x axis and rank on the y axis?

2. Bob Lawless says:

I don't know about horse racing, but a very similar concept is used in motorsports. A lap chart shows the relative position of the competitors at the end of each lap, although it does not capture the magnitude of the difference in positions. In other words, it does not distinguish between a competitor who is one second behind versus one who is ten seconds behind.

An example is here: http://www.fia.com/EN-GB/SPORT/CHAMPIONSHIPS/F1/B… The better lap charts that I have seen do exactly what you propose by highlighting the top 6 or 8 finishers.

3. Jon Peltier says:

You want a bumps chart (see here and here). The race goes left to right, and the horses rank from first place (top) to last (bottom).

4. Andrew Gelman says:

Hadley: The graph would have a small height and a very large length. x-position would be position along the track–actually, the "virtual track" including all the laps. y-position would be position across the track (from the inside rail to the outside rail). Time would be indicated by the light gray lines that would connect the horses at their position at any given time.

Bob: Yes, but I'm imagining something more continuous, less discrete-looking.

Jon: No, the chart you showed is not what I'm looking for. I want the lines to show the horses' actual trajectories, not their rankings.

It's funny how difficult it is to describe these things in words. I'd make a demonstration picture, except that this takes time, and there's no real reason for me to be doing this at all.

5. Phil says:

If I may: Imagine taking an overhead still photo of a portion of the track (a long enough portion to show all of the horses). I do this every ten seconds. I put the photos end to end. I think that's what Andrew wants — I speak subject to correction — except instead of photos, he wants a different identifying icon for each horse. You'll see some horses move from the rail to the outside, you'll see some of them move ahead and then drop back, and so on.

Or, hey, how's this for an idea: instead of time (or distance) being along the x-axis, use an animation. And instead of icons, use animated pictures of horses — perhaps even photo-realistic ones. Boy, if you ran that so that the whole thing took about two minutes, I bet it could be really exciting. Oh, wait, somebody's already done it, you can see it here.

6. David says:

Let me repeat back to you what I think you want to illustrate [bear with me]. What you really want is a three-dimensional plot, (t, f_1(t), f_2(t)), where f_1 is progression 'down' the track and f_2 is movement across the track. [So what you want to know is 'instantaneous' velocity and total distance travelled?]
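
As a rough sketch of that (t, f_1(t), f_2(t)) reading in base R: given positions sampled at known times, finite differences give an approximate speed over each interval, and a running sum gives the total distance travelled. The numbers are made up, and the names `traj`, `seg_len`, `speed`, and `total_dist` are only for illustration.

```r
# Made-up samples for one horse, every 5 seconds (not real Derby data).
traj <- data.frame(
  t      = c(0, 5, 10, 15),       # seconds since the start
  along  = c(0, 80, 165, 250),    # f_1(t): metres along the track
  across = c(3, 3, 6, 2)          # f_2(t): metres from the inside rail
)

# Straight-line distance covered between consecutive samples.
seg_len <- sqrt(diff(traj$along)^2 + diff(traj$across)^2)

# Approximate speed over each interval (metres per second).
speed <- seg_len / diff(traj$t)

# Total distance travelled up to each sample, starting from zero.
total_dist <- c(0, cumsum(seg_len))
```

With samples this coarse the speed is only an interval average; a finer sampling interval would be needed for anything close to "instantaneous".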

I gotta believe somebody has done this. Analyzing this with a sim is the first thing a computer-enabled bettor should do. If not, The Daily Racing Form is probably the first place to ask about finding the data to make the plots. You may be able to get NBC to give you the video for analysis [you're in NYC, maybe you can just go downtown]. The advantage there is that they have precision timing available.

7. Ah, ok. In ggplot2, it would be something like:

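<pre>library(ggplot2)

# One coloured line per horse, plus a thin grey line joining all horses at each time point
ggplot(race, aes(x = along, y = across, colour = horse)) +
  geom_line() +
  geom_line(aes(group = time), colour = "grey50", size = 0.5) +
  geom_point()</pre>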

i.e. you have a race dataset with variables along (total distance travelled), across (distance from inside track), time (time measurement taken), and horse (horse identifier).

You map x position on the plot to along, y position to across, and colour to horse (or only to the best few horses). You then draw a line for each horse, and a thin grey line for each time point, and top it off with points for each measurement. Does that sound about right?
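
For concreteness, a `race` data frame of that shape might be mocked up as below. This is only a sketch with invented numbers (nobody in this thread has real tracking data); the column names come from the description above, while `pace` and `lane` are hypothetical helpers.

```r
# Invented example: 3 horses measured at 3 time points, one row per
# horse per time point ("long" format). Not real race data.
race <- expand.grid(
  horse = c("A", "B", "C"),
  time  = c(0, 5, 10),            # seconds
  stringsAsFactors = FALSE
)

# Hypothetical constant speeds (m/s) and lane positions (m from the rail).
pace <- c(A = 16.0, B = 16.5, C = 15.5)
lane <- c(A = 1, B = 3, C = 5)

# Look up each row's horse by name to fill in the two position columns.
race$along  <- unname(pace[race$horse]) * race$time
race$across <- unname(lane[race$horse])
```

The long format matters here: one row per horse per time point is what lets the same data be grouped by horse for the trajectories and by time for the grey connecting lines.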

If you wanted the bump chart (aka time series plot of ranks), you could do:

<pre>library(plyr)

# Place 1 goes to the horse furthest along at each time point
race <- ddply(race, "time", transform, place = rank(-along))
ggplot(race, aes(x = time, y = place, colour = horse)) +
  geom_line()</pre>

i.e. calculate the place at each time point based on the horse's position along the track, and then draw a time series plot with rank on the y-axis.
Here's hoping someone can provide this data!
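
In the meantime, the ranking step can be tried on made-up numbers, and without plyr. The sketch below uses base R's `ave()` and also computes each horse's distance behind the leader, which a pure rank chart hides (the point Bob raised about lap charts). The `gap` column is an addition for illustration, not something proposed above.

```r
# Made-up positions for 3 horses at 2 time points (not real data).
race <- data.frame(
  horse = rep(c("A", "B", "C"), times = 2),
  time  = rep(c(5, 10), each = 3),
  along = c(80, 82, 78, 160, 165, 163)
)

# Place within each time point: the horse furthest along gets place 1.
race$place <- ave(race$along, race$time, FUN = function(x) rank(-x))

# Distance behind the leader at each time point (0 for the leader).
race$gap <- ave(race$along, race$time, FUN = function(x) max(x) - x)
```

Plotting `gap` instead of `place` against `time` would keep the ordering of a bump chart while also showing how far apart the horses are.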

A graphical grammar makes it much easier to express statistical graphics than English does.

8. Andrew Gelman says:

OK, OK, I'll learn ggplot2 already.

9. Kieran says:

No, no, the rest of us need all the edge we can get.

I can't be the only one who has to think twice before giving in and enthusiastically recommending ggplot to people.

10. Andrew Gelman says:

Well, I can only assume that there's a ggplot3 out there that you and Hadley aren't telling me about.

11. Andrew:

While I know you didn't want a bumps chart, I made one to see how it would look: http://chartsgraphs.wordpress.com/2009/05/05/2009-kentucky-derby-bumps-chart/

A real racing fan commented about TrakUs, a system that does what I think you want. Unfortunately, it's not available for the Kentucky Derby. Here's a link to a short video that shows TrakUs for a horse race.