Update on marathon statistics

Frank Hansen updates his story and writes:

Here is a link to the new stuff. The update is a little less than halfway down the page.

1. Used display() instead of summary().

2. Included a proxy for [non] newbies: whether I can find their name in a previous Chicago Marathon.

3. Graphed actual pace vs. fitted pace (color-coded by the newbie proxy).

4. Estimated the model separately for newbies and non-newbies.

There is also some incidental discussion of the sd of the errors.
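
As a rough illustration of items 2 and 4, here is a minimal sketch in plain Python (not the R code behind the linked page). It flags a runner as a non-newbie when their name shows up in an earlier results list, then fits a straight line of marathon pace on short-race pace within each group. The field names, the name-matching rule, the simple linear form, and the numbers are all assumptions made up for the example.

```python
from collections import defaultdict

def normalize(name):
    """Lower-case and collapse whitespace so 'Cy  Park' matches 'cy park'."""
    return " ".join(name.lower().split())

def flag_non_newbies(runners, previous_finishers):
    """Return {name: bool}; True when the name appears in an earlier race."""
    seen = {normalize(n) for n in previous_finishers}
    return {r["name"]: normalize(r["name"]) in seen for r in runners}

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def fit_by_group(runners, previous_finishers):
    """Fit marathon pace on short-race pace separately for each proxy group."""
    flags = flag_non_newbies(runners, previous_finishers)
    groups = defaultdict(list)
    for r in runners:
        groups[flags[r["name"]]].append(r)
    return {
        is_repeat: fit_line([r["short_pace"] for r in rows],
                            [r["marathon_pace"] for r in rows])
        for is_repeat, rows in groups.items()
    }

# Invented paces in minutes per mile, purely for illustration.
runners = [
    {"name": "Ann Lee", "short_pace": 7.0, "marathon_pace": 8.0},
    {"name": "Bo Diaz", "short_pace": 8.0, "marathon_pace": 9.0},
    {"name": "Cy Park", "short_pace": 9.0, "marathon_pace": 10.0},
    {"name": "Di Ross", "short_pace": 7.0, "marathon_pace": 8.5},
    {"name": "Ed Wu", "short_pace": 8.0, "marathon_pace": 10.0},
    {"name": "Flo Kim", "short_pace": 9.0, "marathon_pace": 11.5},
]
previous_finishers = ["ann lee", "BO DIAZ", "Cy  Park"]

fits = fit_by_group(runners, previous_finishers)
```

Here `fits[True]` holds the (intercept, slope) for runners found in an earlier race and `fits[False]` the pair for everyone else. Matching on names alone will miss people who changed names and will confuse people who share one, which is why it is only a proxy.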

There are a few things unfinished, but I have to get to bed: I’m running the 2010 Chicago Half tomorrow morning, and they moved the start up from 7:30 to 7:00 because it’s also the day of the Bears home opener.


  1. jme says:


    The fitted vs. actual plot suggests that it might be worth considering an error variance that's a function of pace.
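
    One quick way to check this, sketched here in Python with invented numbers, is to bin the cases by fitted pace and compare the sd of the residuals across bins. The bin edges and the data are assumptions for illustration, not values from Frank's analysis.

```python
import statistics

def residual_sd_by_bin(fitted, actual, edges):
    """Group residuals (actual - fitted) by fitted pace and return one sd per bin.

    edges are ascending cut points; bin i holds edges[i] <= fitted < edges[i + 1].
    A bin with fewer than two cases gets None, since its sd is undefined.
    """
    bins = [[] for _ in range(len(edges) - 1)]
    for f, a in zip(fitted, actual):
        for i in range(len(edges) - 1):
            if edges[i] <= f < edges[i + 1]:
                bins[i].append(a - f)
                break
    return [statistics.stdev(b) if len(b) > 1 else None for b in bins]

# Invented paces in minutes per mile: residuals are larger for slower runners.
fitted = [7.2, 7.5, 7.8, 9.1, 9.4, 9.8, 11.2, 11.5, 11.9]
actual = [7.3, 7.4, 7.9, 9.4, 9.1, 10.1, 12.0, 10.7, 12.7]
edges = [7.0, 9.0, 11.0, 13.0]

sds = residual_sd_by_bin(fitted, actual, edges)
```

    If the sd climbs steadily from the fastest bin to the slowest, as it does in this toy data, a constant error variance is probably the wrong assumption.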

    I realize Frank mentions that he's "duplicating" runners and not using any pooling by treating each short race/marathon pair as a separate case, but I wonder how much of the relatively large sd of the residuals is a result of this decision. There are all sorts of other "simple" things you could do, like just picking the short race with the best pace, or the average, or using more sophisticated pooling strategies.
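
    The two "simple" options could look something like the following Python sketch, which collapses the duplicated pairs to one row per runner using either the best (lowest) or the average short-race pace. The row layout, the field names, and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def collapse_short_races(pairs, how="best"):
    """Reduce short-race/marathon pairs to one row per runner.

    how="best" keeps the fastest (lowest) short-race pace;
    how="mean" averages the runner's short-race paces.
    Assumes each runner has a single marathon pace repeated on every row.
    """
    if how not in ("best", "mean"):
        raise ValueError("how must be 'best' or 'mean'")
    by_runner = defaultdict(list)
    for p in pairs:
        by_runner[p["name"]].append(p)
    summarize = min if how == "best" else mean
    return [
        {
            "name": name,
            "short_pace": summarize([r["short_pace"] for r in rows]),
            "marathon_pace": rows[0]["marathon_pace"],
        }
        for name, rows in by_runner.items()
    ]

# Invented paces in minutes per mile; Ann Lee ran two short races.
pairs = [
    {"name": "Ann Lee", "short_pace": 7.4, "marathon_pace": 8.2},
    {"name": "Ann Lee", "short_pace": 7.0, "marathon_pace": 8.2},
    {"name": "Bo Diaz", "short_pace": 8.5, "marathon_pace": 9.6},
]

best_rows = collapse_short_races(pairs, how="best")
mean_rows = collapse_short_races(pairs, how="mean")
```

    Either version gives each runner the same weight in the fit no matter how many short races they entered, so refitting on the collapsed rows would show how much of the residual sd comes from the duplication.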

  2. kbob says:

    I wonder if it would be worth supplementing with data from a small-city marathon (& associated short races) that isn't a destination for people seeking to qualify for bigger races. An added benefit is that it might be possible to make better matches and get better "career" data, albeit at the cost of smaller n's.

    I was going to suggest the Equinox marathon in Fairbanks, but Alaskans are notoriously unrepresentative of, well, anyone except other Alaskans. Nevertheless, if you're interested, some records are here:… If you scroll to the bottom, you'll see lists of multi-year results for the best runners and the best runners never to have won the race.