## World record running times vs. distance

Julyan Arbel plots world record running times vs. distance (on the log-log scale):

The line has a slope of 1.1. I think it would be clearer to plot speed vs. distance—then you’d get a slope of -0.1, and the numbers would be more directly interpretable.
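To see why the two slopes differ by exactly one, here is a minimal sketch in R. It assumes a subset of the men's track records from the Run2011.csv data posted in the comments below, so the fitted slope need not match the 1.1 read off Arbel's plot. The variable names are only illustrative.

```r
# Men's track world records (distance in m, time in s), copied from the
# Run2011.csv data in the comments. Using this subset is an assumption.
distance <- c(100, 200, 400, 800, 1500, 5000, 10000)
time     <- c(9.58, 19.19, 43.18, 101.01, 206.00, 757.35, 1577.53)
speed    <- distance / time  # average speed in m/s

# Log-log slopes for time vs. distance and for speed vs. distance
slope_time  <- unname(coef(lm(log10(time)  ~ log10(distance)))[2])
slope_speed <- unname(coef(lm(log10(speed) ~ log10(distance)))[2])

# Because log(speed) = log(distance) - log(time), the slopes differ by one
print(c(time = slope_time, speed = slope_speed))
print(all.equal(slope_speed, 1 - slope_time))
```

The identity holds for any data set, since least squares is linear in the response: subtracting log(time) from log(distance) turns a slope of b into a slope of 1 - b.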

Indeed, this paper by Sandra Savaglio and Vincenzo Carbone (referred to in the comments on Julyan’s blog) plots speed vs. time. Graphing by speed gives more resolution:

The upper-left graph in the grid corresponds to the human running records plotted by Arbel. It’s funny that Arbel sees only one line whereas Savaglio and Carbone see two—but if you remove the 100m record at one end and the 100km at the other end, you can see two lines in Arbel’s graph as well. The bottom two graphs show swimming records. Knut would probably have something to say about all this.

1. Brent Buckner says:

I think most folks who participate in or observe endurance events are used to thinking in terms of times and distances, so would like the existing plot more than they would like speed versus distance.
(e.g. I know my best half-marathon and marathon times but couldn’t tell you the corresponding paces off the top of my head; similarly for world records).

2. Jon Peltier says:

This is exactly analogous to metal fatigue, where lifetime is related to cyclic stress (force) or strain (displacement) through a power law relationship. Other failure mechanisms follow similar, if less-well-defined relationships.

3. Jim says:

I’m missing what’s new here? Isn’t this well studied in exercise physiology? I am out of my area of expertise, but I won’t let that stop me.

Instead of running, why not use cycling, where the energy expenditure (and therefore the physiological stress) is routinely and precisely measured with power meters? Then the relation could get even better resolution. Do a regression of power vs. time and work backwards to derive the aerobic beta of Savaglio and Carbone from Monod and Scherrer’s critical power model: http://www.tandfonline.com/doi/abs/10.1080/00140136508930810. See it in personal records rather than world records.

In bike racing we calculate our training zones from “functional threshold power” which is roughly the same as critical power. You can even find online calculators: http://www.cyclingpowermodels.com/MonodCriticalPower.aspx
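The regression described above can be sketched roughly as follows. In the two-parameter critical power model, the total work done in an all-out effort of a given duration is modeled as work = W' + CP * duration, so regressing work on duration gives critical power as the slope. The power values below are made up for illustration and are not measurements; the names `cp` and `w_prime` are hypothetical.

```r
# Hypothetical personal-best average powers (watts) for all-out efforts of
# different durations (seconds). Illustrative numbers, not real data.
duration <- c(180, 300, 600, 1200)
power    <- c(361, 317, 283, 267)

# Critical power model: total work (joules) is linear in duration,
#   work = w_prime + cp * duration
work <- power * duration
fit  <- lm(work ~ duration)

w_prime <- unname(coef(fit)[1])  # intercept: finite work capacity above CP (J)
cp      <- unname(coef(fit)[2])  # slope: critical power (W)
print(c(cp = cp, w_prime = w_prime))
```

Fitting work against duration keeps the model linear; fitting power against 1/duration is an equivalent parameterization with a different implied error structure.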

• Andrew says:

Jim:

I never claimed this was new. I just like the graphs. It was new to me, and I thought it might interest others, hence the blog post.

• Jim says:

I assumed (incorrectly?) it was presented by Nature as new research.

4. Jennie Dusheck says:

I like the speed v. time better as well. It appears to show that people can sprint for about 100 seconds.

It would be interesting to see similar graphs for other animals. Is this ~100s limit a universal physiological limit (a result of the ways our cells generate anaerobic energy?) or is the limit different in different species?

• Jim says:

Yes. After ~2 mins all-out, you’ve completely used up the energy of both your phosphagen (5-15 sec) and your lactic acid/glycolytic systems, and energy can only be delivered to muscles through aerobic metabolism.

5. Jeremy Fox says:

It seems plausible to me that the 100 km WR might be a little “soft”, which is why it falls a bit above the line extrapolated from other long distance WRs on the Arbel plot. The pool of people who attempt 100 km is much smaller than the pool of people who attempt marathon and sub-marathon distances. The world’s best ultramarathoners may actually be people who’ve never attempted an ultramarathon.

It seems less plausible to me that the 100 m WR would also be a little “soft” compared to what you’d predict by extrapolating the WRs from other shorter distances. Looking at the Arbel plot, I wonder if the 200 m WR isn’t actually a slight outlier on the low side, rather than the 100 m WR being a slight outlier on the high side.

• Paul says:

The ‘soft’ 100m WR in comparison to the 200m WR is due to the time required to accelerate to full speed from a stationary start.

6. Epanechnikov says:

See however

http://arxiv.org/pdf/0706.1062v2

(Lots of distributions give you straight-ish lines on a log-log plot)

• Andrew says:

Epanechnikov:

The linked paper is great, but I think it’s a different topic. Clauset et al. are talking about the probability distribution of a single variable, whereas the graphs above are plots of y vs. x for two variables. The plots look the same but, as far as I can tell, the problems are completely different. (I’m open to clarification on this, though.)

7. Power-law regressions are indeed a separate problem from power-law distributions, calling for quite different methods. I have seen claims that when the noise in the response is additive, using nonlinear least squares is much more efficient than log-transforming both variables, but haven’t looked into that deeply. I have looked at some examples of power-law regressions where there is really very little evidence in favor of the power law as opposed to, say, a logistic response curve.
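As a rough illustration of the two approaches mentioned above, the sketch below fits the same power law, time = a * distance^b, once by ordinary least squares after log-transforming both variables and once by nonlinear least squares on the raw scale. It assumes a subset of the men's track records from the Run2011.csv data posted further down, and it says nothing about which noise model is actually appropriate for world records.

```r
# Men's track world records (distance in m, time in s) from Run2011.csv.
# Using this subset is an assumption made for illustration.
rec <- data.frame(
  distance = c(100, 200, 400, 800, 1500, 5000, 10000),
  time     = c(9.58, 19.19, 43.18, 101.01, 206.00, 757.35, 1577.53)
)

# (a) Log-transform both variables: corresponds to multiplicative noise
fit_log <- lm(log(time) ~ log(distance), data = rec)

# (b) Nonlinear least squares on the raw scale: corresponds to additive
# noise. Starting values are taken from the log-log fit.
start <- list(a = exp(unname(coef(fit_log)[1])),
              b = unname(coef(fit_log)[2]))
fit_nls <- nls(time ~ a * distance^b, data = rec, start = start)

# Compare the two estimates of the exponent b
b_log <- unname(coef(fit_log)[2])
b_nls <- unname(coef(fit_nls)["b"])
print(c(loglog = b_log, nls = b_nls))
```

The raw-scale fit weights the longest races most heavily because their residuals are largest in seconds, so the two exponents need not agree.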

8. (hit post too soon)
More broadly, though, Andrew’s right that the specific techniques in our paper don’t apply here.

9. Nice plots. Because I wanted to play around with the data a bit more and potentially also use it for a classroom example, I extended the original Arbel data somewhat. In addition to distance and time, I have included gender (and collected female records as well), whether or not the distance is included in the Olympic events, the type of race (track vs. road), the athlete’s name, and the date of the record. All data were taken from the current “List of world records in athletics” Wikipedia page (see http://en.wikipedia.org/w/index.php?title=List_of_world_records_in_athletics&oldid=459175337). The data are provided below in CSV format (I hope the line breaks are preserved in this post…). I also included a few very simple ideas for a first analysis in R. Hopefully, other readers will enjoy playing around with this as well.

— Run2011.csv —

distance,olympic,time,gender,type,name,date
100,yes,9.58,male,track,Usain Bolt,2009-08-16
200,yes,19.19,male,track,Usain Bolt,2009-08-20
400,yes,43.18,male,track,Michael Johnson,1999-08-26
800,yes,101.01,male,track,David Rudisha,2010-08-29
1000,no,131.96,male,track,Noah Ngeny,1999-09-05
1500,yes,206.00,male,track,Hicham El Guerrouj,1998-07-14
1609.344,no,223.13,male,track,Hicham El Guerrouj,1999-07-07
2000,no,284.79,male,track,Hicham El Guerrouj,1999-09-07
3000,no,440.67,male,track,Daniel Komen,1996-09-01
5000,yes,757.35,male,track,Kenenisa Bekele,2004-05-31
10000,yes,1577.53,male,track,Kenenisa Bekele,2005-08-26
20000,no,3386,male,track,Haile Gebrselassie,2007-06-27
25000,no,4345.4,male,track,Moses Mosop,2011-06-03
30000,no,5207.4,male,track,Moses Mosop,2011-06-03
100,yes,10.49,female,track,Florence Griffith-Joyner,1988-07-16
200,yes,21.34,female,track,Florence Griffith-Joyner,1988-09-29
400,yes,47.60,female,track,Marita Koch,1985-10-06
800,yes,113.28,female,track,Jarmila Kratochvilova,1983-07-26
1000,no,148.98,female,track,Svetlana Masterkova,1996-08-23
1500,yes,230.46,female,track,Qu Yunxia,1993-09-11
1609.344,no,252.56,female,track,Svetlana Masterkova,1996-08-14
2000,no,325.36,female,track,Sonia O’Sullivan,1994-07-08
3000,no,486.11,female,track,Wang Junxia,1993-09-13
5000,yes,851.15,female,track,Tirunesh Dibaba,2008-06-06
10000,yes,1771.78,female,track,Wang Junxia,1993-09-08
20000,no,3926.60,female,track,Tegla Loroupe,2000-09-03
25000,no,5225.84,female,track,Tegla Loroupe,2002-09-21
30000,no,6350,female,track,Tegla Loroupe,2003-06-07

— Run2011.R —

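## read the data (assumes Run2011.csv is in the working directory)
Run2011 <- read.csv("Run2011.csv")

## add record age (years) and average speed (km/h)
Run2011 <- transform(Run2011,
  date = as.Date(date),
  age = as.numeric(Sys.Date() - as.Date(date)) / 365.25,
  speed = 3.6 * distance/time)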

## plot full data
library("lattice")
xyplot(log10(speed) ~ log10(distance), groups = ~ gender, data = Run2011)

## similar to Savaglio and Carbone
panel_smooth <- function(x, y) {
  panel.xyplot(x, y)
  panel.loess(x, y, span = 1)
}
xyplot(log10(speed) ~ log10(time) | gender, data = Run2011,
  subset = distance %in% c(200, 400, 800, 1000, 1500, 1609.344, 3000, 5000, 10000, 42195),
  panel = panel_smooth)

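## first regression analysis (ignoring the changes in distance coefficient)
## (model formula assumed: log speed on log distance plus a gender effect)
m <- lm(log10(speed) ~ log10(distance) + gender, data = Run2011,
  subset = distance > 400 & distance < 100000)
summary(m)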

10. Xi'an says:

Actually, Julyan has nicer graphs on a more recent post of his. It evaluates the density of the joint variable (difference from average time per km on the nth first km, rank in the category) using SAS KDE proc…

• Julyan says:

Xian, these ones are Jérôme’s :) I don’t know SAS actually, what is it?

11. Julyan says:

Thank you, Andrew, for your interest! Thanks also for pointing out the different slopes. I’ve worked out why they were less visible on my plot: the exponents are more sensitive in the speed vs. time plane than in the time vs. distance plane. Details here http://statisfaction.wordpress.com/2011/11/16/power-laws-choose-your-x-and-y-variables-carefully/
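The sensitivity argument can be made concrete with a short derivation. This is a sketch under the assumption that time follows an exact power law in distance within each regime; the two exponents used at the end are illustrative values, not estimates taken from either plot.

```latex
% Assume t = a d^{b} within a regime (t: time, d: distance, v: average speed).
\begin{align*}
  t &= a\,d^{b}
    \quad\Longrightarrow\quad
    d = \left(\frac{t}{a}\right)^{1/b}, \\
  v &= \frac{d}{t}
     = a^{-1/b}\, t^{(1-b)/b}.
\end{align*}
% Illustrative exponents b_1 = 1.05 and b_2 = 1.12:
\begin{align*}
  \frac{b_2}{b_1} &= \frac{1.12}{1.05} \approx 1.07, \\
  \frac{(1-b_2)/b_2}{(1-b_1)/b_1} &= \frac{-0.107}{-0.048} \approx 2.25.
\end{align*}
```

Two regimes whose slopes differ by about 7% in the time vs. distance plane differ by more than a factor of two in the speed vs. time plane, which is why the break is far easier to see there.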

12. Tom says:

What about looking at the relationship between time, distance, speed, gender and age? For instance, the NYC Marathon posts the best times by age and gender. There must be similar sites for other events, maybe the Wikipedia site posted above has it. Where is the kink in the performance curve by age? How different are the best times that humans can produce by age by event? How different are men and women by age?

13. Daniele says:

This may be of some interest to test cycling and power:
http://connect.garmin.com/activity/116268248

14. Sandra Savaglio says:

Here is some more analysis we have done, considering longer distances and gender differences:

http://www.mpe.mpg.de/~savaglio/Sports%20Science_files/scaling_law.pdf

Sandra