This is a mini research note, not deserving of a paper, but perhaps useful to others. It reinvents what has already appeared on this blog.

Let’s say we have a line chart with numbers between 152.134 and 210.823, with a mean of 183.463. How should we label the chart with about 3 ticks? Perhaps 152.134, 181.4785 and 210.823? Don’t do it!

The objective is to fit about 3-7 ticks at the optimal level of rounding. I use the following sequence:

- **decimal rounding**: fitting integer *power* and single-digit decimal *i*, rounding to *i* * 10^*power* (example: 100 200 300)
- **binary**: having *power*, fitting single-digit decimal *i* and binary *b*, rounding to 2 * *i* / (1 + *b*) * 10^*power* (150 200 250)
- (optional) **quaternary**: having *power*, fitting single-digit decimal *i* and quaternary *q* (0,1,2,3), rounding to 4 * *i* / (1 + *q*) * 10^*power* (150 175 200)
- **quinary**: having *power*, fitting single-digit decimal *i* and quinary *f* (0,1,2,3,4), rounding to 5 * *i* / (1 + *f*) * 10^*power* (160 180 200)
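Here is a minimal Python sketch of how such a sequence might be searched. It is one reading of the examples above, not a transcription of the formulas: it assumes that decimal, binary, quaternary and quinary rounding correspond to tick spacings of 10^*power* divided by 1, 2, 4 and 5 (which reproduces 100 200 300, 150 200 250, 150 175 200 and 160 180 200). The function name `nice_ticks` and the `min_ticks` parameter are made up for illustration.

```python
import math


def nice_ticks(lo, hi, min_ticks=3):
    """Return tick positions in [lo, hi] at the coarsest rounding level
    that still yields at least `min_ticks` ticks.

    Assumption: the rounding levels are steps of 10**power divided by
    1 (decimal), 2 (binary), 4 (quaternary) and 5 (quinary).
    """
    if not (math.isfinite(lo) and math.isfinite(hi)) or lo >= hi:
        raise ValueError("need finite lo < hi")

    # Start with a step larger than the whole range, so the first
    # candidate has at most one tick, then refine step by step.
    power = math.floor(math.log10(hi - lo)) + 1
    while True:
        for divisor in (1, 2, 4, 5):
            step = 10.0 ** power / divisor
            first = math.ceil(lo / step)   # index of first multiple >= lo
            last = math.floor(hi / step)   # index of last multiple <= hi
            if last - first + 1 >= min_ticks:
                return [k * step for k in range(first, last + 1)]
        power -= 1  # nothing fit at this power of ten; go one finer


print(nice_ticks(152.134, 210.823))
```

For the range from the opening example, 152.134 to 210.823, the decimal, binary and quaternary steps give at most two ticks, so the sketch falls through to the quinary step and returns 160, 180, 200. Because each candidate step is at most half as small as the previous one, the first level that reaches 3 ticks should stay within the 3-7 target. Floating-point division can misplace a tick that sits exactly on a range endpoint; a real implementation would add a small tolerance.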

Particularly interesting numbers that would act as a reference can be included. Rounding can be adapted to ensure sufficient spacing between labels. This rounding reduces the cognitive cost of interpretation and memorization of a chart, along with the linguistic cost of communication of findings.

Another application of rounding is communicating measurement tolerance or prediction error. For example, if I tell you that the width is 37.3434 mm, I’m indicating that the measurement was very precise. But if I’m not so accurate, telling you that my measurement was 50mm indicates binary rounding, with the truth being somewhere between 25 and 75mm. Telling you it was 75mm indicates quaternary rounding, with the truth being somewhere between 60 and 90mm. If I told you it was 80mm, you’d know the truth is somewhere between 70 and 90mm. If I told you it was 85mm, well, then the ‘5’ is subject to binary, quaternary or quinary rounding at the last digit.
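As a rough sketch of the arithmetic behind these intervals, assume the reported value sits in the middle of a bin one rounding step wide, so the truth lies within half a step on either side. The helper name `implied_interval` is hypothetical, and the step has to be supplied by the reader's guess about which rounding was used.

```python
def implied_interval(value, step):
    """Range of true values consistent with `value` having been rounded
    to the nearest multiple of `step` (assumption: half a step each way)."""
    half = step / 2
    return (value - half, value + half)


print(implied_interval(50, 50))  # binary rounding of 100: step 50
print(implied_interval(75, 25))  # quaternary rounding: step 25
print(implied_interval(80, 20))  # quinary rounding: step 20
```

Under this assumption 50 maps to 25-75 and 80 maps to 70-90, as in the text. For 75 the sketch gives 62.5-87.5, which the text states more loosely as 60 to 90.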

If the plot is nonlinear, one can use exponential rounding to 10^*i* (10 100 1000).
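A small Python sketch of this exponential rounding, assuming strictly positive data and ticks only at whole powers of ten (the function name `power_of_ten_ticks` is made up for illustration):

```python
import math


def power_of_ten_ticks(lo, hi):
    """Return the powers of ten 10**i that fall inside [lo, hi]."""
    if lo <= 0 or hi < lo:
        raise ValueError("need 0 < lo <= hi")
    first = math.ceil(math.log10(lo))   # smallest exponent with 10**i >= lo
    last = math.floor(math.log10(hi))   # largest exponent with 10**i <= hi
    return [10.0 ** i for i in range(first, last + 1)]


print(power_of_ten_ticks(5, 2000))
```

For data between 5 and 2000 this gives 10, 100, 1000. If the range spans less than a decade the list can come back empty, in which case one would fall back to the linear sequence above.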

[Edit 10/3/2011] Added a link kindly provided by Brian Diggs.

I think including 4 is not such a good idea; it requires two “extra” digits to represent, where 2 and 5 only require one.

Some more discussion about this, including links to papers about it and comments about how R approaches the problem, is in a 2010 post from this blog: http://andrewgelman.com/2010/03/pretty_aint_alw/

log_breaks in Hadley Wickham’s scales package (https://github.com/hadley/scales/blob/master/R/breaks.r#L28) implements something similar for the logarithmic scale.

I don’t see how “telling you that my measurement was 50mm indicates binary rounding”. If you report 50mm, I will assume an accuracy of 1mm. Of course, if you show me a list of measurements that are all multiples of 10, 25, or 50, I will assume a correspondingly larger tolerance, but shouldn’t the results still have been rounded more? (Obviously, this comment does not apply to your main topic: axis labels.)

Wikipedia-trivium: To report 50km and indicate 10km tolerance, you could call it 5 myriameters (myria = 10^4).

Sylvia, right, one doesn’t know if rounding is being used or not, and we second-guess. Scientific notation sort of allows for it: 5 * 10^1 (5e1) vs 5.0 * 10^1 (5.0e1). Need to find a nicer way of presenting this though.