I thought I had read on your blog that bar charts should always include zero on the scale, but a search of your blog (and Google) didn't turn up what I was looking for. Is it considered best practice to always include zero on the axis of a bar chart? Has this been written up in a book?

The idea is that the area of the bar represents “how many” or “how much.” The bar has to go down to 0 for that to work. You don’t have to have your y-axis go to zero, but if you want the axis to go anywhere else, don’t use a bar graph; use a line graph. Usually line graphs are better anyway.
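To make the "bar represents how much" point concrete, here is a minimal Python sketch comparing the ratio of two values with the ratio of the bar lengths a reader actually sees when the axis does not start at zero. The function names and the numbers (50, 55, and an axis starting at 45) are hypothetical, chosen only for illustration.

```python
def bar_lengths(values, baseline=0.0):
    """Drawn length of each bar when every bar starts at `baseline`."""
    return [v - baseline for v in values]


def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths (b over a), which is what the eye compares."""
    len_a, len_b = bar_lengths([a, b], baseline)
    return len_b / len_a


# Hypothetical values: 55 is only 10% larger than 50.
true_ratio = apparent_ratio(50, 55)                    # bars start at 0
truncated_ratio = apparent_ratio(50, 55, baseline=45)  # axis starts at 45
print(f"bars from 0:  {true_ratio:.2f}")
print(f"bars from 45: {truncated_ratio:.2f}")
```

With the axis starting at 45, the second bar is drawn twice as long as the first even though the value is only 10% larger. A line or dot chart encodes values by position rather than length, which is why a nonzero axis start is less misleading there.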

I’m sure this is all in a book somewhere.

I agree with Andrew's main point.

But it can be reasonable to have bars starting at some base other than 0. The point is simply that bars should start at a natural reference level, which is not always 0.

I sat through a talk in which sex ratios for various states of India were shown starting at 0. This rather weakened the speaker's main point that sex ratios are typically quite different from unity. Using 1 (or 100%, etc.) as the base would have improved that graph. (A dot chart would have worked as well, or better.)

Similarly, bar graphs are sometimes useful for time series in which it is important to distinguish periods above or below average, or periods above and below freezing in the US, where many people still use 32°F as the freezing point. In that last case 0°F as a base would be silly, although I could cite published examples of it.
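Anchoring bars at a natural reference level just means measuring each bar as a signed deviation from that level. Below is a minimal Python sketch, assuming a hypothetical helper `bars_from_reference` and made-up temperatures (not real data), that turns values into (bottom, height) pairs anchored at 32°F.

```python
from typing import List, Tuple


def bars_from_reference(values: List[float], reference: float) -> List[Tuple[float, float]]:
    """Return (bottom, height) pairs for bars anchored at a reference level.

    Heights are signed: positive bars rise above the reference,
    negative bars hang below it.
    """
    return [(reference, v - reference) for v in values]


# Hypothetical daily highs in degrees F (made-up numbers, not data).
temps_f = [28.0, 35.0, 32.0, 40.0]
bars = bars_from_reference(temps_f, reference=32.0)  # 32 F = freezing

for t, (bottom, height) in zip(temps_f, bars):
    side = "above" if height > 0 else "below" if height < 0 else "at"
    print(f"{t:5.1f} F: bar from {bottom} with height {height:+.1f} ({side} freezing)")
```

If you plot with matplotlib, these pairs map onto the `bottom` and `height` arguments of `matplotlib.pyplot.bar`. Calling the same helper with `reference=1.0` would anchor the sex-ratio bars from the talk at parity instead of at 0.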

In terms of books, Darrell Huff's "How to Lie with Statistics" is dogmatic about starting at zero. William Cleveland's and Leland Wilkinson's books give more nuanced advice.

"Stats: Data and Models" has a general comment related to this, if I am not mistaken.

Stephen Few's Now You See It, page 60

Naomi Robbins's Creating More Effective Graphs

It is in many books. See pages 239–240 of Creating More Effective Graphs by Naomi B. Robbins (Wiley, 2005), or many other books.

"You don't have to have your y-axis go to zero, but if you want the axis to go anywhere else, don't use a bar graph, use a line graph."

…as long as your x-axis is a continuous variable? Or is there no caveat on that?

It's in Tufte ("The Visual Display…"). A more nuanced view is in Cleveland (probably "The Elements of Graphing Data", but maybe "Visualizing Data").

Nick (and others):

Yes. The "zero is zero" rule is a guideline. Other starting points can work too, but I think such alternative choices should be justified.

Nick, you say "I sat through a talk in which sex ratios for various states of India were shown starting at 0." If this was a bar plot, then it violates the principle that bar plots should be used (only) for showing "how much" or "how many". They should not be used for ratios.

Phil: Says who? This is to me a distinction without a difference, and a dogma without a rationale (pun intended).

How many females there are per male is, to me, a matter of "how much" or "how many" too, so the example is consistent with your own principle.

Many quantities can be construed as ratios anyway, even if only in the sense of an amount of stuff per unit used to measure it. In S. S. Stevens's old but still much-used classification of measurement scales, the ratio scale is the highest level of measurement: one in which zero is not arbitrary and ratios make sense. Sex ratios make sense.
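Stevens's point about arbitrary zeros can be checked with a little arithmetic, and it also explains why 0°F is a silly base in the freezing-point example above. A minimal Python sketch, with a hypothetical helper `fahrenheit_to_kelvin` using the standard conversion:

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert degrees Fahrenheit to kelvin (a scale with a true zero)."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15


a_f, b_f = 32.0, 64.0

# On the Fahrenheit scale 64 looks like "twice" 32, but 0 F is arbitrary.
ratio_f = b_f / a_f

# On the kelvin scale, where zero means no thermal energy, the ratio is
# only about 1.065.
ratio_k = fahrenheit_to_kelvin(b_f) / fahrenheit_to_kelvin(a_f)

print(f"Fahrenheit ratio: {ratio_f:.2f}")
print(f"Kelvin ratio:     {ratio_k:.4f}")
```

Bar lengths invite exactly this kind of ratio comparison, so a base of 0 is only meaningful when the zero itself is meaningful.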

…line charts are better only when they are appropriate: distributions (i.e., counts) of categorical data should not be represented with line charts…