Graph of the year

From blogger Matthew Yglesias:

graphoftheyear.png

There are lots of great graphs all over the web (see, for example, here and here for some snappy pictures of unemployment trends from blogger “Geoff”).

There’s nothing special about Yglesias’s graph. In fact, the reason I’m singling it out as “graph of the year” is that it’s not special.

It’s a display of three numbers, with no subtlety or artistry in its presentation. True, it has some good features:
– Clear title
– Clearly labeled axes
– Vertical axis goes to zero
– The cities are in a sensible order (not, for example, alphabetical)
– The graph is readable; none of that 3-D “data visualization” crap that looks cool but distances the reader from the numbers being displayed.

What’s impressive about the above graph, what makes it a landmark to me, is that it was made at all. As noted in the text immediately below the image, it’s a display of exactly three numbers which can with little effort be completely presented and explained in three sentences. Personally, I’d prefer a horizontally-aligned dotplot, which can display the information more compactly and readably. And I’d prefer population per acre rather than per square mile. I find it very hard to visualize 60,000 or even 10,000 people in a square mile. In contrast, 15 people per acre is something I can understand immediately. (One could also compute gimmicks such as the average distance to the closest person, if all the people were laid out in the city, evenly spaced. I think that sort of calculation can aid intuition, but in this case I think it’s a bit trickier than necessary for the points that Yglesias is making.)
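For readers who want to check the unit conversion, here is a minimal Python sketch. It assumes 640 acres per square mile and 5,280 feet per mile, and it reads "evenly spaced" as a square grid, which is only one possible interpretation. The function names are hypothetical, chosen for illustration.

```python
import math

ACRES_PER_SQ_MILE = 640
FEET_PER_MILE = 5280


def people_per_acre(people_per_sq_mile):
    """Convert a density in people per square mile to people per acre."""
    return people_per_sq_mile / ACRES_PER_SQ_MILE


def grid_spacing_feet(people_per_sq_mile):
    """Distance in feet between neighbors if everyone stood on a square grid.

    Assumption: a square grid, so each person occupies a square whose side
    is one mile divided by the square root of the density.
    """
    return FEET_PER_MILE / math.sqrt(people_per_sq_mile)


# The two round densities mentioned in the text.
for density in (10_000, 60_000):
    print(density, people_per_acre(density), grid_spacing_feet(density))
```

Under these assumptions, 10,000 people per square mile comes to a little under 16 per acre, with neighbors about 53 feet apart on the grid.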

Bill James (and others) have pointed out that true racial equality in baseball came, not when superstars such as Jackie Robinson and Willie Mays started joining major league rosters, but when there was room for ordinary black players to join their equally unexceptional white colleagues on the bench.

Similarly, graphical methods have truly arrived when journalists use graphs to make ordinary, unexceptional points in a clearer way. When making a graph, and including it in an article, is easy enough that it’s done as a matter of course.

P.S. The success of this graph also demolishes naive notions of efficiency of data display. An entire graph is being used to display only three numbers, but there’s nothing chartjunky about it.

26 thoughts on “Graph of the year”

  1. Just to be a curmudgeon, I will point out one small flaw in an otherwise elegant graph: the scale is over-labeled. There are seven numbers on the scale, representing just three numbers in the data. I would have kept the ticks at 10k each, but labeled only the 0, 30k and 60k. Or abandoned a scale altogether (completely, banishing the scale, labels, line, ticks and all) and placed three exact numbers on top of the bars. Or to the right of the bars, if horizontal.

  2. Derek:

    I agree. For that matter, I don't think people have much of a sense of the meaning of "10,000 people per square mile." 50,000 people per square mile is 550 square feet per person, but I don't know how helpful that is either.
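The square-feet-per-person figure can be checked with a short Python sketch. It assumes only that a mile is 5,280 feet; the function name is hypothetical.

```python
SQ_FEET_PER_SQ_MILE = 5280 ** 2  # 27,878,400 square feet in a square mile


def sq_feet_per_person(people_per_sq_mile):
    """Ground area per person, in square feet, at a given density."""
    return SQ_FEET_PER_SQ_MILE / people_per_sq_mile


print(sq_feet_per_person(50_000))
```

The result is close to 558 square feet, which the comment rounds to 550.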

  3. I'm going to go out on a limb and say that they should display this as density as a fraction of the density of Fargo. People in the US have a sense that Fargo is pretty sparse, so setting that equal to 1 makes it clear how many times more than "sparse" each of the others is. Also, in the caption I would put the reference density so that anyone who wanted absolute numbers could calculate them.

  4. 550 square feet per person is more intuitive, but it goes against Mr. Yglesias' rhetoric. Further, it's unfair to his larger agenda, which includes more highrises that provide three-dimensional square footage. A fairer square-feet-per-person figure for him would be one in which all three cities incorporated the available 3D space of Paris (at least that not effectively reserved for the wealthy), projected onto DC and Fargo.

  5. Paris doesn't have a lot of highrises. It's full of 7-story buildings. Also it has narrow streets, which frees up a lot of square footage.

  6. Amazing!!

    I would have never ever thought I would write a line like this:

    What do we need a graph for here anyway? A well designed table with only the three numbers would be just perfect. Don't we need the meta information for judging the meaning of the three numbers far more urgently than this graph?

    What about a graph that shows the population density of the districts for each of the three cities – that would add to the story …

  7. I think the numbering is defensible since Washington is 10K and Paris is 60K. If Washington were 20K or Paris were 70K, I could see alternating ticks with numbers.

    The way Yglesias does it seems more accessible to the common man's intuition, IMO.

    As propaganda I think it would help him to throw London in there, but propagandistically self-defeating to include almost any other city because of related associations they'd bring.

  8. Martin:

    Sure, if I were making the graph I would include more information. But I'm evaluating this graph, not as an alternative to better graphs that might be put in its place, but in its context as an adjunct to Yglesias's article. The usual "data-ink" reasoning would suggest that a table of three numbers would be fine, but I disagree. I think the graph shows the range of magnitudes much more clearly. What I liked about the graph was how it fit right in with the blog where it was posted. Graph-as-a-routine-part-of-an-essay rather than graph-as-big-deal.

    There's still lots of room for graph-as-big-deal (at least, I hope so!) but here I'm applauding the idea of graphs as a routine tool for journalists.

  9. Yglesias's graph is idiotic and he puts it to a dishonest use. Density is meaningless without data on area. Paris has an area of 40.7 square miles and a population of 2.1 million; DC has an area of 68.3 square miles and a population of 601,000. DC's area is so large due to its creation as a carved-out federal district, not due to any natural urban growth.

    Yglesias argued from the graph that DC would be a more livable city – like Paris! – if it had a much higher population density. But for DC to have the population density of Paris it would have to grow five-fold, to over 3 million residents. Only the very largest cities have densities in the range of that of Paris.

    If you look at "livable" cities of comparable populations to DC – Boston and San Francisco, say – you'll see that their population density is similar to that of DC. San Francisco: 17,323 people/sq mi; Boston, 13,321 people/sq mi. And both these cities have land areas of under 50 square miles, so if you correct for land area, Boston and San Francisco would probably bracket DC. Meanwhile, Edinburgh, which is a lovely, livable city of about 400,000, has a density of under 5,000/sq mi – mainly because its area is over 100 sq miles and thus includes huge tracts of suburb and semi-rural land.

    Yglesias' chart makes a false comparison and unsurprisingly comes to a false conclusion. Maybe that makes it the chart of the year.
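The arithmetic behind comment 9 can be reproduced with a minimal Python sketch. It uses only the population and area figures quoted in that comment (they are the commenter's numbers, not independently verified here), and the variable and function names are hypothetical.

```python
# Figures as quoted in the comment: population and area in square miles.
cities = {
    "Paris": {"population": 2_100_000, "area_sq_mi": 40.7},
    "DC": {"population": 601_000, "area_sq_mi": 68.3},
}


def density(city):
    """People per square mile for one city record."""
    return city["population"] / city["area_sq_mi"]


paris_density = density(cities["Paris"])
dc_density = density(cities["DC"])

# Population DC would need, within its current area, to match Paris's density.
dc_population_at_paris_density = paris_density * cities["DC"]["area_sq_mi"]
growth_factor = paris_density / dc_density

print(round(paris_density), round(dc_density))
print(round(dc_population_at_paris_density), round(growth_factor, 2))
```

On these figures DC would need roughly 3.5 million residents, a bit under six times its quoted population, which is broadly in line with the comment's "over 3 million."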

  10. I disagree that there's nothing chartjunky about it. It could be much worse than it is, of course, but the tick lines in the background are unnecessary and, more egregiously, the blocks of blue instead of points or symbols at the appropriate values seem unnecessary, at best. It is plausible that the width of the columns affects the perceived differences, making Paris seem relatively larger than DC than it really is.

  11. …and then there's Manhattan, with a population density of 70,951/sq mi…Paris is a busy place, but its intensity pales next to New York County.

  12. If we *really* want to visualize Yglesias' point that "Paris is to DC as DC is to Fargo", we should put the y-axis on a log scale. :)
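To see why a log scale suits the "Paris is to DC as DC is to Fargo" analogy, here is a small Python sketch. It assumes the round values of 10K for Washington and 60K for Paris mentioned in comment 7. The Fargo value below is not Fargo's actual density; it is the hypothetical density Fargo would need for the analogy to hold exactly.

```python
import math

# Round values taken from comment 7; treat them as approximate.
dc = 10_000
paris = 60_000

ratio = paris / dc

# Hypothetical: the density Fargo would need for DC/Fargo to equal Paris/DC.
implied_fargo = dc / ratio

# On a log axis, equal ratios appear as equal vertical gaps.
log_gap_paris_dc = math.log10(paris) - math.log10(dc)
log_gap_dc_fargo = math.log10(dc) - math.log10(implied_fargo)

print(ratio, implied_fargo)
print(log_gap_paris_dc, log_gap_dc_fargo)
```

The two gaps come out equal (up to floating-point error), which is the sense in which a log axis would make the three bars look evenly stepped if the analogy were exact.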

  13. Andrew:

    I fully agree that it is great if a journalist creates such a clean graph as a default to display and illustrate the underlying data of an article. The claim "graph of the year" though was/is a bit misleading.

  14. The question is whether these are really comparable figures – the administrative district of Paris (also known as Paris intra-muros) is fairly small (and with 2.1m people very dense) and does not include all neighbouring suburbs in the "petite couronne" and "grande couronne" which are administratively independent with a total of 11m people in the metropolitan area.

  15. Extending some comments by Bloix: comparisons of city densities are bedevilled by the fact that data are most readily available for various administrative areas, which can bear variable relation to the built-up area in which city people live, even within individual countries.

  16. Bloix — it seems like you're arguing that DC's density seems artificially low because its area is artificially high (which you say explicitly) and therefore that large chunks of it are undeveloped (which you don't say explicitly, but seems to be the point you're making with the comparison with Edinburgh). This isn't the case. DC has a lot of parkland, but the available building space is in general built on — there is very little undeveloped land. Also, the general DC area has a population of well over 3 million. Yglesias's consistent point is that it would be better — environmentally and commercially — if DC found ways for more of those people to live in the District itself rather than in the out-of-District suburbs. You can disagree with his point, but I don't think he's being either idiotic or dishonest.

  17. Bloix,

    If you are going to add Boston to the mix then you need to consider the urban area since Boston proper is rather small. How about adding at least Cambridge, Somerville and Brookline. This may or may not help your argument.

    Each of these is a separate city, since Massachusetts municipalities tend not to annex. And each of them is considered walkable or has a high livability index.

    Cambridge has more people who commute by foot than anywhere else in the country.

  18. My graph of the year (and post incorporating graphs of the year) is here:

    http://blogs.discovermagazine.com/gnxp/2010/10/th

    The graph of the year has to be the one in this post starkly showing that no British muslims think homosexual conduct is morally acceptable.

    But the single graph that has most influenced me personally this year is his "toy example" on how birthrates can rise in a general population simply as subpopulations with higher growth rates form a larger share of that population.

    It introduced me to the idea of the toy model, which I hope will be my intro to more graphic blogging and commenting myself. I think toy graphics should be more widely employed, and aren't harmful as long as they're appropriately labeled as such.

    Overall I think the post was brilliant, the best of Razib, and a gold standard of scientific journalism (smarter than what's in the papers, less deformed than what's written by most academic bloggers -less deformed than most of Razib's own writings even).

    Also it was a blog post written after completing a book, which is a virtuous act in a land of sinners (and I'm one of the sinners).

  19. My complaint:
    Fargo, ND
    Washington, DC
    Paris, France

    I live in the state of Washington.
    I really appreciate folks differentiating between my state and the nation's capital. If you mean DC, then say/write DC.

  20. I am with Hyperion except that my complaint is "the nation's capital".

    That could be Paris, France in this example.

    If you mean the USA, please say so.

  21. Hyperion:

    I'm from the D.C. area and am really annoyed when people from Seattle etc say they're from "Washington." If you're talking about Washington state, then say "Washington state."

    Nick:

    "The nation's capital" is lazy journalist-ese, the same sort of thing as saying "the lion's share" for "most." It's not really something anyone says (except in situations where the context is clear).

  22. I think a fair critique of Yglesias' point can be made on the basis of his use of political boundaries to determine density. As others have pointed out, it's arbitrary. When comparing cities as complete entities, it's much better to use a measure like metropolitan area, which captures the whole city as an organism.

    But even that would be flawed for this purpose. Since Yglesias is really trying to make the point that you need density at the urban core to support services and establishments viewed as essential to the urban lifestyle, he should only focus on the density of the urban residential core. DC's density is reduced due to the neighborhoods in Wards 3 and 5 that are completely suburban in housing stock. But having quarter-acre lots in the Palisades shouldn't matter to whether Logan Circle can support another supermarket.
