Better late than never

A friend writes:

Does this stuff suck? Or am I missing something?

My reply: Yes, I agree. They all suck (for the purpose of data display).


  1. anon says:

    oh come on! you have to define what you mean by "suck"!

  2. jeremy harris says:

    Very insightful comment, Andrew. Now tell us why the Box Office receipts graphic "sucks for the purpose of data display". I see data presented succinctly.

  3. Andrew Gelman says:

    Anon: My short answer is that they look cool but I don't think they do a particularly good job at conveying information. Many of their good-looking features are more of a distraction, I think.

    To go through them one by one:

    Wordle: The words shown are in no particular order, in a mix of color shadings, and some are horizontal and some are vertical. Also the entire display has a goofy appearance that, again, conveys no information.

    Decision Tree: This picture falsely implies (a) that voters were divided between Obama and Clinton in some sort of precise way, and (b) that this tree indicates the way that voters make decisions. It's an instance of taking a fitted model way too seriously.

    Music Video: It's pretty, I just don't think it's displaying the data well. I don't call this sort of thing "data visualization"–rather, it's a use of data visualization tools to make art, which is great, but not the same thing at all, in my opinion.

    Box Office Streamgraphs: These look super-cool but, no, I don't think they provide a good way of looking at the data. Compare to the Baby Name Wizard for a similar plotting scheme that works much better. Again, the graphs are striking–and it may very well be that it's worth paying the price (making a graph that does a poor job of conveying information) in order to get the attention. After all, if people don't see your graph, who cares how clear it is?

    I Want You to Want Me: Cute, but, again, I don't really see it as an effective way to convey the data. It's more of a way to get attention, and then I'd want a pointer toward a better data visualization to learn more.

    Britain From Above: These are actually pretty good. They could be improved–most notably by removing the background music and adding a visual clock on the screen so you can get a sense of the time scale of the data display.

    In summary: some of these are visually attractive, and the only ones I really hate are Wordle and the Decision Tree, but none of them come close, in my opinion, to the Baby Name Wizard.

  4. Simon says:

    Aren't the Box Office Streamgraphs and the Baby Name Wizard exactly the same type of graph? They are both stacked area plots with quantity (births/box office receipts) on the vertical axis, and time on the horizontal axis.

  5. As a general rule, a data visualization should not introduce more complexity than already exists in the data – without good justification.

    In that respect, your criticisms are valid, with one exception: streamgraphs.

    All of the visual elements of Lee Byron's variant of stacked graphs encode data, and there's a whitepaper that describes this.

  6. David Smith says:

    I'll add to the chorus that streamgraphs are a great way of representing changes in categorical distributions over time. The New York Times box-office chart is excellent, IMO — particularly the use of smoothing (which does munge the raw data a bit, but offers much greater interpretability). The interactivity (visible only at the original NYT URL) is also great (although the box-office data would have been more interesting than the reviews for the pop-ups).

  7. Doug says:

    I agree with Andrew. I'm seldom impressed with stuff on that blog. It's one thing to do cool-looking stuff with data, and these are great for a non-technical user who isn't doing anything important with the data, but for an analyst who is trying to model the data, most of this stuff is garbage.

  8. Andrew Gelman says:

    I wouldn't go so far as to call it "garbage": it's important for people to be developing the technical tools and displaying them to a wide audience. Ultimately, I hope that the graphical community will be able to distinguish between the simple beauty of the Baby Name Wizard and what I see as the gimmickiness of the Streamgraph. But one way to get from here to there is for all these ideas to be put out there.

  9. RRR says:

    I am kind of surprised to see these comments, especially since so much effort has been made over the past decade to make Statistics appear powerful and even fun to the general public; to overcome the stigma of "lies, damned lies, and statistics."

    I am not an artsy person, and some of the visualizations on that site are nothing but artsy, but most of them accomplish one strong goal: to connect the reader with the data. By adding colors and making the graphics look "cool," it engages the reader with the data and integrates the context of the data with the graphic, without resorting to boring boxplots, histograms etc.

    Of course, for scholars these images are not appropriate, and do not convey all of the information in the data, and perhaps may even be a tad misleading, but the audience (of the original graphics) consists of people with no training in data analysis.

  10. Alex F says:

    Today, from Flowingdata, possibly the worst, least informative, most gimmicky presentation of data that I've ever seen:

  11. @AlexF – The Circos graphs are highly useful in their original domain: describing gene conservation in a circular bacterial chromosome (in addition to allowing for other annotations). And I applaud Martin Krzywinski for his contributions here.

    In the case of tabular data, though, it feels as if the form has been stretched to a breaking point – a hammer in search of a nail.

    And rainbow color palettes confound more than they clarify for ordered, continuous data (like percentages).

  12. David says:

    Doug – This is a pretty ignorant statement. I like the FlowingData blog. Of course you aren't going to use these types of visualizations in modeling or analysis; that's not the point. Eventually, though, you have to convey the results to an audience (peers, wider public, scholars), and these visualizations are about the best way (or not the best way) to convey this information so that it tells the story of the data. I also think this statement, "who isn't doing anything important with the data," is blasé and dangerous. These visualizations influence many people (how many people read and the way people think about different topics (and people are voters)). Sites like FlowingData and Chart Junk are valuable because they try to reach a wider audience and challenge when a visualization distorts the data. No, not every visualization shown on the site works for me, but to call it garbage is ridiculous.

  13. ekzept says:

    Well, there just may be other ways of thinking about data than the immediately crisp and authoritative. There are notions and ideas and trends which escape imaging that can be often understood by direct experience. Sure, the formal is supremely important. But often a concept and experience is needed to start that, bringing in other senses and emotions.

    To my mind, it can't harm to have as many personal resources as possible experience data, even if judgments need to be guided by critical rationality.