Using a “pure infographic” to explore differences between information visualization and statistical graphics

Our discussion on data visualization continues.

On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics.

On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs or how they are trained, and I don’t want to mislabel them) such as Robert Kosara and Jen Lowe, who seem a bit annoyed at how my colleagues and I seem to follow the Tufte strategy of criticizing what we don’t understand.

And on the third side are many (most?) academic statisticians, econometricians, etc., who don’t understand or respect graphs and seem to think of visualization as a toy that is unrelated to serious science or statistics.

I’m not so interested in the third group right now (I tried to communicate with them in my big articles from 2003 and 2004), but I am concerned that our dialogue with the graphics experts is not moving forward quite as I’d wished.

I’m not trying to win any arguments here; rather I’m trying to move the discussion away from “good vs. bad” (I know I’ve contributed to that attitude in the past, and I’m sure I’ll do so again) toward a discussion of different goals.

I’ll try to write something more systematic on the topic, but for now I’d like to continue by discussing examples.

My article with Antony had many many examples but we got so involved in the statistical issues of data presentation that I think the main thread of the argument got lost.

For example, Hadley Wickham, creator of the great ggplot2, wrote:

Unfortunately both sides [statisticians and infographics people] seem to be comparing the best of one side with the worst of the other. There are some awful infovis papers that completely ignore utility in the pursuit of aesthetics. There are many awful stat graphics papers that ignore aesthetics in the pursuit of utility (and often fail to achieve that). Neither side is perfect, and it’s a shame that we can’t work more closely together to get the best of both worlds.

I agree about the best of both worlds (and return to this point at the end of the present post). But I don’t agree that we’re comparing to “the worst of the other.” Sure, sometimes this is true (as in the notorious “chartjunk” paper in which pretty graphs are compared to piss-poor plots that violate every principle of visualization and statistical graphics).

But recent web discussions have been about the best, not the worst. In my long article with Unwin, we discussed the “5 best data visualizations of the year”! In our short article, we discuss Florence Nightingale’s spiral graph, which is considered a data visualization classic. And, from the other side, my impression is that infographics gurus are happy to celebrate the best of statistical graphics.

But in this sort of discussion we have to discuss examples we don’t like. There are some infographics that I love love love–for example, Laura and Martin Wattenberg’s Name Voyager, which is on my blogroll and which I’ve often linked to. But I don’t have much to say about these–I consider them to have the best features of statistical graphics.

In much of my recent writing on graphics, I’ve focused on visualizations that have been popular and effective–Wordle is an excellent example here–while not following what I would consider to be good principles of statistical graphics.

When I discuss the failings of Wordle (or of Nightingale’s spiral, or Kosara’s swirl, or this graph), it is not to put them down, but rather to highlight the gap between (a) what these visualizations do (draw attention to a data pattern and engage the viewer both visually and intellectually) and (b) my goal in statistical graphics (to display data patterns, both expected and unexpected). The differences between (a) and (b) are my subject, and a great way to highlight them is to consider examples that are effective as infovis but not as statistical graphics. I would have no problem with Kosara etc. doing the opposite with my favorite statistical graphics: demonstrating that despite their savvy graphical arrangements of comparisons, my graphs don’t always communicate what I’d like them to.

I’m very open to the idea that graphics experts could help me communicate in ways that I didn’t think of, just as I’d hope that graphics experts would accept that even the coolest images and dynamic graphics could be reimagined if the goal is data exploration.

To get back to our exchange with Kosara, I stand firm in my belief that the swirly plot is not such a good way to display time series data–there are more effective ways of understanding periodicity, and no I don’t think this has anything to do with dynamic vs. static graphics or problems with R. As I noted elsewhere, I think the very feature that makes many infographics appear beautiful is that they reveal the expected in an unexpected way, whereas statistical graphics are more about revealing the unexpected (or, as I would put it, checking the fit to data of models which may be explicitly or implicitly formulated). But I don’t want to debate that here. I’ll quarantine a discussion of the display of periodic data to another blog post.

Instead I’d like to discuss a pure infographic that has no quantitative content at all. It’s a display of strategies of Rock Paper Scissors that Nathan Yau featured a couple weeks ago on his blog.

It’s an attractive graphic that conveys some information–but the images have almost nothing to do with the info. It’s really a small bit of content with an attractive design that fills up space.

Difference in perspectives

The graphic in question is titled, “How do I win rock, paper, scissors every time?”, which is completely false. As my literal-minded colleague Kaiser Fung would patiently explain, No, the graph does not tell you how to win the game every time. This is no big deal–it’s nothing but a harmless exaggeration–but it illustrates a difference in perspective. A statistician wouldn’t be caught dead making a knowingly false statement. Conversely, a journalist wouldn’t be caught dead making a boring headline (for example, “Some strategies that might increase your odds in rock paper scissors”).

Who’s right here–the statistician or the journalist? It depends on your goals. I’ll stick with being who I am–but I also recognize that Nathan’s post got 116 comments and who knows how many thousand viewers. In contrast, my post from a few years ago (titled “How to win at rock-paper-scissors,” a bit misleading but much less so than “How to win every time”) had a lot more information and received exactly 6 comments. This is fair enough, I’m not complaining. Visuals are more popular than text, and “popular” isn’t a bad thing. The goal is to communicate, and sacrificing some information for an appealing look is a tradeoff that is often worth it.

Moving forward

Let me conclude with a suggestion that I’ve been making a lot lately. Lead with the pretty graph but then follow up with more information. In this case, Nathan could post the attractive image (and thus still interest his broad readership and inspire them to those 100+ comments) but set it up so that if you click through you get text (in this case, it’s words not statistical graphs) with more detailed information:

(Sorry about the tiny font; I was having difficulty with the screen shots.)

Again I purposely chose a non-quantitative example to move the discussion away from “What’s the best way to display these data” and focus entirely on the different goals.

17 thoughts on “Using a “pure infographic” to explore differences between information visualization and statistical graphics”

  1. I’d like to see you elaborate and strengthen your distinction: “I think the very feature that makes many infographics appear beautiful is that they reveal the expected in an unexpected way, whereas statistical graphics are more about revealing the unexpected (or, as I would put it, checking the fit to data of models which may be explicitly or implicitly formulated).”

    Your distinction seems clear on the surface, but as I think about it more, it becomes less clear that your distinction is correct. For example, I agree that infographics usually seek to display things in unexpected ways, but I don’t agree that they are (usually) made to reveal the expected. And I’d add that one goal of statistical visualization is (often) to be able to visualize multiple dimensions of data and how they interact, including folding dimensions and illuminating manifolds.

    One thought that intrigues me is something I’ve seen in the game of Go: often, a good strategic move is also aesthetically pleasing. To pull the move off at the appropriate time, with the precisely right placement, and the proper followup in mind takes a lot of thinking below the aesthetic surface, but it’s amazing how often an “it looks like that spot needs a stone” is a good beginning to analysis.

  2. You mentioned the not-so-good chartjunk study, but I do have a general question:

    1) Some arguments seem to be long-running ones, and, from outside, they sometimes look like differences of opinion due to differences of domain, purposes, etc., that may be implicit.

    2) In a world of data-overload, converting data into compelling visualizations seems increasingly important, whether they are black-and-white print or interactive 3D, perhaps with haptic feedback interfaces.

    3) But much of this seems like it should involve working with cognitive scientists to do good studies on problems that matter in this turf, and move the boundary between opinion and science in a good direction. It’s been decades since I managed cognitive scientists, but good ones were really valuable. To what extent are such studies happening and might you cite some more?

    If these are not happening, can you hazard guesses as to why not?

  3. Interestingly I was completely with you, agreeing with everything you said, until your example at the very end. I had already read Nathan’s post, and I didn’t love that infographic BUT I did read the whole poster. I opened your text and my eyes read the headline 1, 2, 3, 4, 5, 6 (not the content) and then I got bored and closed it without thinking.

  4. Wayne:

    The swirly plot and the Florence Nightingale plot are examples of graphs that look cool and are visually appealing, assessed by experts to be excellent infographics, but the #1 message that they send is that the data are periodic in obvious ways (7 days a week in the swirly plot, 12 months a year in the Nightingale plot). Actually, the Nightingale plot is even worse because the data don’t actually show any seasonal pattern but the graph invites a seasonal interpretation. I think one thing that people find appealing about these graphs is the Chris Rock effect, that you can look at the graph and be reminded in a striking way of a familiar pattern.

    John:

    I agree that cognitive scientists should be involved. Cleveland in his classic 1985 book, The Elements of Graphing Data, cited some psychological studies. It’s tricky, though, because you have to study the right things. The chartjunk study is an example of a study that’s fine from the standpoint of experimental psychology but is not so useful because the inputs they’re studying are not so interesting.

    I will try to figure out how to collaborate with psychologists to study these things better. In any case, I think the first step is to identify goals. My impression is that statisticians and graphic designers alike tend to implicitly assume there is a single goal to optimize in graphical presentation. I’ve been making a big push to acknowledge that there are multiple, somewhat competing, goals.

    Shubha:

    Yes, that’s why I suggested starting with the image and then clicking thru to the text. But, sure, the text could itself be punched up a bit, I’m sure. I have a lot of interest in rock paper scissors and so was not turned off at all by the tiny text, but if RPS is not your thing then you might not want to read it all.

  5. I’m not really sure what the rock-paper-scissor infographic brings to this discussion – my sense is that neither stat-graphics nor infovis practitioners would think it’s that great. Sure it’s pretty and eye catching, but it doesn’t convey much information. I wonder if you are conflating infovis with infographics?

    • Hadley:

      I used the rock paper scissors example because Nathan Yau linked to it approvingly. I was trying to separate the two functions of the graphic: (1) looking pretty and eye-catching, (2) conveying information. I think that many people, statisticians and graphics experts alike, seem to implicitly think that (1) and (2) go together, and it is by recognizing the difference between (1) and (2)–not automatically thinking that the most informative graphics are generally visually appealing, or that the most visually appealing graphics are informative–that we can move forward.

        • Hadley:

          OK, let me rephrase and say that (a) stat-graphics and infovis people have different preferences on the accessibility/information continuum, and (b) they tend not to emphasize this tradeoff in their discussions. Again consider Kosara’s swirly plot. Whether in static or dynamic form, its lock-in feature dramatizes the weekly periodicity in the data. However, it doesn’t display much more than that. (True, a careful study will show time trends, but these would be much more apparent through a more direct time-series plot and decomposition.) Kosara’s not saying, “Hey, I’m willing to sacrifice some EDA power in order to make a dramatic graph.” Conversely, if I were to plot these data using more conventional statistical tools, I probably wouldn’t have gotten around to saying, “Hey, I’m willing to sacrifice some visual appeal.” Maybe if we start stating these tradeoffs more explicitly (rather than acting as if our favorite styles of graphs are best in all dimensions), we can move forward.

          P.S. Thanks much for keeping this discussion going. One reason Antony and I wrote our paper was to have a chance to publish it in a journal with many discussants from different perspectives.

        • I still think you’re proposing a dichotomy between infovis and stat-graphics that doesn’t really exist, based on a limited sampling of infovis practitioners. The chief difference that I notice between infovis and stat-graphics is that stat people tend to be more in love with data, and infovis (CS) people more in love with programming/code (but this is still a sweeping generalisation). Generally stat-graphics people produce tools for themselves, while infovis people create tools for others to use. Interestingly, there seems to be less formal validation of methods in stat graphics (i.e. very little user testing) compared to infovis.

        • Hadley:

          Maybe there’s no sharp dichotomy but I think that Kosara is making a tradeoff with his swirly plots and that I was making a tradeoff in the plots in my Red State Blue State book (for example). In each case we were favoring what we like: Kosara likes a striking image with a visual mystery, I like lineplots and scatterplots that focus on comparisons of interest. And I doubt that either of us thought much about the tradeoffs we were making; rather, we each probably were just thinking we were doing the best we could for the problem at hand.

          One reason we can often get away without thinking that hard is that we’re adjacent to people who are doing much worse than we are. If you’re a data graphics expert, you can compare yourself with non-data-based images, secure in the knowledge that you’re displaying a lot more information without sacrificing any visual appeal. If you’re a statistical graphics person such as myself, you can compare your graphs to traditional tabular presentation of data and inference, secure in the knowledge that you’re displaying the information much more clearly without losing any relevant precision.

          To put it another way, I think the infovis people and the statgraphics people are making clear gains. But once you hit the efficient frontier of the beauty/informativeness space, you have to make hard choices. And, here, I think it’s helpful to recognize these tradeoffs.

  6. From an almost-complete outsider’s perspective on the matter, while I’m sure the difference is important for professionals and for which target demographic the designer is going for, in the end the general public (usually the ‘consumers’ of these visualizations) wants something that looks good and makes sense to us. The differences between which is “better” or “more correct” or “more pure” of a visualization or stat-graphic really have very little importance to the 99% of the world (don’t kill me on this unsourced stat, yes it is a ‘random’ percentage) that you’re actually using the data to represent.

  7. Re: studying the right things, for sure.
    Cleveland’s books are on my closest shelf along with Tukey’s EDA and Tufte’s books.
    Of course, both he and I were at Bell Labs, which employed human factors people of various kinds because it mattered financially, given a million-plus employees. Faster training and fewer errors saved money.
    Of course, so did making better decisions from masses of data, hence statisticians.
    My lab was about 10% human factors people, including some good cognitive scientists to worry about GUI designs, information navigation, etc.

    We were very close to many real problems. Occasionally, a few related research groups got a bit far off and we had to reject proposed publications – Bell Labs internal reviews were fierce.

  8. a) 116 vs 6 comments for RPS: I don’t know how much time has passed since your post but I think times have changed. Posting your RPS article again (maybe w/ enhanced CSS) would produce more than 6 comments. (In fact, I liked your text more than the graphic by Nathan)

    b) Connie Malamed proposes to distinguish between infoposters (e.g. RPS by Nathan) and infographics (which transport “real” information), which I really appreciate as the term infographic is getting abused a lot. http://understandinggraphics.com/visualizations/infoposters-are-not-infographics/

    c) In my brain model, there exist 3 types of graphics:
    – statistical graphics, which visualize the real underlying data and let the user draw their conclusions out of it.
    – information visualization, which visualizes the information and proposes conclusions to help the user understand.
    – infographics, which mainly visualize the conclusions and therefore omit the interesting base data.
    A simple example would be a scatterplot: statistical graphics visualize the datapoints only, infovis adds a color representing the cluster, infographics show the clusters only.

    just my 2 cents.
