
Infovis and statgraphics update update

To continue our discussion from last week, consider three positions regarding the display of information:

(a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use graphs to illustrate their points but they don’t see them as necessary in their research.

(b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible.

(c) Information visualization or infographics, as performed by graphics designers and statisticians who are particularly interested in the way that graphics can involve non-statisticians in the process of thinking about and understanding numerical data.

I place myself in group (b) but I recognize that a lot of smart and accomplished researchers find themselves in (a) or (c). From about 1995-2010, most of my writing on graphics focused on the contrast between (a) and (b), in particular the way that exploratory graphics could be used to check the fit of and better understand probability models.

In the past year or so, though, I’ve been writing a lot about the distinction between (b) and (c). It’s been a struggle, as many of the infographics people seem to feel that I’m putting them down. But I’m not! Despite what Nathan Yau says, I don’t think it’s a putdown of Florence Nightingale’s graph to compare it to a Chris Rock comedy routine. I think Chris Rock is great.

OK. I’ll try one more time, as I haven’t done such a great job communicating this all so far.

I’ll start with an example where these differences in perspective can make a big difference.

Why does this matter?

The differences between statistical graphics and information visualization are important because, on both sides, people are doing work that’s not as good as it could be.

I’m guilty of this too! As can be seen from our book Red State Blue State. My coauthors and I spent a huge amount of effort figuring out how to display the data and our inferences graphically. And I think we did a good job. But . . . we really could’ve benefited from collaboration with an infovis expert. Our graphs were solid but not visually innovative. They were not exciting. They were fine for the readers who already cared about the data but not enough to grab the casual reader.

And this was a failing on our part! We wrote the book because we felt the topic was important and we wanted to engage journalists and the general public. To have all these great analyses and to display them so boringly . . . that’s a wasted opportunity. It would be like printing the entire book in an ugly font: sure, you could read it if you really needed to, but it puts an unnecessary burden on the potential reader.

In my ignorance of the different perspectives of infographics and statistical visualization, I naively thought that the best images for the statistical purpose of my understanding of the data would automatically be the best for involving others. But I’m pretty sure I was wrong. We had well over 100 graphs in our book and we surely could’ve spared some space for some attractive and inviting infographics.

From the other direction, I’ve often enough seen snappy infographics that can mislead because of underlying statistical problems. The usual issue is that a non-statistical infographic can garble the data enough that it takes a fair amount of reader effort to decipher the most basic patterns. Thus, a grabby infographic can sometimes convey the complexity of a dataset and draw the reader in further, but at some point it can make sense to make the handoff to a statistical graphic that allows the reader to make comparisons of interest more directly (as discussed by Cleveland in his 1985 book). Again, I think a lost opportunity arises because of people not realizing that the best tool for one purpose can fall short with respect to other goals.

Aiming for common ground

I’ve noticed that different sorts of graphics-makers have different sorts of preferences:

1. Statistical graphics people detest pie charts and tend to like plain-looking displays such as dotplots and lineplots, with even more complicated varieties such as mosaic plots looking fairly uniform from a visual perspective. Statisticians seem to care a lot about displaying data optimally but not much about what people actually learn from their graphs in real life.

2. In contrast, infovis experts tend to like striking, unusual patterns and favor graphs with visual appeal. More than once I’ve seen infovis people express a soft spot for pie charts–I’m never sure if it’s because they think pie charts are an effective way to display data or if they’re just reacting to what they perceive as an annoying prescriptivism by statisticians such as Bill Cleveland and graphics gurus such as Ed Tufte.

Differences in taste

A point where I think we can all agree is that there are some systematic differences in taste.

On one hand you have people like Cleveland, Tufte, Antony Unwin, Kaiser Fung, myself, and many other statisticians who want graphs to be transparent so that users can identify what each data point on the graph stands for.

On the other hand you have people like Robert Kosara, Nathan Yau, and the many thousands (hundreds of thousands?) of satisfied users of Wordle who want data images that can tell an effective story and find communication to be much more compelling than any abstract principles about data-ink.

And somewhere on the side are the tens of millions of maybe not-so-satisfied users of Excel who are fumbling to display today’s data using last century’s tools, making graphs that hit that sweet spot of being both ugly and barely informative! (But not always; sometimes I like a plain simple barplot.)

Points in common

Infovis and statgraphics people have a lot in common, and maybe I should spend more time emphasizing that. I’ve recently been talking a bit about the two dimensions of attractiveness and informativeness and how we stand at different points on the efficient frontier of this space, but in practice one can often put in a bit of effort and improve in both directions. In other cases a reexpression can reveal different dimensions in a dataset. For example, consider our discussion of these time-use clock graphs. My commenters and I suggested various alternatives; for some purposes the original graphs might be fine; the key point, though, which we should all be able to agree on, is that there are a lot of things to look for here and we shouldn’t be stuck in any single form of visual data expression.

Different tastes, different goals

The main point I’ve been trying to get at in the recent discussion, starting with my paper with Unwin, is that it makes sense to consider the different tastes of statgraphics and infovis people as reflecting different goals.

It’s not just that we statisticians are too lame to make graphs that look good, or that graphics designers are too clueless to display actual data. Rather, we develop different kinds of expertise partly to serve our different goals. As a statistician, I’m particularly interested in assessing the fit of my models to my data, and so I work on graphs that allow comparisons of individual data points and give me a sense of my fitted models. In contrast, graphics designers are always being made aware of the problems of getting the attention of outsiders, hence they develop tools to make graphs more visually appealing.
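
To make the statistician's side of this concrete, here is a minimal sketch of a plain model-checking display: fit a straight line, then show one residual per data point so that any systematic misfit is visible. Everything in it is an assumption made for illustration (the simulated data, the helper names `fit_line`, `residuals`, and `text_residual_plot`, and the text-only rendering); it is not taken from any analysis discussed in this post.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(xs, ys, a, b):
    """Observed minus fitted value for every data point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def text_residual_plot(xs, res, width=41):
    """One row per data point, ordered by x: '|' marks zero, 'o' the residual."""
    scale = max(abs(r) for r in res) or 1.0  # avoid dividing by zero
    mid = width // 2
    lines = []
    for x, r in sorted(zip(xs, res)):
        row = [" "] * width
        row[mid] = "|"
        row[mid + round(r / scale * mid)] = "o"
        lines.append(f"{x:6.1f} " + "".join(row))
    return lines


# Hypothetical data: the true relation is curved, but we fit a straight line.
rng = random.Random(0)
xs = [i / 2 for i in range(21)]
ys = [0.5 * x * x + rng.gauss(0, 1) for x in xs]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
plot_lines = text_residual_plot(xs, res)
print("\n".join(plot_lines))
```

Nothing about the output is eye-catching, and that is the point: each row is one data point, and with data like these the residuals should bend away from the zero line in a systematic way, which is the signal that the straight-line model is missing something.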

Putting it all together

Infovis people have a lot of knowledge that statisticians don’t have, ranging from technical issues of fonts and colors to a general user-focused visual and storytelling perspective. And statisticians can offer a lot from the other direction, ranging from technical knowledge of p-values and confidence intervals to a more general concern about communicating uncertainty and variation.

I’ve spent a lot of time in the past thirty years making graphs and thinking about making graphs. I’ve spent a lot of time in the past fifteen years or so thinking about statistical graphics as a central link connecting model building, Bayesian inference, and exploratory data analysis (in particular, check out my articles from 2003 and 2004 that I keep linking to). And I’ve spent a lot of time in the past year thinking and writing about the different goals of different practitioners of statistical graphics.

How is it that Kosara and Yau can feel so strongly that certain displays such as the swirly plot or Wordle are so great, while Unwin, Fung, and I can feel so strongly the opposite (maybe not in these particular cases but in similar examples)? I don’t think the answer is that they’re wrong and I’m right; rather, I think we have different goals.

I’ve tried to explore these differences by studying some graphs that infovis proponents really seem to like: Florence Nightingale’s plots, Yau’s 5 best data visualizations of the year, an award-winning plot from a newspaper contest, and others. I think I’ve found some common features in these graphs which might indicate some systematic differences in goals between infovis and statgraphics. In particular, these popular visualizations that I don’t like so much all seem to be, to some extent, mini-puzzles, visual challenges that suck in the viewer and involve him or her in the challenge of interpreting the image. In some ways it’s the opposite of a classic Cleveland-style statistical graphic. The statistical graphic is supposed to be transparent, while in the prize-winning information visualizations, viewer involvement often begins with the challenge of figuring out what and where the data are.

I’m not saying this to put anyone down! I make no secret of my own tastes but the point here is not to win an argument but rather to explore the differences so we can better work together.

I think it will be easier for us to learn from each other (see the first paragraph in “Putting it all together” above) if we can realize that our goals are different, if we as statisticians recognize that a transparent graph may not be the most visually appealing for non-statisticians, and if graphics design experts recognize that the most visually-appealing image may not be the most effective tool (statically or dynamically) for learning from data.

16 Comments

  1. Maybe you’re missing the difference between exploratory and explanatory/communicatory/persuasatory/expository graphics? Exploratory graphics are aimed at you, the analyst, and you typically create hundreds in the course of an analysis. Most immediately end up in the trash, because they just lead you to the next graphic – so the emphasis is on creating them as quickly as possible. Expository graphics are what you use to present your results. They’re aimed at people who are not intimately familiar with the data, and typically don’t have much time/motivation to dive into it, so the emphasis is on creating a small number of graphics that tell a compelling story.

    Stat graphics tends to be more focused on exploratory graphics; infovis tends to be more focused on expository graphics – but again, that’s just a generalisation. There are infovis researchers working on exploratory graphics, and stat graphics researchers working on expository graphics.

    Finally, making sweeping generalisations about any group is likely to raise hackles, and I think that’s what you’re experiencing. Personally, I’ve never had any problems moving between the infovis and stat graphics communities, and I certainly struggle to ensure that the graphics and tools that I produce incorporate the best from both worlds.

    • Andrew says:

      Hadley:

      I agree about the difference between exploratory and presentation graphics, but there are differences in taste and goals even there. For example, Robert Kosara likes the swirly plot as an exploratory graph for learning about periodicity, whereas I prefer graphing the data by day of the week. And for presentation graphics I tend to prefer graphs that are close to the data and the model (as in the graphs in Red State Blue State) rather than the more processed images that win the design awards.

      As noted above, I think I personally could benefit from some of the expertise of infovis people, and they could benefit from thinking about graphics more statistically. I’m hoping that your ggplot2 will help, by encouraging graph-makers to think more systematically about their choices in terms of data structure.

      • Agreed, but I’m not convinced the differences in approach are discipline based. They may just be RK vs AG differences – you need more data to convince me of your hypothesis.

  2. K? O'Rourke says:

    > “display every data point as transparently as possible”

    I used the term “transparently as possible” once when David Cox was in the audience.

    And he pointed out that “transparency is in the eye of the beholder”.

    I am still working away on this
    http://andrewgelman.com/2011/05/missed_friday_t/

    which does display every point as a curve, and though it is transparent for some statisticians
    – including the journal editor who just passed on it as “[not] provide[ing] a sufficient contribution/advance to justify publication” while agreeing such plots were lacking in statistical work and the approach intriguing – it’s hard to make clear to many.

    But I also need to consider Hadley’s point that they are “aimed for the analyst” to explore model deficiencies.

    And I think Andrew has some points to make in these posts which are coming out.

    K?

  3. Tom says:

    You make a comment about ‘not-so-satisfied’ users of Excel. Some of my experience is that many people in a corporate environment have Excel provided for data analysis and no other tools, and as a result they struggle to visualise data in any meaningful manner, but don’t know any better. This means that a lot of scientists spend time not being able to extract any meaning from their data without specialist help – there is a learning curve associated with R (and other commercial stats/visualisation tools), and many people simply do not have the time to devote to this (or the cultural inertia dictates that using Excel will suffice). In these situations, simple meaningful charts that could be easily generated would be massively valuable but as you say – there is often only Excel available, and people don’t know any better.

  4. Jerzy says:

    I think you are lumping together visualization researchers (like Kosara or Yau) with graphics designers. I don’t know if my categories line up with what people actually call themselves, but I see three groups discussed here: statisticians who sometimes happen to make graphs (but have little expertise in design), graphic designers who sometimes happen to plot data (but have little expertise in stats), and information visualization researchers who look for interesting ways to combine the two fields.

    I also agree with Hadley that exploratory vs. expository graphs is a useful distinction, although I might not associate stats with one and infovis with the other.

    You seem to acknowledge the benefits of “snappy” graphic design, in grabbing attention for expository graphs, but you don’t talk about the infovis researchers building better tools that analysts can use for exploratory analysis.

    That seems to underlie your Chris Rock / mini-puzzle comments. Kosara’s swirly graph is unnecessarily complicated for an expository plot, but he has actually said he values it more as an interactive exploratory tool. It’s meant to be a “puzzle” for the researcher who doesn’t yet know what the trends in the data are, not for the end reader who just wants the analyst’s summary. (Of course this was not clear at first since he posted an image, not a link to the interactive tool.) Whether or not the swirly graph is a particularly good example, as a statistical analyst I would love more visualization tools that let me explore the data in new ways… and then I can go back to scatterplots and other traditional graphs to summarize the trends I’ve found. If infovis folks are doing this kind of research into new visualization tools, I’m all in favor.

    (I admit I can see cases where the “puzzle solving” effect holds for myself: even if I don’t care about the topic of the data, I might have fun playing with the interactive visualization for its own sake. Just don’t neglect the benefits of getting inspiration from these one-off visualizations and making them into exploratory tools for analysts to use on new problems.)

    Anyhow, anecdotal evidence shows that among statisticians there’s a lot of hunger for advice on graphics: Yau’s talk at the Census Bureau today filled the room to standing capacity and people had to be turned away! It’s clearly a popular topic :)

    • Jerzy says:

      Basically you come across as though you’re reducing all info visualization to “sucking in the viewer”: either “make this plot look snazzy” or the “mini-puzzle” approach.

      But I’m a bit surprised that you ignore the other examples Kosara gave, on p. 5 of that newsletter article:
      http://stat-computing.org/newsletter/issues/scgn-22-1.pdf

      “Scientific visualization made it possible to see the effects of design changes on the pressure distribution of an airplane wing, for example.”
      This is the same thing as statistical graphics for model-checking, just applied to physical models of the real world.

      “A lot of data is already available in principle, but not in a form that normal people would want to play with.”
      For example, the NY Times visualization of unemployment rates by demographic group:
      http://www.nytimes.com/interactive/2009/11/06/business/economy/unemployment-lines.html
      The graph is perfectly ordinary from a statistical graphics point of view.
      But the easy interactivity makes it more useful for the typical non-expert (compared to finding the data, putting it into Excel, and plotting each demo group one after another)… or even for the expert!
      I can make these graphs myself, but I could certainly use a generalized tool of this sort for my data. (Plot a set of numbers by time, and click on a simple menu of factor variables to highlight one particular timeline.) Does it exist already? Maybe, I dunno, they never taught me this stuff in grad school!

      Neither of these examples is meant to “suck in the viewer” but simply to make the data easier to work with.
      If Kosara gives such examples as a core part of the information visualization field, but you ignore them in your attempt to summarize the field’s goals, then it’s no surprise he thinks your view of info visualization is skewed.

      • Andrew says:

        Jerzy:

        I like that NYT visualization. In my discussion, I’m particularly interested in examples that work for infovis people but don’t work for statisticians such as myself. Hence my focus on examples like the swirly plot, Nightingale’s graph, etc. I agree that there are lots of examples where principles of statistical graphics and information visualization work together. My point in these discussions is not to give a thorough view of infovis (I’m happy to let Kosara and others do this, they’re more qualified than I am for the task) but to highlight the differences. Recall that this all started with my puzzlement that Yau’s 5 favorite visualizations didn’t impress me, as statistical graphics. That’s when it hit me that my goals are not the only goals here.

        Finally, “sucking in the viewer” is a good thing! I am serious when I say that Red State Blue State would’ve been a better book if it had graphics that suck in the viewer. Communication is important to me.

        • Jerzy says:

          I fully agree: sucking in viewers is a good thing, and not all infovis examples are great statistical graphics.
          What drove me to comment is that you’re taking “examples that work for infovis people but don’t work for statisticians” but then writing as though they represent all of what infovis is.

          I need a Venn diagram: You’re not really describing the difference in goals between [stat graphics] and [infovis], but rather the difference between [the overlap of stat graphs and infovis] and [the part of infovis that doesn’t overlap with stat graphs]… right?
          (I can’t think of an independent part of stat graphics that doesn’t overlap with infovis.)

  5. Martin Theus says:

    Maybe there is a distinction along a general thinking of how we use visualization tools:

    I often think of a toolbox of statistical graphics tools, which – quite along the lines of my basic math training – should be a canonical set of graphs, which (at least in theory) will span the complete space of possible visualization problems.

    People who grew up in Infovis maybe (and I am only talking from what I learned talking to these people) have less of a canonical toolbox view, but maybe try to find *the* perfect visualization for *one* specific dataset. In doing so, they definitely have a much harder time thinking about general distribution properties and the like, but usually end up with much more sophisticated graphical displays – in the case they find the “killer graphics”.

    Given the many posts we recently saw on both sides, I am really sort of puzzled that we can’t really close in on the points that are common, and even harder, that are separating us. I am really happy to see, that we agree on finding the best visual representation of data to tell a compelling story as being our main goal. We also agree on the power of doing this with interactive tools, that allow us to find the crucial points far more effectively than a model based approach would do.

    Maybe we are only at the start of a longer conversation between the disciplines, and I am absolutely positive that we will advance on both sides by doing this!

  6. Kim Rees says:

    I think you make the three distinctions on discipline (types a, b, & c), but then spend the rest of the article defending why you make those distinctions (because *some* people in those categories do bad things). Why not just make the distinctions of good versus bad and forget about placing people in groups within the continuum of numbers > design?

    • Andrew says:

      Kim:

      No, I can’t just distinguish good vs. bad. That’s my point: what’s good to me seems bad to others, and vice-versa. Hence my interest in understanding the different goals that different people have.

  7. […] the research side of blogging, the statgraphics-vs-infovis debate continued with Flowing Data and Statistical Modeling, Causal Inference and Social Science looking for common ground while Concrete Nonsense shared some expository comments on positive […]

  8. Di Cook says:

    I’m a bit late to the discussion, but I’d like to chime in………

    Maybe statisticians are simply visually and aesthetically challenged! %^)

    I don’t think of the information visualization community as homogeneous. People come from all sorts of backgrounds, expertise and interests to this area of work.

    I believe Hadley when he says he moves fairly easily between the statistics and information visualization communities. Hadley keeps himself educated in both communities by monitoring many blogs on a regular basis. I’ve learned a lot from Hadley and also Heike Hofmann on these topics, because we jointly teach a course on Information Visualization and forced ourselves to read the broader literature – actually Heike did most of this and I lean heavily on her selection of readings! So Andrew, I think you need to work harder – you could do this also, and learn more about visual aesthetics and cognitive perception by reading the literature from other areas. I don’t think that it would damage your statistical graphics qualifications %^).

    Many people in the statistics community have a tendency to build walls, and some of your inclinations seem to lean a little this way, too. We are the statistics in-group because we do x, y, z; often the requirements are strictly mathematics qualifications, neglecting the importance of computing and data for statistics. For example, how many statisticians do you know that expect to get the data handed to them on a plate in a convenient “csv” form? We, as statisticians, are not expected to get our hands dirty with that messy data cleaning! How many statisticians do you know who can use an HTML scraper to pull data off web pages? I can probably count them on one hand. This is a very useful skill today, given that a large amount of data is freely available in HTML form. But it gets much worse than this. It is sometimes perceived that a person who can actually do this is not a statistician at all. Like statistical graphics researchers are not statisticians! So the statistics community tends to be exclusive rather than inclusive in my experience, to the detriment of the field. Rather than get out and learn more we wash our hands by saying it’s not our responsibility.

    And, I am also guilty of this. I’ve been re-educated with my Statistics PhD and jumping hurdles for promotion from being an art and computer graphics lover to a dry statistical graphics producer. Way back in the 80s I was using new graphics terminals to generate beautiful color contour plots of likelihoods. The statisticians that I worked with preferred letter plots for this so that they could be sure of the accuracy of the boundaries. And I have to admit that when I first saw Cleveland talking about linked brushing I thought the plots were so boringly monochrome! Now it is the dry plots that I am more comfortable with! And I see the usefulness of keeping things objective and simple and accurate. But I’m lucky now to be returning to my roots. I like chart junk sometimes now, and have a new appreciation of aesthetics from watching the work of people like Wattenberg, and Fry and Viegas. It is easier to do these things now. The amount of work that Hadley put into the aesthetics and defaults of ggplot2 is impressive.

    Finally, I do think that this sort of discussion is a worthwhile exercise. I remember Andreas Buja planning a talk for a statistical graphics workshop in Europe, a long time ago, and he decided to discuss a taxonomy of interactive visual methods. He had tasks broken down into the stamp collector (small multiples, grouping/rearranging), the field biologist (linking between plots, looking up the characteristics of plant specimen in the field guide), and the photographer (focus and zooming, zoom and pan, resizing binwidths). It was so out of the blue to see him thinking like this! And it helped me to catalog the different types of interactive graphics actions that I watched people do. In the end though any taxonomy is imperfect, so things do not naturally fit into one box or another. I still have mental anguish over where some tasks go, each time I think about them. For example is a tour part of the field biologist’s group where multiple projections are linked by time, or is it part of the photographer’s group where the focus is changed across projections. A taxonomy like a model is always wrong, but sometimes convenient!

    • Andrew says:

      Di:

      I’m certainly not trying to build walls. I’m trying to open discussion! I very much appreciate the comments from you, Hadley, and Martin; maybe this will help get things rolling and some of the hardcore infovis types will chime in too.

      I’ll follow up soon (in a few days?) with a post on where I’m coming from in all this graphics discussion. Maybe this will help.