To continue our discussion from last week, consider three positions regarding the display of information:
(a) The traditional tabular approach. This is how most statisticians, econometricians, political scientists, sociologists, etc., seem to operate. They understand the appeal of a pretty graph, and they’re willing to plot some data as part of an exploratory data analysis, but they see their serious research as leading to numerical estimates, p-values, tables of numbers. These people might use a graph to illustrate their points but they don’t see graphs as necessary to their research.
(b) Statistical graphics as performed by Howard Wainer, Bill Cleveland, Dianne Cook, etc. They–we–see graphics as central to the process of statistical modeling and data analysis and are interested in graphs (static and dynamic) that display every data point as transparently as possible.
(c) Information visualization or infographics, as performed by graphics designers and statisticians who are particularly interested in the way that graphics can involve non-statisticians in the process of thinking about and understanding numerical data.
I place myself in group (b) but I recognize that a lot of smart and accomplished researchers find themselves in (a) or (c). From about 1995-2010, most of my writing on graphics focused on the contrast between (a) and (b), in particular the way that exploratory graphics could be used to check the fit of and better understand probability models.
In the past year or so, though, I’ve been writing a lot about the distinction between (b) and (c). It’s been a struggle, as many of the infographics people seem to feel that I’m putting them down. But I’m not! Despite what Nathan Yau says, I don’t think it’s a putdown of Florence Nightingale’s graph to compare it to a Chris Rock comedy routine. I think Chris Rock is great.
OK. I’ll try one more time, as I haven’t done such a great job communicating this all so far.
I’ll start with an example where these differences in perspective matter in practice.
Why does this matter?
The differences between statistical graphics and information visualization are important because, on both sides, people are doing work that’s not as good as it could be.
I’m guilty of this too! As can be seen from our book Red State Blue State. My coauthors and I spent a huge amount of effort figuring out how to display the data and our inferences graphically. And I think we did a good job. But . . . we really could’ve benefited from collaboration with an infovis expert. Our graphs were solid but not visually innovative. They were not exciting. They were fine for the readers who already cared about the data but not enough to grab the casual reader.
And this was a failing on our part! We wrote the book because we felt the topic was important and we wanted to engage journalists and the general public. To have all these great analyses and to display them so boringly . . . that’s a wasted opportunity. It would be like printing the entire book in an ugly font: sure, you could read it if you really needed to, but it puts an unnecessary burden on the potential reader.
In my ignorance of the different perspectives of infographics and statistical visualization, I naively thought that the best images for the statistical purpose of my understanding of the data would automatically be the best for involving others. But I’m pretty sure I was wrong. We had well over 100 graphs in our book and we surely could’ve spared some space for some attractive and inviting infographics.
From the other direction, I’ve often enough seen snappy infographics that can mislead because of underlying statistical problems. The usual issue is that a non-statistical infographic can garble the data enough that it takes a fair amount of reader effort to decipher the most basic patterns. Thus, a grabby infographic can sometimes convey the complexity of a dataset and draw the reader in further, but at some point it can make sense to make the handoff to a statistical graphic that allows the reader to make comparisons of interest more directly (as discussed by Cleveland in his 1985 book). Again, I think a lost opportunity arises because of people not realizing that the best tool for one purpose can fall short with respect to other goals.
Aiming for common ground
I’ve noticed that different sorts of graphics-makers have different sorts of preferences:
1. Statistical graphics people detest pie charts and tend to like plain-looking displays such as dotplots and lineplots, with even more complicated varieties such as mosaic plots looking fairly uniform from a visual perspective. Statisticians seem to care a lot about displaying data optimally but not much about what people actually learn from their graphs in real life.
2. In contrast, infovis experts tend to like striking, unusual patterns and favor graphs with visual appeal. More than once I’ve seen infovis people express a soft spot for pie charts–I’m never sure if it’s because they think pie charts are an effective way to display data or if they’re just reacting to what they perceive as an annoying prescriptivism by statisticians such as Bill Cleveland and graphics gurus such as Ed Tufte.
Differences in taste
A point where I think we can all agree is that there are some systematic differences in taste.
On one hand you have people like Cleveland, Tufte, Antony Unwin, Kaiser Fung, myself, and many other statisticians who want graphs to be transparent so that users can identify what each data point on the graph stands for.
On the other hand you have people like Robert Kosara, Nathan Yau, and the many thousands (hundreds of thousands?) of satisfied users of Wordle who want data images that can tell an effective story and find communication to be much more compelling than any abstract principles about data-ink.
And somewhere on the side are the tens of millions of maybe not-so-satisfied users of Excel who are fumbling to display today’s data using last century’s tools, making graphs that hit that sweet spot of being both ugly and barely informative! (But not always; sometimes I like a plain simple barplot.)
Points in common
Infovis and statgraphics people have a lot in common, and maybe I should spend more time emphasizing that. I’ve recently been talking a bit about the two dimensions of attractiveness and informativeness and how we stand at different points on the efficient frontier of this space, but in practice one can often put in a bit of effort and improve in both directions. In other cases a reexpression can reveal different dimensions in a dataset. For example, consider our discussion of these time-use clock graphs. My commenters and I suggested various alternatives; for some purposes the original graphs might be fine; the key point, though, which we should all be able to agree on, is that there are a lot of things to look for here and we shouldn’t be stuck in any single form of visual data expression.
Different tastes, different goals
The main point I’ve been trying to get at in the recent discussion, starting with my paper with Unwin, is that it makes sense to consider the different tastes of statgraphics and infovis people as reflecting different goals.
It’s not just that we statisticians are too lame to make graphs that look good, or that graphics designers are too clueless to display actual data. Rather, we develop different kinds of expertise partly to serve our different goals. As a statistician, I’m particularly interested in assessing the fit of my models to my data, and so I work on graphs that allow comparisons of individual data points and give me a sense of my fitted models. In contrast, graphics designers are always being made aware of the problems of getting the attention of outsiders, hence they develop tools to make graphs more visually appealing.
Putting it all together
Infovis people have a lot of knowledge that statisticians don’t have, ranging from technical issues of fonts and colors to a general user-focused visual and storytelling perspective. And statisticians can offer a lot from the other direction, ranging from technical knowledge of p-values and confidence intervals to a more general concern about communicating uncertainty and variation.
I’ve spent a lot of time in the past thirty years making graphs and thinking about making graphs. I’ve spent a lot of time in the past fifteen years or so thinking about statistical graphics as a central link connecting model building, Bayesian inference, and exploratory data analysis (in particular, check out my articles from 2003 and 2004 that I keep linking to). And I’ve spent a lot of time in the past year thinking and writing about the different goals of different practitioners of statistical graphics.
How is it that Kosara and Yau can feel so strongly that certain displays such as the swirly plot or Wordle are so great, while Unwin, Fung, and I can feel so strongly the opposite (maybe not in these particular cases but in similar examples)? I don’t think the answer is that they’re wrong and I’m right; rather, I think we have different goals.
I’ve tried to explore these differences by studying some graphs that infovis proponents really seem to like: Florence Nightingale’s plots, Yau’s 5 best data visualizations of the year, an award-winning plot from a newspaper contest, and others. I think I’ve found some common features in these graphs which might indicate some systematic differences in goals between infovis and statgraphics. In particular, these popular visualizations that I don’t like so much all seem to be, to some extent, mini-puzzles, visual challenges that suck in the viewer and involve him or her in the challenge of interpreting the image. In some ways it’s the opposite of a classic Cleveland-style statistical graphic. The statistical graphic is supposed to be transparent, while in the prize-winning information visualizations, viewer involvement often begins with the challenge of figuring out what and where the data are.
I’m not saying this to put anyone down! I make no secret of my own tastes but the point here is not to win an argument but rather to explore the differences so we can better work together.
I think it will be easier for us to learn from each other (see the first paragraph in “Putting it all together” above) if we can realize that our goals are different, if we as statisticians recognize that a transparent graph may not be the most visually appealing for non-statisticians, and if graphics design experts recognize that the most visually appealing image may not be the most effective tool (statically or dynamically) for learning from data.