## My talk on statistical graphics at MIT this Thurs aft

Speaker: Andrew Gelman, Columbia University
Date: Thursday, November 29, 2012
Time: 4:00PM to 5:00PM

Location: 32-D463 (Star Conference Room)
Host: Polina Golland, CSAIL
Contact: Polina Golland, 617-253-8005, polina@csail.mit.edu

The importance of graphical displays in statistical practice has been recognized sporadically in the statistical literature over the past century, with wider awareness following Tukey’s Exploratory Data Analysis (1977) and Tufte’s books in the succeeding decades. But statistical graphics still occupies an awkward in-between position: within statistics, exploratory and graphical methods represent a minor subfield and are not well integrated with the larger themes of modeling and inference. Outside of statistics, infographics (also called information visualization or Infovis) is huge, but its purveyors and enthusiasts appear largely uninterested in statistical principles.

We present a set of goals for graphical displays discussed primarily from the statistical point of view and discuss some inherent contradictions in these goals that may be impeding communication between the fields of statistics and Infovis. One of our constructive suggestions, to Infovis practitioners and statisticians alike, is to try not to cram into a single graph what can be better displayed in two or more.

We recognize that we offer only one perspective and intend this work to be a starting point for a wide-ranging discussion among graphics designers, statisticians, and users of statistical methods. Our purpose is not to criticize but to explore the different goals that lead researchers in different fields to value different aspects of data visualization.

P.S. Following my recent thoughts, I wish I’d called it “Tradeoffs in information graphics.”

1. Jorge Camoes says:

Regarding the use of multiple charts instead of one, I couldn’t disagree more. As you know, charts, like sheets in a workbook, are very, very expensive. Don’t use more than a single chart and a single sheet. Use the sheet to store the data, parameters, comments, etc.

What can you do with all your saved money? Why, buy colors, of course! A full rainbow, if possible. Color is always handy, even if you don’t need it.

2. kjetil b halvorsen says:

Where did you get the data on health expenditures from?
The data from
http://data.worldbank.org/indicator/SH.XPD.PCAP

are different, especially the data point for Norway, which is some thousand dollars off …

• Andrew says:

Kjetil:

I don’t remember. I think it was the same source as used in the original chart that we remade.

3. Robert Grant says:

Andrew,
Sounds good; I hope it goes well. Any thoughts on sonic representations of data? I’ve stumbled across a few online, and there is one academic group in Germany working on it, but the field seems to be at about the stage of development that graphs were at with Playfair’s mortality graphs circa 1770. Nobody knows what might make for a good one.

5. bcnc says:

I liked the reduction of Sharad Goel’s data to a parallel coordinate plot. I have similar data, but instead of correlations they are means with confidence intervals. What would you think if I were to add CIs to the parallel coordinate plot? Would this be a case of multiple plots (i.e., the lattice structure) being better than one?