A new course on statistical graphics

I’m planning to teach a new course on statistical graphics next spring.

Background:

Graphical methods play several key roles in statistics:

– “Exploratory data analysis”: finding patterns in raw data. This can be a challenge, especially with complex data sets.
– Understanding and making sense of a set of statistical analyses (that is, finding patterns in “cooked data”)
– Clear presentation of results to others (and oneself!)

Compared to other areas of statistics, graphical methods require new ways of thinking and also new tools.

The borders of “statistical graphics” are not precisely defined. Neighboring fields include statistical computing, statistical communication, and multivariate analysis. Neighboring fields outside statistics include computer programming and graphics, visual perception, data mining, and graphical presentation.

Structure of the course:

Class meetings will include demonstrations, discussions of readings, and lectures. Depending on their individual interests, different students will have to master different in-depth topics. All students will learn to make clear and informative graphs for data exploration, substantive research, and presentation to self and others.

Students will work in pairs on final projects. A final project can be a new graphical analysis of a research topic of interest, an innovative graphical presentation of important data or data summaries, an experiment investigating the effectiveness of some graphical method, or a computer program implementing a useful graphical method. Each final project should take the form of a publishable article.

The primary textbook will be R Graphics, by Paul Murrell (to be published Summer, 2005).

See below for more information on the course; also see here for a related course by Bill Cleveland (inventor of lowess, among other things). Any further suggestions would be appreciated.

Tentative list of topics and readings:

(There’s more in the readings than any one student would read; I imagine that many would simply serve as references, with particularly relevant excerpts photocopied.)

1. General practice:

1a. What do good (and bad) graphs look like?

The Visual Display of Quantitative Information and Envisioning Information (E. Tufte).

Visual Revelations (H. Wainer).

“How to display data badly” (H. Wainer). The American Statistician 38, 137-147 (1984).

1b. How to make useful and good-looking graphs. Making graphs in the open-source statistical package R.

R Graphics (P. Murrell).

An R and S-Plus Companion to Applied Regression (J. Fox).

Webpage of resources on statistical graphics (M. Friendly). http://www.math.yorku.ca/SCS/StatResource.html#DataVis

1c. What should be graphed?

The Elements of Graphing Data and Visualizing Data (W. Cleveland).

2. Graphics as a tool in scientific inference:

2a. Graphics as a primary data-analysis tool.

“Why are American Presidential election campaigns so variable when votes are so predictable?” (A. Gelman and G. King). British Journal of Political Science 23, 409-451.

2b. Graphics that enhance the understanding of a fitted model.

Generalized Additive Models (T. Hastie and R. Tibshirani).

“Enhancing democracy through legislative redistricting” (A. Gelman and G. King). American Political Science Review 88, 541-559.

“The B-K plot: making Simpson’s paradox clear to the masses” (H. Wainer). Chance 15 (3), 60-62.

2c. More elaborate graphical methods (multidimensional scaling, other multivariate models, dynamic graphics, interactive graphics, . . .)

“Multidimensional scaling, tree-fitting, and clustering” (R. Shepard). Science 210, 390-398.

“Computational methods for high-dimensional rotations in data visualization” (A. Buja, D. Cook, D. Asimov, and C. Hurley). In Handbook of Statistics, ed. E. Wegman and C. Rao (2005).

3. Statistical theories of graphics:

3a. Graphs as model-free exploratory data analysis.

Exploratory Data Analysis (J. Tukey).

3b. Graphics and models.

“All maps of parameter estimates are misleading” (A. Gelman and P. Price). Statistics in Medicine 18, 3221-3234 (1999).

“Exploratory data analysis for complex models” (A. Gelman, with discussion by A. Buja). Journal of Computational and Graphical Statistics (2004).

3c. Graphical methods as algorithms.

The Grammar of Graphics (L. Wilkinson).

4. Psychology:

4a. Cognitive models of how people read and understand numbers and graphs.

“Spatial Schemas in Depictions” (B. Tversky). In Spatial Schemas and Abstract Thought, ed. M. Gattis (2001).

“A theory of graph comprehension” (S. Pinker). In Artificial intelligence and the future of testing, ed. R. Freedle (1990).

4b. Experimental studies of effectiveness of different graphical displays.

“Understanding charts and graphs” (S. Kosslyn). Applied Cognitive Psychology 3, 185-223 (1989).

“Conceptual limitations in comprehending line graphs” (P. Shah and P. Carpenter). Journal of Experimental Psychology: General 124, 43-61 (1995).

4c. Some examples of graphs as actually used in psychological research.

The Statistical Sleuth, second edition (F. Ramsey and D. Schafer).

5. Communication (to self and to others):

“Let’s practice what we preach: turning tables into graphs” (A. Gelman, C. Pasarica, and R. Dodhia). American Statistician 56, 121-130 (2002).

“The science of scientific writing” (G. Gopen and J. Swan). American Scientist 78, 550-558 (1990).

How to Lie with Statistics (D. Huff).

(I need a more updated reference here also, something along the lines of “Statistics for Journalists.”)

1. Stephen Weigand says:

I have two very brief comments that might be relevant. First, getting custom graphical displays exactly the way one wants takes a lot of time, even with a package such as R that offers many plotting methods and fine control of graphics. Second, the biomedical literature seems to prefer simple "thin" displays that don't require much effort to interpret, and more complex, information-rich displays are often criticized by reviewers, editors, or (dare I say) coauthors. Still, I think the "right" display, however complex, is worth spending time on and defending.

2. Andrew says:

Stephen,

Point taken. I'm getting better at R graphing but it still takes me at least 15 minutes to make a graph look just right.

Regarding complex displays, my current preference is to use grids of relatively simple plots (the "small multiples" idea of Tufte, 1990).
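As a rough illustration of the grid idea, here is a minimal sketch in Python (not R, and not code from the course). The function names `grid_shape` and `small_multiples`, the output file name, and the input format (a dict mapping a panel title to a pair of x and y lists) are all assumptions made for the example. The plotting part assumes matplotlib is installed.

```python
import math


def grid_shape(n_panels, ncols=3):
    """Rows and columns needed to lay out n_panels plots, at most ncols wide."""
    ncols = min(ncols, n_panels)
    nrows = math.ceil(n_panels / ncols)
    return nrows, ncols


def small_multiples(groups, ncols=3, path="small_multiples.png"):
    """Draw one simple scatterplot per group on a shared-axis grid.

    groups: dict mapping a panel title to an (x, y) pair of equal-length lists.
    Requires matplotlib (an assumption; it is imported only when called).
    """
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display needed
    import matplotlib.pyplot as plt

    nrows, ncols = grid_shape(len(groups), ncols)
    fig, axes = plt.subplots(
        nrows, ncols, sharex=True, sharey=True, squeeze=False,
        figsize=(3 * ncols, 2.5 * nrows),
    )
    flat = axes.ravel()
    # Shared axes make the panels directly comparable at a glance.
    for ax, (name, (x, y)) in zip(flat, groups.items()):
        ax.plot(x, y, "o", markersize=3, color="black")
        ax.set_title(name, fontsize=9)
    # Hide any unused cells at the end of the grid.
    for ax in flat[len(groups):]:
        ax.set_visible(False)
    fig.savefig(path)
    plt.close(fig)
```

Calling `small_multiples({"A": (xa, ya), "B": (xb, yb)})` with your own lists would write a one-row, two-panel grid to the file. Sharing the x and y scales across panels is the design choice that lets the eye compare them.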

My other current favorite is to plot the fitted model with the data overlaid. We do lots of residual plots, but more and more I'm liking the most basic plots that show the curve of y=a+bx, etc.
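A minimal sketch of that kind of plot, written in Python rather than R: fit y=a+bx by least squares, then draw the line over the raw points. The data values and the names `fit_line` and `plot_fit` are made up for illustration, and the plotting function assumes matplotlib is installed.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def plot_fit(x, y, path="fit.png"):
    """Scatter the data and overlay the fitted line (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # write to a file; no display needed
    import matplotlib.pyplot as plt

    a, b = fit_line(x, y)
    fig, ax = plt.subplots()
    ax.plot(x, y, "o", color="black")           # the data
    xs = [min(x), max(x)]
    ax.plot(xs, [a + b * xi for xi in xs], color="gray")  # the fitted line
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(path)
    plt.close(fig)
    return a, b


# Made-up illustrative data, roughly following y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
```

Calling `plot_fit(x, y)` would save the overlay to a file. The same pattern extends to other fitted curves by replacing the line with predictions on a grid of x values.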

3. Jake says:

I like the structure of your course (oriented by the different purposes of statistical graphics), but I wonder where you might put discussions of how graphics help (or hinder) understanding of certain common techniques, like partial regression plots (which make explicit the residualization/"controlling for" in the context of regression) or plots for non-continuous data (?mosaic plots? lattice plot ideas?) or other plots that are particularly tailored to common tools.

I suspect that, since so many folks are using OLS and glms, thinking about specific ways that graphics can help us understand the inputs and outputs of such models might be particularly useful.
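To make the residualization behind a partial regression plot concrete, here is a small sketch in Python (not R). It regresses both y and x on a control variable z, keeps the two sets of residuals, and fits a line through them. The slope of that line equals the coefficient on x in the multiple regression of y on x and z. The data are invented so that y = 1 + 2x + 3z exactly, and the helper names are hypothetical.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    a, b = simple_ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Invented data: z is the variable being "controlled for".
z = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]

ry = residuals(z, y)  # part of y not explained by z
rx = residuals(z, x)  # part of x not explained by z

# The partial regression plot is a scatter of ry against rx;
# its least-squares slope recovers the coefficient on x.
_, partial_slope = simple_ols(rx, ry)
```

Scattering `ry` against `rx` gives the plot itself. With real, noisy data the points would spread around the line instead of falling on it, and that spread is what the plot is meant to show.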

4. MDM says:

Andrew,

Take a look at my paper, Visualizing Homicide, on my website, and at the accompanying graphs. The homicide data can be found on the ICPSR website, and I've used them successfully in a similar course tailored to social science students who think that all they need do is find something below 0.05 and they're done. If you're interested, I can send you my syllabus. Since I'm now emeritus, I haven't updated it, but they had a lot of fun learning how to do EDA. The figures show 1-on-1 homicides, and I left it to a later time how to deal with n-on-m homicides.

Mike Maltz

5. Jeronimo says:

Andrew,

Another book that applies Cleveland's philosophy is “Data Analysis and Graphics using R” by John Maindonald and John Braun:

http://wwwmaths.anu.edu.au/~johnm/r-book.html

In addition, there’s an online version of the book that is a little different from the printed edition but nonetheless helpful:

http://wwwmaths.anu.edu.au/~johnm/r/

You might be interested.

Jeronimo

6. tk says:

Can anyone give me a layman's explanation of the benefits of dual scaling for qualitative analysis, compared with parametric statistical analysis?