
Why tables are really much better than graphs

Graphs are gimmicks, substituting fancy displays for careful analysis and rigorous reasoning. It’s basically a tradeoff: the snazzier your display, the more you can get away with a crappy underlying analysis. Conversely, a good analysis doesn’t need a fancy graph to sell itself. The best quantitative research has an underlying clarity and a substantive importance whose results are best presented in a sober, serious tabular display. And the best quantitative researchers trust their peers enough to present their estimates and standard errors directly, with no tricks, for all to see and evaluate. Let’s leave the dot plots, pie charts, moving zip charts, and all the rest to the folks in the marketing department and the art directors of Newsweek and USA Today. Here at this blog we’re doing actual research and we want to see, and present, the hard numbers.

To get a sense of what’s at stake here, consider two sorts of analyses. At one extreme are controlled experiments with clean estimates and p-values, and well-specified regressions with robust standard errors, where the p-values really mean something. At the other extreme are descriptive data summaries–often augmented with models such as multilevel regressions chock full of probability distributions that aren’t actually justified by any randomization, either in treatment assignment or data collection–with displays of all sorts of cross-classified model estimates. The problem with this latter analysis is not really the modeling–if you state your assumptions carefully, models are fine–but the display of all sorts of numbers and comparisons that are in no way statistically significant.

For example, consider a research article with a graph showing three lines with different slopes. It’s natural for the reader to assume, if such a graph is featured prominently in the article, that the three slopes are statistically significantly different from each other. But what if no p-value is given? Worse, what if there are no point estimates and no standard errors to be found? Let alone the sort of multiple comparisons correction that might be needed, considering all the graphs that might have been displayed? Now, I’m not implying any scientific misconduct here–and, to keep personalities out of this, I’ve refrained from linking to the article that I’m thinking about here–but it’s sloppy at best and statistical malpractice at worst to foreground a comparison that has been presented with no rigorous–or even approximately rigorous–measure of uncertainty. And, no, it’s not an excuse that the researchers actually “believe” their claim. Sincerity is no defense. There’s a reason our forefathers developed p-values and all the rest, and let’s remember those reasons.
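To make the missing numbers concrete, here is a minimal sketch (Python, standard library only) of what one could report alongside a three-line graph of this kind: each line's least-squares slope with its classical standard error, a normal-approximation z-test for each pairwise difference in slopes, and a Bonferroni adjustment for the three comparisons. The function names and the data are hypothetical, made up purely for illustration. The sketch assumes the three lines are fit to independent samples with roughly constant error variance, and it has nothing to do with the article alluded to above.

```python
import math
from itertools import combinations


def slope_and_se(x, y):
    """Least-squares slope of y on x and its classical standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def slope_difference_z(fit_a, fit_b):
    """z-statistic for the difference of two independent slope estimates."""
    (b_a, se_a), (b_b, se_b) = fit_a, fit_b
    return (b_a - b_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def two_sided_p(z):
    """Two-sided p-value under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical data for three groups, invented for this illustration only.
x = [0, 1, 2, 3, 4]
groups = {
    "A": [1.0, 2.1, 2.9, 4.2, 5.0],
    "B": [1.2, 1.9, 3.4, 3.8, 5.5],
    "C": [0.8, 2.5, 3.1, 5.0, 6.1],
}

fits = {name: slope_and_se(x, y) for name, y in groups.items()}
pairs = list(combinations(fits, 2))
raw_p = [two_sided_p(slope_difference_z(fits[a], fits[b])) for a, b in pairs]

for name, (slope, se) in fits.items():
    print(f"slope {name}: {slope:.3f} (se {se:.3f})")
for (a, b), p, p_adj in zip(pairs, raw_p, bonferroni(raw_p)):
    print(f"{a} vs {b}: p = {p:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

With only a handful of points per line, a t reference distribution would be more defensible than the normal approximation used here. The point is simply that the estimates, standard errors, and adjusted comparisons take a few lines to produce and report.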

The positive case for tables

So far I’ve explained my aversion to graphs as an adornment to, or really a substitute for, scientific research. I’ve been bothered for a while by the trend of graphical displays in journal articles, but only in writing this piece right here have I realized the real problem, which is not so much that graphs are imprecise, or hard to read, or even that they encourage us to evaluate research by its “production values” (as embodied in fancy models and graphs) rather than its fundamental contributions, but rather that graphs are inherently a way of implying results that are often not statistically significant. (And all but the simplest graphs allow so many different visual comparisons that, even if certain main effects actually do pass the p-value test, many many more inevitably won’t. Some techniques have been developed to display multiple-comparisons-corrected uncertainty bounds, but these are rarely included in graphs for the understandable reason that they magnify visual clutter.)

But enough about graphs. Now I’d like to talk a bit about why tables are not merely a necessary evil but are actually a positive good.

A table lays down your results, unadorned, for the readers–and, most importantly, scientific peers–to judge. Good tables often have lots of numbers. That’s fine–different readers may be interested in different things. A table is not meant to be read as a narrative, so don’t obsess about clarity. It’s much more important to put in the exact numbers, as these represent the most important summary of your results, estimated local average treatment effects and all the rest.

It’s also helpful in a table to have a minimum of four significant digits. A good choice is often to use the default provided by whatever software you have used to fit the model. Software designers have chosen their defaults for a good reason, and I’d go with that. Unnecessary rounding is risky; who knows what information might be lost in the foolish pursuit of a “clean”-looking table?

There is also the question of what words should be used for the rows and columns of the table. In tables of regressions, most of the rows represent independent variables. Here, I recommend using the variable names provided by the computer program, which are typically abbreviations in all caps. Using these abbreviations gets the reader a little closer to your actual analysis and also has the benefit that, if he or she wants to replicate your study with the same dataset, it will be clearer how to do it. In addition, using these raw variable names makes it more clear that you didn’t do anything shifty such as transforming or combining your variables before putting them in your regression.

We’d do well to take a lead from our most prominent social science colleagues–the economists–who have, by and large, held the line on graphics and have insisted on tabular presentations of results in their journals. One advantage of these norms is that, when you read an econ paper, you can find the numbers that you want; the authors of these articles are laying it on the line and giving you their betas. Beyond this, the standardization is a benefit in itself: a patterned way of presenting results allows the expert readers–who, after all, represent the most important audience for journal articles–to find and evaluate the key results in an article without having to figure out new sorts of displays. Less form, more content: that’s what tables are all about. If you’ve found something great and you want to share it with the world, sure, make a pretty graph and put it on a blog. But please, please, keep these abominations out of our scientific journals.

B-b-b-but . . .

Yes, you might reply, sure, graphics are manipulative tricks and tables are the best. But doesn’t the ambitious researcher need to make graphs, just to keep up with everybody else, just to get his or her research noticed? It’s the peacock’s tail all over again–I don’t want to waste my precious time making fancy 3-D color bar charts, but if I don’t, my work will get lost in the nation’s collective in-box.

To this I say, No! Stand firm! Don’t bend your principles for short-term gain. We’re all in this together and we all have to be strong, to resist the transformation of serious social science into a set of statistical bells and whistles. Everything up to and including ordered logistic regression is OK, and it’s fine–nay, mandatory–to use heteroscedasticity-consistent standard errors. But No to inappropriate models and No to graphical displays that imply research findings where none necessarily exist.

Beyond this, even in the short term I think there are some gains from going graph-free.
And, finally, the time you save not agonizing over details of graphs can instead be used to think more seriously about your research. Undoubtedly there’s a time substitution: effort spent tinkering with graphs (or, for that matter, blogging) is effort not spent collecting data, working out models, proving theorems, and all the rest. If you must make a graph, try only to graph unadorned raw data, so that you’re not implying you have anything you don’t. And I recommend using Excel, which has some really nice defaults as well as options such as those 3-D colored bar charts. If you’re gonna have a graph, you might as well make it pretty. I recommend a separate color for each bar–and if you want to throw in a line as well, use a separate y-axis on the right side of the graph.

Damn. It felt good to get that off my chest. I hope nobody reads this, though, or I might not be able to fool people with graphs much longer.

19 Comments

  1. JV says:

    April 1st makes the Internet even more enjoyable.

  2. goodnessOfFit says:

    :)

  3. Jonathan Falk says:

    This post requires well over a thousand words… can you supply the picture that is its equal? Happy 1st to you as well, and I'm still working over the frequentist screed of last year….

  4. bill r says:

    I'm glad you came clean. We need precision and p-values. No more of this prior/posterior/preprior silliness. Let the data speak for themselves!

  5. Andrew Gelman says:

    As I hope was clear, in many ways my discussion above is serious.

  6. Martyn says:

    You make a convincing case, but I think you overlook the virtues of 3-D exploded pie charts, which remain sadly under-used.

  7. john says:

    Unfortunately I was eating a piece of most definitely 3-D peach pie when I read Martyn's comment… and now it's exploded…

  8. Keith says:

    Can we expect a response in the form of a journal article, like last year?

    I hope so!

  9. FosterBoondoggle says:

    "Here, I recommend using the variable names provided by the computer program, which are typically abbreviations in all caps."

    If this line doesn't give it away then my irony detector is broken.

  10. David Smith says:

    I'm not sure I understand your point. Is there a way you could express it in a more digestible form? Something pictorial, perhaps? :)

  11. Adam Kramer says:

    You had me until, "And I recommend using Excel."

  12. Alex F says:

    I couldn't disagree with this more strenuously:

    >>But No to inappropriate models

    This is the great advantage of a table! Instead of bulky graphs that require large graphics, with a table there's almost no cost (in space) of reporting results for additional models. You should run every model you can think of — more importantly, every model a critic could think of — and you don't have to discuss the results in your paper at all. Just add a column to the table, and in the caption you can explain that "Column 37 covers a model with individual fixed effects, a probit choice function, standard errors clustered at the county level, and age-gender-timeofday interactions".

  13. AnonymousCoward says:

    I must say these April 1 posts where you mix good and bad arguments really force the reader to critically examine every statement. It is a unique style I haven't seen elsewhere.

  14. Keith O'Rourke says:

    This may even be more serious than you have claimed – without some necessary tables and the (naive) fear of Benford's law – most of the scientific literature may soon be flooded with pretty graphs boldly and convincingly displaying not just non-significant findings but totally non-existent findings!

    The publication of non-existent findings should be reserved for the more quantitatively sophisticated researchers.

    Keith

  15. thom says:

    I was looking forward to your April 1st posting this year and wasn't disappointed. I think I picked up some of the 'serious' points too. Before I defend tables – I'll note that I'm a recent convert to graphs and I generally argue for fewer tables and more graphs in my field.

    In favour of tables:

    – they support secondary analyses better than graphs
    (I recently spent a long time with a ruler and a spreadsheet reading values from an old paper to produce a graph for a book chapter; last year a colleague and I reanalyzed some data from the 1980s that we had to reconstruct from a graph. In both cases tables would have been much easier! (Last week I wanted to plot some data from an ex-student of mine and was able to reproduce (and improve) her graph from tabulated data). The internet allows us to archive the raw data, but that still isn't that common – tables are a little more versatile and robust than graphs in supporting secondary analysis).

    – they require less skill to produce and less skill to interpret
    (a good table requires less skill than a good plot; both require some skill to get right. Finding the right plot is a hard problem in many cases – and there may not even be a plot that will work for most of the audience – aesthetics being more important for plots than for tables)

    – the production overhead is greater for graphs than tables
    (this is related to skill, but even if I know exactly how to plot my data getting it looking right usually takes a lot more effort than the table.)

    These are my main gripes about graphs. I think the balance is changing a bit as I get to grips with R (because it is reducing the cost of producing graphs and of getting them 'just right').

  16. Antonio says:

    stata + excel + tables + interpreting several regressions coefficients (with enough stars) as several "causes" = happy social scientists :)

  17. ZBicyclist says:

    We don't need just tables, we need tables with enough decimal points — I'd say generally 7 +/- 2 decimal points on all tabled numbers.

    Any numbers worth reporting are worth over-reporting.

  18. Alex says:

    The approach I generally find most useful when reading papers is:

    – Tables with vital information that supports some further analysis (without excessive decimal places and, preferably, scaled to a sane order of magnitude and possibly standardized)

    – Graphs that display key findings in a clear, comprehensible way

    – Key results (and, if possible, data) posted online in a machine-readable format

    The last point is very important; the graphs give you a very good idea of the findings, but the online component gives you the replicability.

  19. Jamie Olson says:

    I can see what you're saying, but I think that ultimately, it's possible to communicate a lot more information in a graphic than you could reasonably expect someone to get from a table.