Try a spaghetti plot


Joe Simmons writes:

I asked MTurk NFL fans to consider an NFL game in which the favorite was expected to beat the underdog by 7 points in a full-length game. I elicited their beliefs about sample size in a few different ways (materials .pdf; data .xls).

Some were asked to give the probability that the better team would be winning, losing, or tied after 1, 2, 3, and 4 quarters. If you look at the average win probabilities, their judgments look smart.

But this graph is super misleading, because the fact that the average prediction is wise masks the fact that the average person is not. Of the 204 participants sampled, only 26% assigned the favorite a higher probability to win at 4 quarters than at 3 quarters than at 2 quarters than at 1 quarter. About 42% erroneously said, at least once, that the favorite’s chances of winning would be greater for a shorter game than for a longer game.

How good people are at this depends on how you ask the question, but no matter how you ask it they are not very good.
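The two percentages in the quote can be computed per respondent rather than from the averages. Here is a minimal sketch, assuming each respondent gives one win probability for the favorite after 1, 2, 3, and 4 quarters. The respondent IDs and numbers are made up for illustration and are not from Simmons's data file, and the "reversal" rule (any shorter game rated above any longer one) is one reading of his description.

```python
def is_strictly_increasing(probs):
    """True if each longer game gets a strictly higher win probability."""
    return all(a < b for a, b in zip(probs, probs[1:]))

def has_reversal(probs):
    """True if any shorter game is rated above any longer game."""
    n = len(probs)
    return any(probs[i] > probs[j] for i in range(n) for j in range(i + 1, n))

# Hypothetical responses: P(favorite is winning) after 1, 2, 3, 4 quarters.
responses = {
    "r1": [0.55, 0.60, 0.65, 0.70],  # strictly increasing
    "r2": [0.60, 0.60, 0.70, 0.75],  # a tie, but no reversal
    "r3": [0.70, 0.65, 0.72, 0.80],  # reversal between quarters 1 and 2
    "r4": [0.50, 0.50, 0.50, 0.50],  # flat
}

n = len(responses)
share_increasing = sum(is_strictly_increasing(p) for p in responses.values()) / n
share_reversal = sum(has_reversal(p) for p in responses.values()) / n

print(f"strictly increasing: {share_increasing:.0%}")
print(f"at least one reversal: {share_reversal:.0%}")
```

Under these definitions the two shares need not add up to 100%, because a respondent who gives ties but never a reversal falls into neither group. That may be why the 26% and 42% in the quote leave a gap, though the post does not say.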

The explicit warning, “This Graph is Super Misleading,” is a great idea.

But don’t stop there! You can do better. The next step is to follow it up with a spaghetti plot showing people’s estimates. If you click through the links, you see there are about 200 respondents, and 200 is a lot to show in a spaghetti plot, but you could handle this by breaking up the people into a bunch of categories (for example, based on age, sex, and football knowledge), thus allowing a grid of smaller graphs, each of which wouldn’t have too many lines.

P.S. Jeff Leek points out that sometimes a spaghetti plot won’t work so well because there are too many lines to plot and all you get is a mess (sort of like the above plate-o-spag image, in fact). He suggests the so-called lasagna plot, which is a sort of heat map, and which seems to have some similarities to Solomon Hsiang’s “watercolor” uncertainty display.

A heat map could be a good idea but let me also remind everyone that there are some solutions to overplotting of the lines in a spaghetti plot, some ways to keep the spaghetti structure while losing some of the messiness. Here are some strategies, in increasing order of complexity:

1. Simply plot narrower lines. Graphics devices have improved, and thin lines can work well.

2. Just plot a random sample of the lines. If you have 100 patients in your study, just plot 20 lines, say.

3. Small multiples: for example, a 2×4 grid broken down by male/female and 4 age categories. Within each sub-plot you don’t have so many lines so less of a problem with overplotting.

4. Alpha-blending.
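Strategies 2 and 3 are really data-preparation steps, so they can be sketched without committing to any plotting library. The sketch below assumes the lines live in a dict keyed by patient ID; the data, the function names, and the sex-by-age grouping are all hypothetical. Strategies 1 and 4 are drawing settings instead: in matplotlib, for example, they correspond to the `linewidth` and `alpha` arguments of `plot`.

```python
import random
from collections import defaultdict

def sample_lines(lines, k, seed=0):
    """Strategy 2: keep a random subset of k lines (all of them if fewer than k)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(lines), min(k, len(lines)))
    return {pid: lines[pid] for pid in chosen}

def small_multiples(lines, panel_of):
    """Strategy 3: split the lines into panels keyed by panel_of[pid]."""
    panels = defaultdict(dict)
    for pid, series in lines.items():
        panels[panel_of[pid]][pid] = series
    return dict(panels)

# Hypothetical data: 100 patients, 4 measurements each, built as a random walk.
rng = random.Random(1)
lines, panel_of = {}, {}
for pid in range(100):
    level, series = rng.gauss(0, 1), []
    for _ in range(4):
        level += rng.gauss(0, 0.5)
        series.append(level)
    lines[pid] = series
    # Made-up grouping: sex crossed with 4 age categories gives up to 8 panels.
    panel_of[pid] = (rng.choice(["female", "male"]),
                     rng.choice(["18-29", "30-44", "45-64", "65+"]))

subset = sample_lines(lines, 20)
panels = small_multiples(lines, panel_of)

print(len(subset), "lines kept by random sampling")
for key in sorted(panels):
    print(key, len(panels[key]), "lines")
```

Each panel then gets its own small spaghetti plot, ideally with shared axes so the panels can be compared by eye.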

Three ways to present a probability forecast, and I only like one of them

To the nearest 10%:


To the nearest 1%:


To the nearest 0.1%:

I think the National Weather Service knows what they’re doing on this one.
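To make the three precisions concrete, here is a small sketch that formats one forecast each way. The probability 0.6372 and the helper `format_forecast` are made up for illustration.

```python
def format_forecast(p, step_percent):
    """Round probability p to the nearest step_percent and format it as a percentage."""
    percent = p * 100
    rounded = round(percent / step_percent) * step_percent
    decimals = 1 if step_percent < 1 else 0
    return f"{rounded:.{decimals}f}%"

p = 0.6372  # hypothetical forecast probability
for step in (10, 1, 0.1):
    print(f"to the nearest {step}%: {format_forecast(p, step)}")
```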

On deck this week

Mon: Three ways to present a probability forecast, and I only like one of them

Tues: Try a spaghetti plot

Wed: I ain’t got no watch and you keep asking me what time it is

Thurs: Some questions from our Ph.D. statistics qualifying exam

Fri: Solution to the helicopter design problem

Sat: Solution to the problem on the distribution of p-values

Sun: Solution to the sample-allocation problem

“Your Paper Makes SSRN Top Ten List”

I received the following email from the Social Science Research Network, which is a (legitimate) preprint server for research papers:

Dear Andrew Gelman:

Your paper, “WHY HIGH-ORDER POLYNOMIALS SHOULD NOT BE USED IN REGRESSION DISCONTINUITY DESIGNS”, was recently listed on SSRN’s Top Ten download list for: PSN: Econometrics, Polimetrics, & Statistics (Topic) and Political Methods: Quantitative Methods eJournal.

As of 02 September 2014, your paper has been downloaded 17 times. You may view the abstract and download statistics at:

Top Ten Lists are updated on a daily basis. . . .

The paper (with Guido Imbens) is here.

What amused me, though, was how low the number was. 17 downloads isn’t so many. I guess it doesn’t take much to be in the top 10!

Hoe noem je?

Haynes Goddard writes:

Reviewing my notes and books on categorical data analysis, the term “nominal” is widely employed to refer to variables without any natural ordering. I was a language major in UG school and knew that the etymology of nominal is the Latin word nomen (from the Online Etymological Dictionary: early 15c., “pertaining to nouns,” from Latin nominalis “pertaining to a name or names,” from nomen (genitive nominis) “name,” cognate with Old English nama (see name (n.)). Meaning “of the nature of names” (in distinction to things) is from 1610s. Meaning “being so in name only” first recorded 1620s.)

So variables without a natural order such as gender (male-female), transport mode (walk, bicycle, bus, train, car) and so on are just coded 0, 1 and so on. Yet the textbook writers do not explain that nominal just means name which it seems to me would help the students better understand the application.

Do you know when this usage was first introduced into statistics?

I have no idea but maybe you, the readers, can offer some insight?

How do companies use Bayesian methods?

Jason May writes:

I’m in Northwestern’s Predictive Analytics grad program. I’m working on a project providing Case Studies of how companies use certain analytic processes and want to use Bayesian Analysis as my focus.

The problem: I can find tons of work on how one might apply Bayesian Statistics to different industries but very little on how companies actually do so except as blurbs in larger pieces.

I was wondering if you might have ideas of where to look for cases of real life companies using Bayesian principles as an overall strategy.

Some examples that come to mind are pharmaceutical companies that use hierarchical pharmacokinetic/pharmacodynamic modeling, as well as people on the Stan users list who are using Bayes in various business settings. And I know that some companies do formal decision analysis which I think is typically done in a Bayesian framework. And I’ve given some short courses at companies, which implies that they’re interested in Bayesian methods, though I don’t really know if they ended up following my particular recommendations.

Perhaps readers can supply other examples?

Prediction Market Project for the Reproducibility of Psychological Science

Anna Dreber Almenberg writes:

The second prediction market project for the reproducibility project will soon be up and running – please participate!

There will be around 25 prediction markets, each representing a particular study that is currently being replicated. Each study (and thus market) can be summarized by a key hypothesis that is being tested, which you will get to bet on.

In each market in which you participate, you will bet on a binary outcome: whether the effect in the replication study is in the same direction as the original study, and is statistically significant with a p-value smaller than 0.05.

Everybody is eligible to participate in the prediction markets: it is open to all members of the Open Science Collaboration discussion group – you do not need to be part of a replication for the Reproducibility Project. However, you cannot bet on your own replications.

Each study/market will have a prospectus with all available information so that you can make informed decisions.

The prediction markets are subsidized. All participants will get about $50 on their prediction account to trade with. How much money you make depends on how you bet on different hypotheses (on average participants will earn about $50 on a Mastercard (or the equivalent) gift card that can be used anywhere Mastercard is used).

The prediction markets will open on October 21, 2014 and close on November 4.

If you are willing to participate in the prediction markets, please send an email to Siri Isaksson by October 19 and we will set up an account for you. Before we open up the prediction markets, we will send you a short survey.

The prediction markets are run in collaboration with Consensus Point.

If you have any questions, please do not hesitate to email Siri Isaksson.

Statistical Communication and Graphics Manifesto

Statistical communication includes graphing data and fitted models, programming, writing for specialized and general audiences, lecturing, working with students, and combining words and pictures in different ways.

The common theme of all these interactions is that we need to consider our statistical tools in the context of our goals.

Communication is not just about conveying prepared ideas to others: often our most important audience is ourselves, and the same principles that suggest good ways of communication with others also apply to the methods we use to learn from data.

See also the description of my statistical communication and graphics course, where we try to implement the above principles.

[I'll be regularly updating this post, which I sketched out (with the help of the students in my statistical communication and graphics course this semester) and put here so we can link to it from the official course description.]

My course on Statistical Communication and Graphics

We will study and practice many different aspects of statistical communication, including graphing data and fitted models, programming in Rrrrrrrr, writing for specialized and general audiences, lecturing, working with students and colleagues, and combining words and pictures in different ways.

You learn by doing: each week we have two classes that are full of student participation, and before each class you have a pile of readings, a homework assignment, and jitts.

You learn by teaching: you spend a lot of time in class explaining things to your neighbor.

You learn by collaborating: you’ll do a team project which you’ll present at the end of the semester.

The course will take a lot of effort on your part, effort which should be aligned with your own research and professional goals. And you will get the opportunity to ask questions of guest stars who will illustrate diverse perspectives in statistical communication and graphics.

See also the statistical communication and graphics manifesto.

[I'll be regularly updating this post, which I sketched out (with the help of the students in my statistical communication and graphics course this semester) and put here so we can link to it from the official course description.]

The Fault in Our Stars: It’s even worse than they say


In our recent discussion of publication bias, a commenter linked to a recent paper, “Star Wars: The Empirics Strike Back,” by Abel Brodeur, Mathias Le, Marc Sangnier, and Yanos Zylberberg, who point to the notorious overrepresentation in scientific publications of p-values that are just below 0.05 (that is, just barely statistically significant at the conventional level) and the corresponding underrepresentation of p-values that are just above the 0.05 cutoff.

Brodeur et al. correctly (in my view) attribute this pattern not just to selection (the much-talked-about “file drawer”) but also to data-contingent analyses (what Simmons, Nelson, and Simonsohn call “p-hacking” and what Loken and I call “the garden of forking paths”). They write:

We have identified a misallocation in the distribution of the test statistics in some of the most respected academic journals in economics. Our analysis suggests that the pattern of this misallocation is consistent with what we dubbed an inflation bias: researchers might be tempted to inflate the value of those almost-rejected tests by choosing a “significant” specification. We have also quantified this inflation bias: among the tests that are marginally significant, 10% to 20% are misreported.

They continue with “These figures are likely to be lower bounds of the true misallocation as we use very conservative collecting and estimating processes”—but I would go much further. One way to put it is that there are (at least) three selection processes going on here:

1. (“the file drawer”) Significant results (traditionally presented in a table with asterisks or “stars,” hence the photo above) are more likely to get published.

2. (“inflation”) Near-significant results get jiggled a bit until they fall into the box.

3. (“the garden of forking paths”) The direction of an analysis is continually adjusted in light of the data.
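Items 1 and 2 are easy to mimic in a toy simulation, which shows how they pile up test statistics just above the 1.96 cutoff. This is only a sketch under made-up assumptions: the distribution of z-statistics, the publication probability, and the inflation probability are invented for illustration and are not taken from Brodeur et al. Item 3 is deliberately left out, since by its nature there is no fixed baseline analysis to simulate from.

```python
import random

def simulate_published_z(n_studies=20000, publish_prob_nonsig=0.3,
                         inflate_prob=0.5, seed=0):
    """Return (raw, published) lists of absolute z-statistics."""
    rng = random.Random(seed)
    raw, published = [], []
    for _ in range(n_studies):
        z = abs(rng.gauss(1.0, 1.0))  # hypothetical mix of modest effects
        raw.append(z)
        # Item 2 ("inflation"): some near-significant results get nudged over the line.
        if 1.6 <= z < 1.96 and rng.random() < inflate_prob:
            z = rng.uniform(1.96, 2.3)
        # Item 1 ("the file drawer"): nonsignificant results are published less often.
        if z >= 1.96 or rng.random() < publish_prob_nonsig:
            published.append(z)
    return raw, published

def just_above_to_just_below(zs, lo=1.6, cut=1.96, hi=2.3):
    """Ratio of results just above the cutoff to results just below it."""
    below = sum(lo <= z < cut for z in zs)
    above = sum(cut <= z < hi for z in zs)
    return above / below

raw, published = simulate_published_z()
raw_ratio = just_above_to_just_below(raw)
published_ratio = just_above_to_just_below(published)

print(f"just-above / just-below, all studies: {raw_ratio:.2f}")
print(f"just-above / just-below, published:   {published_ratio:.2f}")
```

The published ratio comes out well above the raw one, which is the kind of discontinuity at the cutoff that the paper documents.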

Brodeur et al. point out that item 1 doesn’t tell the whole story, and they come up with an analysis (featuring a “lemma” and a “corollary”!) explaining things based on item 2. But I think item 3 is important too.

The point is that the analysis is a moving target. Or, to put it another way, there’s a one-to-many mapping from scientific theories to statistical analyses.

So I’m wary of any general model explaining scientific publication based on a fixed set of findings that are then selected or altered. In many research projects, there is either no baseline analysis or else the final analysis is so far away from the starting point that the concept of a baseline is not so relevant.

Although maybe things are different in certain branches of economics, in that people are arguing over an agreed-upon set of research questions.

P.S. I only wish I’d known about these people when I was still in Paris; we could’ve met and talked.