We haven’t had one of these in awhile, having mostly switched to the “chess trivia” and “bad p-values” genres of blogging . . .

But I had to come back to the topic after receiving this note from Raghuveer Parthasarathy:

Here’s another bad graph you might like. It might (arguably) be even worse than the “worst graphs of the year” you’ve blogged about, since rather than being a poor representation of data, it is simply the plotting of a tautology that mistakenly gives the impression of being data. (And it’s in Nature.)

Parthasarathy explains:

On the vertical axis we have the probability of being Type 2 Diabetic (T2D). On the horizontal axis we have the probability of being normal. There’s a clear, important trend evident, right? No! The probability of being normal is trivially one minus the probability of being T2D! The graph could not possibly be anything other than a straight line of slope -1. (For the students out there: the complete lack of scatter in the graph is a strong hint of something wrong.) What about the colors? They assign the data points for people with a > 50% probability of being T2D to be red, and the opposite to be green. The graph is simply plotting a tautology, that the probability of x is one minus the probability of not-x, together with a color scheme for labeling x. Paraphrasing Tufte, it has an information-to-ink ratio of approximately zero.

Not quite zero: what we seem to have here is a highly inefficient two-dimensional multicolor display of a one-dimensional set of 49 numbers, using dots that are so blurry that we can’t actually get much of a sense of their distribution. All joking aside, I’m guessing this graph would be much better if the x-axis were used for some relevant continuous variable (for example, people’s ages) and the colors used for some discrete variable (for example, some other indicator of health status).

Parthasarathy does add: “I’ll stress that the study itself is fascinating.” So, just to be clear, he’s criticizing the graph, not the underlying research.

I’d be very wary of trusting the abilities of anyone associated with this paper (reviewers included) simply because how can you get such a straight line and not for a moment suspect something’s wrong? Beats me!

To give the benefit of the doubt, I assume the point of the graph is the *distribution* of the dots along the line. That should be a histogram though…

Mutatis mutandis

The information-to-ink ratio would’ve been higher if they’d captioned it “For those of you who don’t know what a plot of y=1-x looks like…” and drawn a line instead of those damn polka dots. Oh, but wait, maybe the dots are supposed to be uncertainty ellipses!

Finally a conclusive proof that the way to increase the proportion of people who don’t have diabetes is to reduce the proportion who do.