“Choose the data visualization that best serves your audience.”


Tian Zheng prepared the above slide, which very clearly displays an important point about statistical communication.

The maps are squished to be too narrow, and the scatterplot has too many numbers on the axes (better to have income in thousands and percentages in tens). Also, given the numbers, it seems that the data must be pretty old. But maybe that’s part of the point: the principle of choosing among different sorts of data display is so general that it works even if the component graphs have problems.

8 thoughts on “Choose the data visualization that best serves your audience.”

  1. That’s Mark Monmonier’s state visibility map, and it’s supposed to be like that (I went and checked). The idea is to have all the states visible while more-or-less keeping relative geographical position.

      • In R,

        library(maps)      # provides the 'state.vbm' database
        map('state.vbm')   # draw the state visibility base map

        The original is from Mark Monmonier and George Schnell, “The Study of Population: Elements, Patterns, Processes”, Charles E. Merrill, Columbus, OH, 1982, and it’s on Michael Friendly’s website of milestones in data visualisation.

  2. Way off topic, but the “visualization” in the subject line suggested this insight into statistical blind spots might be relevant. My weekly fishing in Lake Google Scholar Case Law for some reference to the ASA’s p-values statement finally paid off. It came up in the cross-examination of an expert witness, and here’s the relevant part of the testimony (it’s from a testosterone replacement therapy trial):

    Q. And the American Statistical Association, I think this is 2016, issued a guidance on statistical significance and p-values, correct?

    A. This article is on p-values is what you’re referring to?

    Q. And statistical significance, correct? Do you see the heading, sir, “ASA Statement on Statistical Significance and P-Values”?

    A. That’s not what mine — you’re looking — I’m sorry. You’re looking at the next page, yes. Yes.

    Q. So they issued a statement on statistical significance and p-values, and they say, “What is a p-value?” And they answer that question to make sure that’s understood. And formally a “P-value is the probability under a specific” — “specified statistical model that a statistical summary of the data would be equal to or more extreme than its observed value.” I cut off the parenthetical, but other than that, did I read it correctly?

    A. Yes.

    Q. So essentially the probability that the effect estimate that is seen in the study would be seen by chance alone, right?

    A. Yes.

    Q. So a p-value of .05 means that there’s a 5 percent chance that this finding could have been seen by chance alone, right?

    A. Yes.

    Q. A p-value of .06 means there’s a 6 percent probability that the finding could have been seen by chance alone.

    A. Yes.

    Q. And so in studies — and you talked about a 95 percent confidence interval. Often people look at 95 percent — or a 5 percent chance of a random finding, right?

    A. Science is very rigorous. It has very defined elements to it that when you publish, you’re expected to meet those standards.

    Well … it’s hard to understand how you could put the ASA statement in front of an advocate and the expert he’s cross-examining and manage to wind up with them having this sort of exchange about it … unless you statisticians are also guilty of having cast some sort of mass illusion spell, one which you’re having a very difficult time dispelling. Science is very rigorous, indeed.

  3. Is it only me who can’t figure out why the red line goes to the blue map and the blue line goes to the red map? And also if the color of the circle is meaningful and why it encloses multiple points? And why California? I think not encouraging distracting questions is an important element of good data visualization!
