“Communication is a central task of statistics, and ideally a state-of-the-art data analysis can have state-of-the-art displays to match”

The Journal of the Royal Statistical Society publishes papers followed by discussions. Lots of discussions, each no more than 400 words. Here’s my most recent discussion:

The authors are working on an important applied problem and I have no reason to doubt that their approach is a step forward beyond diagnostic criteria based on point estimation. An attempt at an accurate assessment of variation is important not just for statistical reasons but also because scientists have the duty to convey their uncertainty to the larger world. I am thinking, for example, of discredited claims such as that of the mathematician who claimed to predict divorces with 93% accuracy (Abraham, 2010).

Regarding the paper at hand, I thought I would try an experiment in comment-writing. My usual practice is to read the graphs and then go back and clarify any questions through the text. So, very quickly: I would prefer Figure 1 to be displayed in terms of standard deviations, not variances. I find variances difficult to interpret, and I’m always taking mental square roots (0.09 is 0.3 squared, and so forth). Figure 3 is appealing but I don’t like the visual emphasis of the endpoints of the 95% intervals. From a Bayesian standpoint, there is nothing special about the 2.5th and 97.5th percentiles of the posterior distribution, and I think it goes against the spirit of the article to emphasize these arbitrary endpoints. I also think that, with some care, the graphs in Figures 3, 4, and 5 could be compactly re-expressed to show comparisons more effectively (as in Gelman, Pasarica, and Dodhia, 2002). Tables 2 and 3 I think are useless: why should a reader care that the 10th percentile point of the distribution for a particular probability is 0.164 or whatever? Again, this seems to me to contradict the decision-analytic focus of the applied research.

These brusque comments on display may seem peripheral but to me they are important. Communication is a central task of statistics, and ideally a state-of-the-art data analysis can have state-of-the-art displays to match.

References

Abraham, Laurie (2010). Can you really predict the success of a marriage in 15 minutes? Slate, 8 March. http://www.slate.com/articles/double_x/doublex/2010/03/can_you_really_predict_the_success_of_a_marriage_in_15_minutes.html

Gelman, Andrew, Pasarica, C., and Dodhia, R. (2002). Let’s practice what we preach: turning tables into graphs. American Statistician 56, 121-130.

4 thoughts on ““Communication is a central task of statistics, and ideally a state-of-the-art data analysis can have state-of-the-art displays to match”

  1. Andrew, can you look into this and maybe get Nate onto the issue? There seems to be vote rigging in presidential elections that can be seen by looking at statistics.
    The claim is that a computer program fraudulently flips a percentage of votes from one candidate to another, and that the percentage of votes flipped varies with the size of the precinct. The rules of the fraudulent vote flipping algorithm are:
    Very small precincts don’t have any votes flipped.
    The percentage of votes that are flipped is small (such as .01%) for small precincts and large (such as 5%) for large precincts with a *gradual* change in percentage “flipped”.
    The reason perpetrators don’t flip as many votes in small districts is because a recount will check a random number of precincts and a smaller precinct is *more* likely to be audited because there are more of them. So if fraudulent vote flipping flips 5% of votes from Democrats to Republicans, a random recount of precincts would show a smaller error – such as 2%.
    The authors of the paper show that the effect does not happen in data from some counties, presumably because the perpetrators did not have access to those tabulating computers, and that it is not seen in Democratic primaries. The authors of the study look at income and poverty rates, which are highly correlated with voter choice but do not correlate with precinct size. This rules out more Republicans living in large precincts as being the cause of the anomalous data.
    To find the article, do a google on the following words: vote flipping large precincts central tabulator
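
The arithmetic behind the recount argument in the comment above can be sketched in a few lines. This is only an illustration of the claim as stated, not evidence for it: the precinct sizes, the thresholds, and the exact shape of the `flip_rate` rule are all assumptions made up for the example.

```python
import math

# Hypothetical precinct sizes (total votes cast); made up for illustration, not real data.
PRECINCT_SIZES = [50, 100, 200, 500, 1000, 2000, 5000, 10000]


def flip_rate(size, min_size=100, max_size=10000, max_rate=0.05):
    """Assumed rule: nothing flipped at or below min_size, then a gradual
    rise (linear in log size) up to max_rate at max_size and beyond."""
    if size <= min_size:
        return 0.0
    frac = math.log(size / min_size) / math.log(max_size / min_size)
    return max_rate * min(frac, 1.0)


rates = [flip_rate(s) for s in PRECINCT_SIZES]

# Share of all votes that would be flipped (each precinct weighted by its size).
vote_weighted_rate = sum(s * r for s, r in zip(PRECINCT_SIZES, rates)) / sum(PRECINCT_SIZES)

# Average flip rate over precincts, each counted once, which is roughly what
# a recount sampling precincts uniformly at random would see on average.
per_precinct_rate = sum(rates) / len(rates)

print(f"vote-weighted flip rate: {vote_weighted_rate:.2%}")
print(f"per-precinct flip rate:  {per_precinct_rate:.2%}")
```

Because the assumed rate rises with precinct size, the vote-weighted figure comes out above the per-precinct average, which is the gap the comment describes.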

  2. Scott,

    I was curious, so I downloaded the NH 2012 primary data and plotted Romney share vs total votes in each precinct. There’s a clear correlation between precinct size and the share of the Romney vote total (very clear if you do a semilog plot). Just to give a flavor: Romney ran 9 points better in the largest 154 precincts than in the smallest. So it is not a small effect in NH.

    I also looked at the 2008 NH general election, and saw no such trend in the McCain vote share. I started to look at the 2008 primary data, but the spreadsheet has a very annoying format so I gave up for the time being.

    If they are right that this shows up nationwide in the 2012 primary, but did not show up in 2008 (or in other general elections), then it sounds like an interesting puzzle for a political scientist to solve.

    I’m highly skeptical that fraud is the answer. The authors are claiming a nationwide effect affecting *many* precincts; not a small matter to organize and keep secret. And the risk/reward just doesn’t make sense. If fraud on that scale became public it would destroy the political party responsible. I just can’t see it.
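
For anyone who wants to repeat the precinct-size check described in the comment above, here is a minimal sketch of one way to do it. The `PRECINCTS` numbers are made-up placeholders, not the New Hampshire data, and splitting the precincts into a smaller and a larger half is an assumption (the comment compared the largest 154 precincts with the smallest). Correlating vote share with the log of precinct size is the numerical counterpart of reading a semilog plot.

```python
import math

# Hypothetical (total_votes, candidate_votes) pairs; placeholders, not the NH data.
PRECINCTS = [(100, 30), (200, 64), (400, 140), (800, 296), (1600, 640), (3200, 1344)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def share_vs_size(precincts):
    """Return (correlation of vote share with log10 precinct size,
    mean share in the larger half minus mean share in the smaller half)."""
    ordered = sorted(precincts)  # tuples sort by total votes first
    log_sizes = [math.log10(total) for total, _ in ordered]
    shares = [votes / total for total, votes in ordered]
    half = len(ordered) // 2
    small_mean = sum(shares[:half]) / half
    large_mean = sum(shares[-half:]) / half
    return pearson(log_sizes, shares), large_mean - small_mean


corr, gap = share_vs_size(PRECINCTS)
print(f"correlation with log10(size): {corr:.3f}")
print(f"large-minus-small share gap:  {gap:.1%}")
```

A positive correlation and gap only show that share rises with precinct size in the data; they say nothing about why.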
