“Communication is a central task of statistics, and ideally a state-of-the-art data analysis can have state-of-the-art displays to match”

The Journal of the Royal Statistical Society publishes papers followed by discussions. Lots of discussions, each of no more than 400 words. Here’s my most recent discussion:

The authors are working on an important applied problem and I have no reason to doubt that their approach is a step forward beyond diagnostic criteria based on point estimation. An attempt at an accurate assessment of variation is important not just for statistical reasons but also because scientists have the duty to convey their uncertainty to the larger world. I am thinking, for example, of discredited claims such as that of the mathematician who claimed to predict divorces with 93% accuracy (Abraham, 2010).

Regarding the paper at hand, I thought I would try an experiment in comment-writing. My usual practice is to read the graphs and then go back and clarify any questions through the text. So, very quickly: I would prefer Figure 1 to be displayed in terms of standard deviations, not variances. I find variances difficult to interpret, and I’m always taking mental square roots (0.09 is 0.3 squared, and so forth). Figure 3 is appealing but I don’t like the visual emphasis of the endpoints of the 95% intervals. From a Bayesian standpoint, there is nothing special about the 2.5th and 97.5th percentiles of the posterior distribution, and I think it goes against the spirit of the article to emphasize these arbitrary endpoints. I also think that, with some care, the graphs in Figures 3, 4, and 5 could be compactly re-expressed to show comparisons more effectively (as in Gelman, Pasarica, and Dodhia, 2002). Tables 2 and 3 I think are useless: why should a reader care that the 10th percentile point of the distribution for a particular probability is 0.164 or whatever? Again, this seems to me to contradict the decision-analytic focus of the applied research.

These brusque comments on display may seem peripheral but to me they are important. Communication is a central task of statistics, and ideally a state-of-the-art data analysis can have state-of-the-art displays to match.

References

Abraham, Laurie (2010). Can you really predict the success of a marriage in 15 minutes? Slate, 8 March. http://www.slate.com/articles/double_x/doublex/2010/03/can_you_really_predict_the_success_of_a_marriage_in_15_minutes.html

Gelman, Andrew, Pasarica, C., and Dodhia, R. (2002). Let’s practice what we preach: turning tables into graphs. American Statistician 56, 121-130.
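
As a small illustration of the point about Figure 1, here is a sketch in Python of the "mental square root" step, converting variances to standard deviations so they are on the scale of the data. The component names and variance values are made up for illustration and do not come from the paper under discussion.

```python
import math

# Hypothetical variance components (illustrative values only,
# NOT taken from the paper's Figure 1).
variances = {
    "component_a": 0.09,
    "component_b": 0.25,
    "component_c": 0.04,
}


def to_standard_deviations(variances):
    """Return a dict mapping each name to the square root of its variance."""
    return {name: math.sqrt(v) for name, v in variances.items()}


sds = to_standard_deviations(variances)

# Print the two scales side by side; the sd column is in the units of the data.
for name in variances:
    print(f"{name}: variance = {variances[name]:.2f}, sd = {sds[name]:.2f}")
```

On the variance scale, 0.09 and 0.25 are hard to compare by eye; on the standard deviation scale they become 0.3 and 0.5, which can be read directly against the data.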

1. Hear, hear! And I like the open (and highly and effectively used) commenting on statistics journal papers. We don’t do that in astronomy but we should.

2. Scott says:

Andrew, can you look into this and maybe get Nate onto the issue? There seems to be vote rigging in presidential elections that can be seen by looking at statistics.
Fraudulently, a computer program flips a percentage of votes from one candidate to another, but the percentage of votes flipped varies with the size of the precinct. The rules of the fraudulent vote flipping algorithm are:
Very small precincts don’t have any votes flipped.
The percentage of votes that are flipped is small (such as .01%) for small precincts and large (such as 5%) for large precincts with a *gradual* change in percentage “flipped”.
The reason perpetrators don’t flip as many votes in small districts is because a recount will check a random number of precincts and a smaller precinct is *more* likely to be audited because there are more of them. So if fraudulent vote flipping flips 5% of votes from Democrats to Republicans, a random recount of precincts would show a smaller error – such as 2%.
The authors of the paper show that the effect does not happen in data from some counties, presumably because the perpetrators did not have access to those tabulating computers, and it is not seen in Democratic primaries. The authors of the study look at income and poverty rates, which are highly correlated with voter choice but do not correlate with precinct size. This rules out more Republicans living in large precincts as being the cause of the anomalous data.
To find the article, do a google on the following words: vote flipping large precincts central tabulator

3. Eli Rabett says: