Undervotes in Florida

Michael Herron pointed me to this paper by Laurin Frisina, James Honaker, Jeffrey Lewis, and himself, “Ballot Formats, Touchscreens, and Undervotes: A Study of the 2006 Midterm Elections in Florida”. Here’s the abstract:

The 2006 midterm elections in Florida have focused attention on undervotes, ballots on which no vote is recorded on a particular contest. This interest was sparked by the high undervote rate—more than 18,000 total undervotes out of 240,000 ballots cast—in Florida’s 13th Congressional District race, a race that, as of this paper’s writing, was decided by 369 votes. Using precinct-level voting returns, we show that the high undervote rate in the 13th Congressional District race was almost certainly caused by the way that one county’s (Sarasota’s) electronic touchscreen voting machines placed the 13th Congressional District race above the Florida Governor election on a single screen. We buttress this claim by showing that extraordinarily high undervote rates were also observed in the Florida Attorney General race in Charlotte and Lee Counties, places where that race appeared below the Governor race on the same screen. Using a statistical imputation model to identify and allocate excess undervotes, we find that there is a roughly 90 percent chance that the much-discussed Sarasota undervotes were pivotal in the very close 13th Congressional District race. Greater study and attention should be paid to how alternatives are presented to voters when touchscreen voting machines are employed.

They really did an impressive amount of work considering that the election just occurred. I haven’t tried to read the paper in great detail, but skipping to their model: I don’t really understand what’s happening at the bottom of page 34; in particular, I don’t know what their “full-information likelihood function” is (or what they mean by “full-information likelihood”). I’m also puzzled by their references to Shafer (1997) and King et al. (2001), since those papers use a multivariate normal model, which doesn’t seem so relevant here. I agree that this is a missing-data problem, but I’d think it might make more sense to have a three-stage model: first, the probability of voting in that race at all; second, the probability of intending to vote for either of the two candidates; third, the probability of accidentally casting the ballot wrongly. Once the model had been fit, I’d then like to see estimates of all these probabilities; this would allow a more coherent story, I think.
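
To make the three stages concrete, here is a minimal sketch in Python. It is only an illustration of the structure suggested above, not the model in the paper. The function names and all numbers are hypothetical, and it makes two simplifying assumptions: a ballot that is cast wrongly is recorded as an undervote, and that happens at the same rate for both candidates.

```python
import math

def outcome_probs(p_vote, p_a, p_lost):
    """Probabilities that one ballot is recorded as (A, B, undervote).

    p_vote: stage 1, probability of voting in this race at all
    p_a:    stage 2, probability of intending candidate A, given voting
    p_lost: stage 3, probability that an intended vote is not recorded
            (assumed here to be the same for both candidates)
    """
    recorded = p_vote * (1.0 - p_lost)
    return (recorded * p_a, recorded * (1.0 - p_a), 1.0 - recorded)

def log_likelihood(counts, p_vote, p_a, p_lost):
    """Multinomial log-likelihood, up to an additive constant,
    for observed counts (n_a, n_b, n_undervote)."""
    probs = outcome_probs(p_vote, p_a, p_lost)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

# Hypothetical parameter values, for illustration only.
probs = outcome_probs(p_vote=0.97, p_a=0.52, p_lost=0.12)

# A different stage 1 / stage 3 split with the same product p_vote * (1 - p_lost).
probs_alt = outcome_probs(p_vote=0.8536, p_a=0.52, p_lost=0.0)
```

In this sketch `probs` and `probs_alt` are essentially identical, because the counts from a single race depend on stages 1 and 3 only through the product `p_vote * (1 - p_lost)`. Separating the two would need outside information, for example undervote rates in other races or in counties with a different ballot layout.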

Finally, I’ll be brief with my graphical suggestions:

Figure 3 should be a lineplot (that is, a graph with three lines; it could also be called a parallel coordinate plot). Then the lines could be labeled directly, with no need for these ugly bar labels. (Desoto appears to be a special case; its data could be described in the caption.)

Figure 4 can be done similarly, and can be put side by side with Figure 3 for better comparison.

Figure 5 has information on 95% confidence bounds, but I don’t see in the text where these actually come from. Also, the dots could be slightly smaller to reduce overplotting; I’d set both the x- and y-axes to start exactly at zero (it’s a little disconcerting that the axes go negative); and I’d put the two axes on a common scale and label them identically (0, 10%, 20%, 30%), without the extra tick marks that are currently on the x-axis.

Table 1 is slightly confusing, partly because of replications: it should be listed by county, not district (with districts indicated as a column), which would reduce the number of rows by 2. Also, Table 3 should be folded into Table 1, I think.

Table 2 should be a graph. Or, if it stays a table, the data should be premultiplied by 100 so that they’re in percentages (so that all the coefs except for “Senate Undervote Rate” will be multiplied by 100 and be easier to read), and all ests and se’s should be rounded to the nearest percent (that’s right, we don’t need to know that an se is 0.00482!).

Figure 6 should be rotated 90 degrees so that you can read the county names. Also, there are way too many tick marks here; you can just do 0, 5%, 10%.

Figure 7 is just ugly. (I’d prefer using my “better than a boxplot” quantile scatterplot; unfortunately I haven’t written that up yet, so I can hardly blame the authors for not using it here.)

Figure 8 should be a line plot (see comment on Figure 3 above).

Figure 9 should be little dots (not circles), its axes should go from 0 to 100% (not from below 0 to above 100%), and “Voteshare” should be “Vote share”.

Figure 10 should be combined with Figure 9 in a 2×2 grid of plots.

Table 4 should, at the very least, round off the conf interval bounds to the nearest hundred votes, and also round the probability to 90% (rather than 89.6%).
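
The rounding suggested for Table 4 takes only a couple of lines. Here is a minimal Python sketch; the helper names are my own and the interval bounds are made-up placeholders (only the 89.6% figure comes from the discussion above).

```python
def round_to_hundred(votes):
    """Round a vote count to the nearest hundred.

    Note: Python rounds exact halves to the even choice (1250 -> 1200).
    """
    return int(round(votes, -2))

def as_whole_percent(prob):
    """Format a probability as a whole-number percentage string."""
    return f"{round(100 * prob)}%"

# Hypothetical interval bounds, not taken from the paper.
lower, upper = 1234, 5678
rounded_interval = (round_to_hundred(lower), round_to_hundred(upper))

# The probability reported as 89.6% becomes "90%".
prob_label = as_whole_percent(0.896)
```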