Don’t use the binomial distribution to model vote counts

At his fun and informative website, Sam Wang writes:

The right tool for thinking about this is the statistics of the binomial distribution, which describes the distribution of all possible outcomes in a two-choice situation with fixed probability p.

I know that many people think this, but after years of work in this area I have concluded that the binomial distribution is essentially never appropriate for studying elections.

The key problem with the binomial distribution is that votes are not independent. In particular, there are national, regional, statewide, and local swings, each of which moves many voters in the same direction at once. As a result, any distribution based on assuming independence will badly understate the variation in vote totals and doesn’t really mean anything at all.
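To make the independence point concrete, here is a minimal simulation sketch. It compares the spread of vote shares under a pure binomial model with the spread when every voter in an election shares one common swing. The function name `simulate_shares`, the electorate size of 2000, and the swing standard deviation of 0.02 are illustrative assumptions, not estimates from any real election.

```python
import random
import statistics


def simulate_shares(n_voters, p, swing_sd, n_sims, seed=0):
    """Simulate one candidate's vote share in n_sims hypothetical elections.

    Each election draws a single shared swing (normal with sd = swing_sd)
    that shifts every voter's probability, so votes are independent only
    conditional on that swing. swing_sd = 0 gives the plain binomial model.
    """
    rng = random.Random(seed)
    shares = []
    for _ in range(n_sims):
        # Shared shift for this election, clamped to stay a valid probability.
        p_election = min(max(p + rng.gauss(0.0, swing_sd), 0.0), 1.0)
        votes = sum(rng.random() < p_election for _ in range(n_voters))
        shares.append(votes / n_voters)
    return shares


n_voters, p = 2000, 0.5  # assumed, illustrative values

# Standard deviation of the vote share with and without a shared swing.
binomial_sd = statistics.pstdev(simulate_shares(n_voters, p, 0.0, 300))
swing_sd_est = statistics.pstdev(simulate_shares(n_voters, p, 0.02, 300))

# Textbook binomial standard deviation of a proportion: sqrt(p(1-p)/n).
theory_sd = (p * (1 - p) / n_voters) ** 0.5

print(f"binomial theory sd:    {theory_sd:.4f}")
print(f"simulated, no swing:   {binomial_sd:.4f}")
print(f"simulated, with swing: {swing_sd_est:.4f}")
```

With no swing, the simulated spread should track the binomial formula. With a shared swing, the spread is dominated by the swing itself and does not shrink as the electorate grows. At realistic electorate sizes the binomial standard deviation becomes tiny (about 0.0005 for a million voters at p = 0.5), which is why intervals built on independence look far more precise than they are.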

For more on this point from a theoretical perspective, see here, and for an empirical perspective, see here.

If you are interested more specifically in models for errors in vote counting, you might want to take a look at some of the work of Stephen Ansolabehere, Michael Herron, and Walter Mebane, political scientists who have looked into empirical error rates.

Regarding the substance of Wang’s remarks, I pretty much agree that the real concern here is fairness, and for that purpose what’s important is following a consistent rule. But the rules themselves can be contested (recall Bush v. Gore) and so I think ultimately you have to try to estimate the intentions of the people who actually voted.

2 thoughts on “Don’t use the binomial distribution to model vote counts”

  1. What about in a hierarchical model, where p is a model of covariates? You might be able to assume independence conditional on levels of some of these multiscale factors, no?
