## Probability and Statistics in the Study of Voting and Public Opinion (my talk at the Columbia Applied Probability and Risk seminar, 30 Mar at 1pm)

Probability and Statistics in the Study of Voting and Public Opinion

Elections have both uncertainty and variation and hence represent a natural application of probability theory. In addition, opinion polling is a classic statistics problem and is featured in just about every course on the topic. But many common intuitions about probability, statistics, and voting are flawed. Some examples of widely-held but erroneous beliefs: votes should be modeled by the binomial distribution; sampling distributions and standard errors only make sense under random sampling; poll averaging is a simple problem in numerical analysis; survey sampling is a long-settled and boring area of statistics. In this talk, we discuss some challenging problems in probability and statistics that arise from the study of opinion and elections.
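As a rough illustration of why the binomial model is a poor fit for votes, here is a minimal sketch of the standard deviation of the vote share that the model would imply. The electorate size and the 50/50 split are hypothetical numbers chosen for illustration, not figures from the talk.

```python
import math


def binomial_share_sd(p: float, n_voters: int) -> float:
    """SD of the vote share if each of n_voters votes independently with probability p."""
    return math.sqrt(p * (1.0 - p) / n_voters)


# Hypothetical inputs: an evenly split electorate of 100 million voters.
p = 0.5
n_voters = 100_000_000

sd = binomial_share_sd(p, n_voters)
print(f"implied sd of vote share: {sd * 100:.4f} percentage points")
```

Under these assumptions the implied standard deviation is about 0.005 percentage points, far smaller than the swings of several points that large elections typically show from one contest to the next. Independent coin flips across voters leave out the shared, election-level variation.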

I’ll be speaking at the Applied Probability and Risk Seminar, 1:10-2:10pm in 303 Mudd Hall, Columbia University.

P.S. I discussed work from these papers:
http://www.stat.columbia.edu/~gelman/research/published/hansen_paper_2.pdf
http://www.stat.columbia.edu/~gelman/research/published/blocs.pdf
http://www.stat.columbia.edu/~gelman/research/published/STS027.pdf
http://www.stat.columbia.edu/~gelman/research/published/gelmankatzbafumi.pdf

1. Rahul says:

>>>Elections have both uncertainty and variation<<<

Is uncertainty different from variation?

What are examples of phenomena (as opposed to elections) that exhibit only uncertainty and no variation? Or that show only variation but no uncertainty?

• Variation without (much) uncertainty: the voltage across a thermocouple as the temperature of a water bath is raised from 0 °C to 100 °C via a well-calibrated heater.

Uncertainty without variation? You come upon an old covered bridge from the 1700s and you have a heavy Mack truck. Can you drive across it safely? The strength of this bridge has remained constant for the last, say, 50 years thanks to good maintenance, but no one really knows if the strength exceeds that required for the truck.

• Keith O'Rourke says:

No variation is only possible in the past – had you driven across the bridge yesterday at a set point in time?

• The main point is that it’s possible not to know very much about something even if that thing is almost entirely constant. Like, what did Churchill have for lunch on May 18th 1941? Or what is the mean diameter of the moon, averaged over the 30*24*60*60 seconds starting at Jan 1 2017 UTC and defined as the diameter of a sphere completely enclosing the moon’s surface?

The diameter thing is pretty well defined, and since I specified a very specific time window over which we average, it has a very precise value, but good luck finding out what it was with all the tidal forces and whatnot. Particularly if you didn’t have the kind of extreme measurement abilities we do today, like say if we did the same thing in 1930.

• Dzhaughn says:

The instantaneous voltage of a North American electrical outlet varies between about -160 and 160V, 60 times per second. This variation is not uncertain. If your oscilloscope doesn’t show it, it’s probably broken.

The breaking point of the old bridge is unknown, but is known not to vary measurably day to day. This is based on modest assumptions of how it was built, that it isn’t pushed beyond some parameters, and experimentally validated theory.

2. Bryan says:

Can’t make the event – something about being in Australia makes it difficult.

• Samuel says:

Just seconding this. Any chance that those of us who aren’t in New York that day can see the talk at some point?

3. Chris Wilson says:

Hey Andrew, I’m curious if you are interested in commenting on Nassim Taleb’s latest approach to the subject: https://arxiv.org/pdf/1703.06351.pdf

• Andrew says:

Chris:

These are the comments I sent to Nassim:

1. It’s pretty irrelevant in a 2-candidate election that the proportion of votes is between 0 and 1. In the elections you’re talking about, it’s never close to 0 or 1, and if it were you’d want a different model. Given the distance of the actual votes from the (0,1) endpoints, issues such as beta distribution (mentioned on page 2) are irrelevant. Also irrelevant in a large election is that actual votes are discrete. We discuss these issues in our 1990 paper:
http://www.stat.columbia.edu/~gelman/research/published/electoral3.pdf and again in our 1994 paper:
http://www.stat.columbia.edu/~gelman/research/published/unified2.pdf
I agree that the world is full of clueless half-educated people who want to apply binomial or beta distributions in such settings, and this is annoying.

2. You will be interested in this paper contrasting the random walk and mean reversion models of elections:
http://www.stat.columbia.edu/~gelman/research/published/psq_4021.pdf
This was a reaction to an offhand comment by a political journalist who thought that the polls were a random walk. We had to explain why this was not an appropriate model.

3. Your figure 5 is hard to follow because the color scheme in your top graph is not the same as the color scheme in your bottom graph. I still don’t know what is actually plotted in that top graph.

4. The probabilities that Nate gave are ridiculously hyper-precise. To say that someone has a 73.8% chance of winning the election is like saying that someone is 66.23415 inches tall. See here:

5. This paper might also interest you:
http://www.stat.columbia.edu/~gelman/research/published/election15Feb.pdf
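To make the contrast in point 2 concrete, here is a minimal sketch comparing how uncertainty grows over time under a random walk and under a simple mean-reverting process. The AR(1) form, the daily noise level `sigma`, and the persistence `phi` are illustrative assumptions, not the model from the linked paper.

```python
def random_walk_variance(sigma: float, t: int) -> float:
    """Variance after t steps of x_t = x_{t-1} + noise, starting from a known x_0."""
    return t * sigma**2


def ar1_variance(sigma: float, phi: float, t: int) -> float:
    """Variance after t steps of x_t = phi * x_{t-1} + noise, starting from a known x_0.

    Assumes |phi| < 1, so the process is pulled back toward its mean of zero.
    """
    return sigma**2 * (1.0 - phi ** (2 * t)) / (1.0 - phi**2)


# Hypothetical values: 0.5 points of daily noise, persistence of 0.95.
sigma, phi = 0.5, 0.95

for days in (10, 100, 200):
    rw = random_walk_variance(sigma, days)
    mr = ar1_variance(sigma, phi, days)
    print(f"{days:4d} days: random walk var = {rw:7.2f}, mean-reverting var = {mr:5.2f}")
```

The random-walk variance grows without bound as the horizon lengthens, while the mean-reverting variance levels off at `sigma**2 / (1 - phi**2)`. Which behavior matches polls is an empirical question; the sketch only shows how different the two assumptions are.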

• Jameson Quinn says:

Would it make sense to say “this poll has increased my posterior for candidate X winning by 0.3%”? If so, then the 73.8 communicates something, even if the …3.8 is only relative to the 74.1 from the previous day.

• Andrew says:

Jameson:

Sure, but in practice it’s all noise. Just like if I measured your height with some sort of super-precise instrument and it went from 66.23433 to 66.23415 inches tall: yes, I could say your measured height went down by 0.00018 inches, but it doesn’t really mean anything!
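One way to see how small such a change is: under a normal forecast for a candidate’s two-party vote share, a win probability can be converted back into the forecast mean that would produce it. The sketch below does that for 73.8% and 74.1%. The normal form and the 3-point forecast standard deviation are assumptions made for illustration, not anyone’s actual forecasting model.

```python
from statistics import NormalDist


def implied_mean_share(win_prob: float, forecast_sd: float) -> float:
    """Forecast mean vote share that gives P(share > 0.5) = win_prob under a normal forecast."""
    z = NormalDist().inv_cdf(win_prob)
    return 0.5 + forecast_sd * z


# Hypothetical forecast uncertainty: 3 percentage points of vote share.
forecast_sd = 0.03

mean_before = implied_mean_share(0.741, forecast_sd)
mean_after = implied_mean_share(0.738, forecast_sd)
shift = mean_before - mean_after

print(f"implied shift in mean vote share: {shift * 100:.3f} percentage points")
```

With these assumed numbers, the 0.3-point move in win probability corresponds to a shift of a few hundredths of a percentage point in the forecast vote share, well below what polls can resolve.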

• Chris Wilson says:

Thanks Andrew!

• Rahul says:

• Andrew says:

Rahul:

Nassim replied that the formal restriction to the interval (0, 1) was necessary for the purpose of pricing options, even if the probability of getting to the extremes is very low.
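For a sense of how low that probability is in a typical two-candidate race, here is a minimal sketch that computes the mass an unconstrained normal forecast puts outside the interval (0, 1). The forecast mean of 0.52 and standard deviation of 0.03 are hypothetical values, and the normal form is an assumption made for illustration.

```python
import math


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal, using erfc so tiny tails do not round to zero."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mass_outside_unit_interval(mean: float, sd: float) -> float:
    """Probability that a normal(mean, sd) vote share falls below 0 or above 1."""
    below_zero = normal_upper_tail(mean / sd)        # P(X < 0), by symmetry
    above_one = normal_upper_tail((1.0 - mean) / sd) # P(X > 1)
    return below_zero + above_one


# Hypothetical forecast: 52% expected share, 3 points of uncertainty.
outside = mass_outside_unit_interval(0.52, 0.03)
print(f"probability mass outside (0, 1): {outside:.1e}")
```

Under these assumptions the mass outside (0, 1) is vanishingly small, so the boundary has no practical effect on the forecast itself, even if a formal restriction to the interval is still needed for other purposes such as pricing.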

• Chris Wilson says:

So what do you think about the whole idea of using a martingale process to forecast elections?

• Andrew says:

Chris:

I think what’s important about a forecasting method is what information goes into it, rather than what particular model is used. Not that setting up a model is trivial: the model needs to be flexible enough to use the information at hand, while being constrained enough to avoid overfitting.

• Chris Wilson says:

Yeah, that makes sense to me. I guess I don’t understand how the data assimilation is supposed to work in NNT’s framework…

• Rahul says:

Andrew:

Interesting. I find that very non-intuitive. Shouldn’t we judge forecasting methods by their outputs rather than inputs?