Probability and Statistics in the Study of Voting and Public Opinion

Elections have both uncertainty and variation and hence represent a natural application of probability theory. In addition, opinion polling is a classic statistics problem and is featured in just about every course on the topic. But many common intuitions about probability, statistics, and voting are flawed. Some examples of widely held but erroneous beliefs: votes should be modeled by the binomial distribution; sampling distributions and standard errors only make sense under random sampling; poll averaging is a simple problem in numerical analysis; survey sampling is a long-settled and boring area of statistics. In this talk, we discuss some challenging problems in probability and statistics that arise from the study of opinion and elections.
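One way to see why the binomial model appears on the list of flawed intuitions: if each of n voters decided independently with the same probability p, the standard deviation of the vote share would be sqrt(p(1-p)/n), which is tiny for any realistic electorate. The sketch below only evaluates that formula. The electorate sizes and p = 0.5 are illustrative assumptions, not numbers from the talk, and `binomial_share_sd` is a made-up helper name.

```python
import math


def binomial_share_sd(n_voters: int, p: float = 0.5) -> float:
    """SD of the vote share if n_voters vote independently, each with probability p."""
    return math.sqrt(p * (1 - p) / n_voters)


# Illustrative electorate sizes (assumptions, chosen only to show the scaling).
for n in (1_000, 100_000, 10_000_000):
    sd_points = 100 * binomial_share_sd(n)
    print(f"n = {n:>10,}: sd of vote share = {sd_points:.3f} percentage points")
```

For ten million voters this gives roughly 0.016 percentage points, far less than the swings of several points seen between real elections, which suggests that independent coin-flip voting is not a good description of how electorates move.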

I’ll be speaking at the Applied Probability and Risk Seminar, 1:10-2:10pm in 303 Mudd Hall, Columbia University.

**P.S.** I discussed work from these papers:

http://www.stat.columbia.edu/~gelman/research/published/hansen_paper_2.pdf

http://www.stat.columbia.edu/~gelman/research/published/blocs.pdf

http://www.stat.columbia.edu/~gelman/research/published/STS027.pdf

http://www.stat.columbia.edu/~gelman/research/published/gelmankatzbafumi.pdf

>>>Elections have both uncertainty and variation<<<

Is uncertainty different from variation?

What are examples of phenomena (as opposed to elections) that exhibit only uncertainty and no variation, or only variation and no uncertainty?

Variation without (much) uncertainty: the voltage across a thermocouple as the temperature of a water bath is raised from 0°C to 100°C via a well-calibrated heater.

Uncertainty without variation? You come upon an old covered bridge from the 1700s and you have a heavy Mack truck. Can you drive across it safely? The strength of this bridge has remained constant for the last, say, 50 years thanks to good maintenance, but no one really knows if the strength exceeds that required for the truck.

A complete absence of variation is only possible in the past. For example: had you driven across the bridge yesterday at a set point in time?

The main point is that it’s possible not to know very much about something even if that thing is almost entirely constant. Like, what did Churchill have for lunch on May 18th, 1941? Or what is the mean diameter of the moon, averaged over the 30*24*60*60 seconds starting at Jan 1 2017 UTC and defined as the diameter of a sphere completely enclosing the moon’s surface?

The diameter thing is pretty well defined, and since I specified a very specific time window over which we average, it has a very precise value, but good luck finding out what it was with all the tidal forces and whatnot. Particularly if you didn’t have the kind of extreme measurement abilities we do today, like say if we did the same thing in 1930.

The AC voltage of a North American electrical outlet swings between about -170 and 170 V, 60 times per second. This variation is not uncertain. If your oscilloscope doesn’t show it, it’s probably broken.
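To make "variation without uncertainty" concrete, here is a minimal sketch of an idealized outlet waveform. It assumes household mains is an alternating-current sine wave at a nominal 120 V RMS and 60 Hz, which puts the peak near 170 V. Both nominal values are assumptions about the ideal case (real mains voltage drifts by a few percent), and `outlet_voltage` is a made-up name for illustration.

```python
import math

V_RMS = 120.0                   # nominal RMS voltage (assumed ideal value)
FREQ_HZ = 60.0                  # nominal mains frequency (assumed ideal value)
V_PEAK = V_RMS * math.sqrt(2)   # peak of a sine wave with that RMS, about 169.7 V


def outlet_voltage(t_seconds: float) -> float:
    """Instantaneous voltage of the idealized sine wave at time t_seconds."""
    return V_PEAK * math.sin(2 * math.pi * FREQ_HZ * t_seconds)


# Sample one full cycle (1/60 s) at quarter-cycle steps.
for k in range(5):
    t = k / (4 * FREQ_HZ)
    print(f"t = {1000 * t:6.3f} ms: v = {outlet_voltage(t):8.2f} V")
```

The value changes constantly, yet every sample is predictable from the formula alone, which is the sense in which the variation carries no uncertainty.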

The breaking point of the old bridge is unknown, but is known not to vary measurably day to day. This is based on modest assumptions of how it was built, that it isn’t pushed beyond some parameters, and experimentally validated theory.

Can’t make the event – something about being in Australia makes it difficult.

Will your talk be published, or available on YouTube after?

Just seconding this. Any chance that those of us who aren’t in New York that day can see the talk at some point?

Hey Andrew, I’m curious if you are interested in commenting on Nassim Taleb’s latest approach to the subject: https://arxiv.org/pdf/1703.06351.pdf

Chris:

These are the comments I sent to Nassim:

Would it make sense to say “this poll has increased my posterior for candidate X winning by 0.3%”? If so, then the 73.8 communicates something, even if the …3.8 is only relative to the 74.1 from the previous day.

Jameson:

Sure, but in practice it’s all noise. Just like if I measured your height with some sort of super-precise instrument and it went from 66.23433 to 66.23415 inches tall: yes, I could say your measured height went down by 0.00018 inches, but it doesn’t really mean anything!

Thanks Andrew!

Did Nassim reply? Would be curious to read his response.

Rahul:

Nassim replied that the formal restriction to the interval (0, 1) was necessary for the purpose of pricing options, even if the probability of getting to the extremes is very low.

So what do you think about the whole idea of using a martingale process to forecast elections?

Chris:

I think what’s important about a forecasting method is what information goes into it, rather than what particular model is used. Not that setting up a model is trivial: the model needs to be flexible enough to use the information at hand, while being constrained enough to avoid overfitting.

Yeah, that makes sense to me. I guess I don’t understand how the data assimilation is supposed to work in NNT’s framework…

Andrew:

Interesting. I find that very non-intuitive. Shouldn’t we judge forecasting methods by their outputs rather than inputs?