Poll aggregation and election forecasting

At the sister blog, Henry writes about poll averaging and election forecasts, saying that “These models need to crunch lots of polls, at the state and national level, if they’re going to provide good predictions.” Actually, you can get reasonable predictions from national-level forecasting models plus previous state-level election results; then, as the election comes closer, you can use national and state polls as needed. See my paper with Kari Lock, Bayesian combination of state polls and election forecasts. (That said, the method in that paper is fairly complicated, much more so than simply taking weighted averages of state polls, if such abundant data happen to be available. And I’m sure our approach would need to be altered if it were used for real-time forecasts.)
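To give a rough feel for what combining a forecast with polls means, here is a minimal sketch of a precision-weighted (normal-normal) average. It is a deliberately simplified stand-in, not the method from the paper with Kari Lock, and every number and name in it (`Estimate`, `combine`, the vote shares and standard deviations) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Normal summary of a candidate's two-party vote share, in percent."""
    mean: float
    sd: float


def combine(forecast: Estimate, polls: Estimate) -> Estimate:
    """Precision-weighted (normal-normal) combination of two estimates.

    Assumes the forecast and the poll average are independent and
    roughly normal, which is a simplification.
    """
    w_forecast = 1.0 / forecast.sd ** 2
    w_polls = 1.0 / polls.sd ** 2
    total = w_forecast + w_polls
    mean = (w_forecast * forecast.mean + w_polls * polls.mean) / total
    return Estimate(mean=mean, sd=total ** -0.5)


# Hypothetical inputs: a model-based forecast of 52% (sd 3 points),
# combined with a noisy early poll average and a tighter late one.
forecast = Estimate(mean=52.0, sd=3.0)
early = combine(forecast, Estimate(mean=48.0, sd=6.0))
late = combine(forecast, Estimate(mean=48.0, sd=1.5))

print(f"early: {early.mean:.1f} (sd {early.sd:.2f})")
print(f"late:  {late.mean:.1f} (sd {late.sd:.2f})")
```

With these made-up inputs the early combination stays close to the forecast, while the late one moves most of the way toward the polls, which is the qualitative behavior described above: lean on the forecast early and on the polls as the election approaches.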

Having a steady supply of polls of varying quality from various sources allows poll aggregators to produce news every day (in the sense of pushing their estimates around) but it doesn’t help much with a forecast of the actual election outcome. (See my P.S. here.)

Since 1992 (when Gary and I did our research indicating that poll movements are mostly noise), I’ve thought that the repeated-polling business model of news reporting was unsustainable, but it’s only been getting worse and worse. Maybe Henry is right that recent developments will push it over the edge.

One reason that political scientists have not generally been doing poll aggregation is that, at least for the general election for president, there’s little point in doing so. Or, to put it another way, just about any averaging would do fine, no technology needed. Recall that Nate made his reputation during the 2008 primary elections. Primaries are much harder to predict for many reasons (less lead time, candidates have similar positions, no party labels, unequal resources, more than two serious candidates running, etc.), and being sophisticated about the polls makes much more difference there.
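The arithmetic behind “just about any averaging would do fine” can be sketched in a few lines. This is a simplified illustration that assumes simple random sampling and independent polls, and it ignores house effects and other nonsampling error; the function names and the sample sizes are hypothetical.

```python
import math


def poll_se(p: float, n: int) -> float:
    """Standard error, in proportion units, of one simple-random-sample poll."""
    return math.sqrt(p * (1.0 - p) / n)


def average_se(p: float, n: int, k: int) -> float:
    """Standard error of the plain average of k independent polls of size n."""
    return poll_se(p, n) / math.sqrt(k)


# Hypothetical setup: a 50/50 race, polls of 1,000 respondents each.
single = poll_se(0.5, 1000)
pooled = average_se(0.5, 1000, 10)

print(f"one poll:        +/- {100 * single:.1f} points (1 se)")
print(f"average of ten:  +/- {100 * pooled:.1f} points (1 se)")
```

Under these assumptions a plain unweighted average of ten polls already cuts the sampling error by a factor of about three, which leaves little room for a more elaborate weighting scheme to improve on it.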

4 thoughts on “Poll aggregation and election forecasting”

  1. Even if each individual poll is very noisy, don’t you need to aggregate them to determine the underlying signal? The way I think about it — and maybe I am missing your point — if the individual polls are noisy, then aggregation is *more* necessary, not less, right? Not making a point, just asking for clarification.
