Evaluating election forecasts

Nadia Hassan writes:

Nate Silver did a review of pre-election predictions from forecasting models in 2012. The overall results were not great, but many scholars noted that some models seemed to do quite well. You mentioned that you were interested in how top-notch models fare.

Nate agreed that some were better, but he raised the question of lucky vs. good with forecasters:
“Some people beat Vegas at roulette on any given evening. Some investors beat the stock market in any given month/quarter/year, and yet there is (relatively) little evidence of persistent stock-picking skill, etc, etc.”

The other thing is you did a paper with Wang on the limits of predictive accuracy. Many election models are linear regressions, but the point seems pertinent.

Election forecasting is seen by some as a valuable opportunity to test social science theories over time. It does seem like one can go wrong by just comparing pre-election forecasts to outcomes. How can one examine predictions sensibly, given these issues?

My reply: One way to increase N here is to look at state-by-state predictions. Here it makes sense to look at predictions for each state relative to the national average, rather than just looking at the raw prediction. To put it another way: suppose the state-level outcomes are y_1,…,y_50, and the national popular vote outcome is y_usa (a weighted average of the 50 y_j’s). Then you should evaluate the national prediction by comparing to y_usa, and you should evaluate state predictions of y_j – y_usa for each j. Otherwise you’re kinda double counting the national election and you’re not really evaluating different aspects of the prediction. You can also look at predictions of local elections, congressional elections, etc.
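
To make the bookkeeping concrete, here's a sketch of that comparison in Python; the turnouts and vote shares below are made up for illustration:

import numpy as np

turnout = np.array([5.5e6, 8.4e6, 0.25e6])  # assumed turnout weights (made up)
y_pred = np.array([0.50, 0.49, 0.30])       # predicted two-party vote shares
y_actual = np.array([0.51, 0.50, 0.28])     # actual two-party vote shares

# The national popular vote is the turnout-weighted average of the states.
y_usa_pred = np.average(y_pred, weights=turnout)
y_usa_actual = np.average(y_actual, weights=turnout)

# Score the national prediction once, against y_usa ...
national_error = y_usa_pred - y_usa_actual

# ... and score each state as a prediction of y_j - y_usa, so the shared
# national swing isn't counted once per state.
relative_errors = (y_pred - y_usa_pred) - (y_actual - y_usa_actual)
print(national_error, relative_errors)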

And always evaluate predictions on vote proportions, not just win/loss. That’s something I’ve been saying for a long long time (for example see this book review from 1993). To evaluate predictions based on win/loss is to just throw away information.
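
As a quick illustration of what gets thrown away (again with made-up numbers), a score on vote shares distinguishes a narrow miss from a blowout miss, while a win/loss score collapses each race to a single bit:

import numpy as np

y_pred = np.array([0.52, 0.49, 0.61])    # predicted two-party vote shares
y_actual = np.array([0.51, 0.51, 0.60])  # actual two-party vote shares

# Continuous evaluation: how far off were the vote-share predictions?
rmse = np.sqrt(np.mean((y_pred - y_actual) ** 2))

# Win/loss evaluation: a 49/51 miss counts the same as a 30/70 miss,
# and close races dominate the score.
correct_calls = np.mean((y_pred > 0.5) == (y_actual > 0.5))
print(rmse, correct_calls)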

11 thoughts on “Evaluating election forecasts”

  1. If you are looking at forecasting accuracy and you have access to (y_1, …, y_50), why are you also interested in y_usa? Isn't y_usa merely a function f(y_1, …, y_50)? Or are the y_j different elections?

      • Thanks David and Daniel. Please ignore if you find this comment petty.
        If we have 50 states, we make 50 predictions y_j_pred and compare them to y_j_actual, so we have 50 comparisons. We also have y_usa_pred, but that is a function of the state predictions: y_usa_pred = f(y_1_pred, …, y_50_pred). I have only assumed this; is that the case?
        If yes, y_j_pred – y_usa_pred is y_j_pred – f(y_1_pred, …, y_50_pred), which we compare to y_j_actual – f(y_1_actual, …, y_50_actual).
        So there are really 50 comparisons, and subtracting y_usa_pred only adds to the variance of the prediction.
        My guess is that, well, I am missing something about how polls are made or the function f(). What am I missing?

        • First, y_usa_pred has some of its own data: many polls in the US are national, not taken by state. The national polls thus have randomly varying exposures to the underlying y_j that differ from the proportions in the actual election. For example, Wyoming has only 0.15% of the US population. A national poll with a sample of 400 would thus have an expectation of 0.6 Wyoming voters, but would in reality sample a whole number of voters (I hope).

          However, it is almost certain that the errors on the y_j estimates will be highly correlated with each other, so the errors in y_1_pred vs. y_1 and in y_2_pred vs. y_2 will almost certainly have the same sign. For example, suppose that for all j, y_j = y_j_pred + k_j * ep_usa + ep_j, where ep_usa has some significant variance and ep_usa and all the ep_j are independent of each other. The prediction errors k_j * ep_usa + ep_j are then all correlated. If ep_usa takes a significant hit, all the y_j will appear off, even if the ep_j are small. Measuring y_j – y_usa removes the exposure to ep_usa (if k_j = 1 for all j, i.e., assuming an average exposure to the US-wide error), leaving only the independent ep_j sources of error.
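
          A quick simulation of this (all numbers invented) shows the shared shock washing out once you difference against the national average:

          import numpy as np

          rng = np.random.default_rng(0)
          n_states, n_sims = 50, 2000
          sd_usa, sd_state = 0.02, 0.01  # assumed scales for the two error components

          ep_usa = rng.normal(0, sd_usa, size=(n_sims, 1))         # shared national shock
          ep_j = rng.normal(0, sd_state, size=(n_sims, n_states))  # independent state shocks
          raw_error = ep_usa + ep_j  # y_j - y_j_pred, with k_j = 1 for all j

          # Raw state errors are highly correlated because they all contain ep_usa.
          print(np.corrcoef(raw_error[:, 0], raw_error[:, 1])[0, 1])  # roughly 0.8

          # Differencing each state against the (here unweighted) national average
          # strips out the shared shock, leaving roughly independent errors.
          rel_error = raw_error - raw_error.mean(axis=1, keepdims=True)
          print(np.corrcoef(rel_error[:, 0], rel_error[:, 1])[0, 1])  # roughly 0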

        • OK, so there are data sources that are only national; not every prediction is an aggregation of state-level estimates.
          Good point about the correlated shocks. Thanks.

        • I think Andrew is suggesting that we compare y_usa_pred to y_usa_actual… that's a pretty straightforward thing to do.

          Then, he’s suggesting we compare y_j_pred – y_usa_pred to y_j_actual – y_usa_actual, asking the question “how well did our model predict the actual difference between y_j and y_usa?”

          One question is whether your “f” is a fully deterministic function whose form is known exactly. For example, if it’s a weighted average weighted by the actual number of voters in each state, and the actual number of voters isn’t known, then f is itself an unknown, possibly with multiple parameters involved, and provides another source of variation that needs to be assessed.
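
          As a tiny illustration (made-up numbers), the same state vote shares map to different national averages under different turnout assumptions:

          import numpy as np

          y = np.array([0.52, 0.48, 0.55])             # state-level vote shares
          turnout_a = np.array([5.0e6, 8.0e6, 2.0e6])  # one turnout scenario
          turnout_b = np.array([6.0e6, 7.0e6, 2.5e6])  # another turnout scenario

          print(np.average(y, weights=turnout_a))  # about 0.503
          print(np.average(y, weights=turnout_b))  # about 0.507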

        • So, in a linear least squares algebra problem, the issue you’re bringing up is critical: you can’t estimate N things when the dimension of the space is N-1. But in a Bayesian analysis it’s not quite the same. First off, realistically f probably has unknown parameters, such as the number of people who will actually vote in each state. Second, even if that’s not the case, Bayesian analyses can partition the uncertainty among several factors; the posterior for the parameters will then be correlated, but it’s not a “divide by zero” problem like in the linear algebra case. For a very simple example, suppose you observe

          Y[i]=A*Bact[i]+yerr[i]

          and you have Bmeas[i] = Bact[i] + Berr[i] measurements of the B with error

          You don’t know what A is and you measure B with uncertainty so you don’t know what B is exactly. If you have N measurements, there are N errors in the B, and 1 uncertain value for A, so you have N+1 uncertain things and only N measurements.

          But, perhaps you have some information about A and about the size of the uncertainty in the measurement of B. You can easily write down a posterior distribution in Stan:

          A ~ normal(A_estimate,A_uncertainty); // from background info
          Bmeas ~ normal(Bact,Berr_sigma); // from info about B measurement instrument
          Y ~ normal(A * Bact, y_err_sigma); // from model equation

          And get posterior samples for A and all the Bact[i] values, which will have correlations, but not be undefined.

        • Imagine, for example, that you “know” A to 5 decimal places; then A will be tightly constrained, and the B values will “soak up” the uncertainty. On the other hand, if you only know A to +- 10%, then there will be a lot of A wiggling, and that A wiggling will imply that all the B values wiggle the other way, in some sense.
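
          Here’s a grid-posterior sketch of that trade-off for the toy Y = A*B setup above, with invented numbers; a tight prior on A leaves B pinned down by the data, while a loose prior lets A and B trade off against each other:

          import numpy as np

          A_grid = np.linspace(0.5, 1.5, 401)
          B_grid = np.linspace(1.0, 3.0, 401)
          A, B = np.meshgrid(A_grid, B_grid)

          Y_obs, Bmeas, sigma_y, sigma_B = 2.0, 2.1, 0.1, 0.5  # one observation of each

          def log_post(A_unc):
              # log posterior over the (A, B) grid, up to a constant
              return (-0.5 * ((A - 1.0) / A_unc) ** 2            # prior: A ~ normal(1, A_unc)
                      - 0.5 * ((Bmeas - B) / sigma_B) ** 2       # B measured with error
                      - 0.5 * ((Y_obs - A * B) / sigma_y) ** 2)  # model equation

          for A_unc in (0.01, 0.10):  # A known tightly vs. only to about +-10%
              lp = log_post(A_unc)
              p = np.exp(lp - lp.max())
              p /= p.sum()
              EA, EB = np.sum(p * A), np.sum(p * B)
              sdA = np.sqrt(np.sum(p * (A - EA) ** 2))
              sdB = np.sqrt(np.sum(p * (B - EB) ** 2))
              corr = np.sum(p * (A - EA) * (B - EB)) / (sdA * sdB)
              # Loosening the prior on A roughly doubles sd(B) here and drives
              # corr(A, B) strongly negative: A wiggles, B wiggles the other way.
              print(f"A_unc={A_unc}: sd(A)={sdA:.3f}, sd(B)={sdB:.3f}, corr={corr:.2f}")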
