The other day we had a fun little discussion in the comments section of the sister blog about the appropriateness of stating forecast probabilities to the nearest tenth of a percentage point.

It started when Josh Tucker posted this graph from Nate Silver:

My first reaction was: this looks pretty but it’s hyper-precise. I’m a big fan of Nate’s work, but all those little wiggles on the graph can’t really mean anything. And what could it possibly mean to compute this probability to that level of precision?

In the comments, people came at me from two directions. From one side, Jeffrey Friedman expressed a hard core attitude that it’s meaningless to give a probability forecast of a unique event:

What could it possibly mean, period, given that this election will never be repeated? . . . I know there’s a vast literature on this, but I’m still curious, as a non-statistician, what it could mean for there to be a meaningful 65% probability (as opposed to a non-quantifiable likelihood) that a one-time outcome will occur. If 65.7 is too precise, why isn’t 65? My [Friedman] hypothesis is that we’re trying to homogenize inherently heterogenous political events to make them tractable to statistical analysis. But presidential elections are not identical balls drawn randomly from an urn, and they are not lottery numbers randomly picked by a computer.

This one was in my wheelhouse, and I responded:

Probabilities can have a lot of meaning, even for an event that will never be repeated. For example, suppose you want to make a decision where the outcome is contingent on who wins the election. It can make sense to quantify your uncertainty using probability. Von Neumann and Morgenstern wrote a book about this! But at some point the quantification becomes meaningless. “60%,” sure. “65%,” maybe. “65.7%,” no way. . . . The different events being analyzed (in this case, elections) are not modeled as identical balls drawn from an urn. A better analogy you might keep in your mind is business forecasting. You might have some uncertainty about the price of oil next year, or whether the price will exceed $X a barrel. It’s an uncertain event but you know something about it. Then you get some information, for example a new oil field is discovered or a new refinery somewhere is built. This changes your probability. A number such as 65% could express this. Similarly, I might describe someone as being 5 feet 8 inches tall, or even 5 feet 8 1/2 inches tall, but it would be silly to call him 5 feet 8.34 inches tall, given that his height changes by a large fraction of an inch during the day.

From the other direction, commenter Paul wrote:

I disagree with Andrew that 65.7% is too precise. One reason is that Intrade prices are quoted to 3-digit precision — viz., ranging from $0.00 to $10.00, with bid-ask spreads as low as 1 cent. So, for example, if I believed Nate’s formula was more accurate than Intrade, and the current quote was $6.56 bid, $6.57 ask, I would like Nate to provide more precision in order to determine whether or not to buy ‘Barack Obama to be re-elected on 2012’. Even with today’s low interest rates, I would still need Nate to forecast at least a 65.74% probability in order to believe purchasing ‘Obama 2012’ at $6.57 would outperform a 0.90% money market rate over the next 18 days.

Paul makes a good point about market pricing. I can see why Intrade would want that precision. But I can’t see the point of Nate giving probabilities to that level of precision, given that he’s not working for Intrade or for a trader. Or, to put it another way, I can see why Nate might want to report fractional percentage-point probabilities on the New York Times website: more detail means more fluctuations which means more news. (In the above image, that 65.7% was listed as “-2.2 since Oct 10.” I don’t think this “-2.2” means very much but it represents a change, i.e. news.) But from a statistical standpoint I don’t see the value.

**Crunching the numbers**

Let’s do a quick calibration. Currently Nate gives Obama a 67.6% chance of winning, with a 50.0% to 48.9% lead in the popular vote. That’s a 50.55% share of the 2-party vote. Nate’s page doesn’t give a standard error, but let’s suppose that his forecast for Obama’s popular vote share is a normal distribution with mean 50.55% and standard deviation 1.5%. That is, there’s a 95% chance that Obama will get between 47.5% and 53.5% of the 2-party vote. That seems in the right ballpark to me. Then the probability Obama will win the popular vote is pnorm((50.55-50)/1.5) = 0.643. Not quite Nate’s 65.7%; I can attribute the difference to the electoral college.
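For readers who want to check the arithmetic, here is a minimal sketch of the same calculation in Python, using only the standard library. The normal distribution with mean 50.55 and standard deviation 1.5 is the assumption made in the paragraph above, not something taken from Nate's model, and the variable names are mine.

```python
from statistics import NormalDist

# Assumed forecast for Obama's two-party vote share, in percent:
# normal with mean 50.55 and standard deviation 1.5 (the text's assumption).
forecast = NormalDist(mu=50.55, sigma=1.5)

# Rough 95% interval: mean plus or minus two standard deviations.
low, high = forecast.mean - 2 * forecast.stdev, forecast.mean + 2 * forecast.stdev

# Probability that the share exceeds 50, the same quantity as
# pnorm((50.55 - 50) / 1.5) in R.
p_win = 1 - forecast.cdf(50.0)

print(round(low, 2), round(high, 2))  # 47.55 53.55
print(round(p_win, 3))                # 0.643
```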

This is getting interesting. A big lead in the probability (65%-35%) corresponds to a tiny lead in the vote (50.5%-49.5%). Now suppose that our popular vote forecast is off by one-tenth of a percentage point. Given all our uncertainties, it would seem pretty ridiculous to claim we could forecast to that precision anyway, right? If we bump Obama’s predicted 2-party vote share up to 50.65%, we get a probability Obama wins of pnorm((50.65-50)/1.5) = 0.668. If we ratchet Obama’s expected vote share down to 50.45%, his probability of winning goes down to pnorm((50.45-50)/1.5) = 0.618.

Thus, a shift of 0.1% in Obama’s expected vote share corresponds to a change of 2.5 percentage points in his probability of winning.
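The sensitivity can be tabulated directly. This is a sketch under the same assumed standard deviation of 1.5 percentage points; `win_probability` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

SIGMA = 1.5  # assumed forecast standard deviation, in percentage points


def win_probability(expected_share, sigma=SIGMA):
    """P(two-party share > 50) for a normal forecast centered at expected_share."""
    return NormalDist().cdf((expected_share - 50.0) / sigma)


# Expected share shifted down and up by one-tenth of a percentage point.
for share in (50.45, 50.55, 50.65):
    print(share, round(win_probability(share), 3))

# Change in win probability for a 0.1-point shift in expected share.
swing = win_probability(50.65) - win_probability(50.55)
print(round(100 * swing, 1), "percentage points")
```

The loop reproduces the 0.618, 0.643, and 0.668 figures above, and `swing` comes out at roughly 2.5 percentage points.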

Now let’s do it the other way. If Obama’s expected vote share is 50.65%, his probability of winning is 0.6676 (keeping that extra digit to avoid roundoff issues). If his probability of winning goes up by 0.1 percentage points, then his expected percentage of the two-party vote must be qnorm(0.6686,50,1.5) = 50.654. That’s right: a change in 0.1 of win probability corresponds to a *0.004 percentage point share* of the two-party vote. I can’t see that it can possibly make sense to imagine an election forecast with that level of precision. Even multiplying everything by ten—specifying win probabilities to the nearest percentage point—corresponds to specifying expected vote shares to within 0.04% of the vote, which remains ridiculous.
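The inverse calculation can be sketched the same way. `share_for_probability` is a hypothetical helper that plays the role of qnorm(p, 50, 1.5), again assuming a forecast standard deviation of 1.5.

```python
from statistics import NormalDist

SIGMA = 1.5  # assumed forecast standard deviation, in percentage points
std_normal = NormalDist()


def share_for_probability(p_win, sigma=SIGMA):
    """Expected two-party share that gives win probability p_win."""
    return 50.0 + sigma * std_normal.inv_cdf(p_win)


# Start from an expected share of 50.65, i.e. a win probability near 0.6676.
base = std_normal.cdf((50.65 - 50.0) / SIGMA)

# Vote-share shift implied by raising the win probability by 0.1 and by 1 point.
shift_tenth = share_for_probability(base + 0.001) - 50.65
shift_one = share_for_probability(base + 0.01) - 50.65

print(round(shift_tenth, 3), round(shift_one, 2))  # 0.004 0.04
```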

Really, I think it would be just fine to specify win probabilities to the nearest 10%, which will register shifts of 0.4% in expected vote share. Probabilities to the nearest 10%: if it’s good enough for the National Weather Service, it’s good enough for me.
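To see roughly what a 10% grid resolves, this sketch lists the expected vote share behind each rounded win probability, under the same assumed normal forecast with standard deviation 1.5. The grid of probabilities is my choice for illustration.

```python
from statistics import NormalDist

SIGMA = 1.5  # assumed forecast standard deviation, in percentage points
std_normal = NormalDist()

# Expected two-party share corresponding to each win probability on a 10% grid.
grid = (0.5, 0.6, 0.7, 0.8)
shares = {p: 50.0 + SIGMA * std_normal.inv_cdf(p) for p in grid}

# Vote-share gap between neighboring grid points.
steps = [shares[b] - shares[a] for a, b in zip(grid, grid[1:])]

for p in grid:
    print(p, round(shares[p], 2))
print([round(s, 2) for s in steps])
```

Near the middle of the range the gaps are about 0.4 percentage points, widening somewhat as the probability moves away from 50%.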

P.S. Just to emphasize: I think Nate’s great, and I can understand the reasons (in terms of generating news and getting eyeballs on the webpage) that he gives probabilities such as “65.7%.” I just don’t think they make sense from a statistical point of view, any more than it would make sense to describe a person as 5 feet 8.34 inches tall. Nate’s in a tough position: on one hand, once you have a national and state-level forecast, there’s not much you can say, day-to-day or week-to-week. On the other hand, people want news, hence the pressure to report essentially meaningless statistics such as a change in probability from 65.7% to 67.6%, etc.

P.P.S. I think the above calculations are essentially valid even though Nate’s forecast is at a state-by-state level. See my comment here.

To see this in another way, imagine that your forecast uncertainty about the election is summarized by 1000 simulations of the election outcome, that is, a 1000 x 51 matrix of simulated vote shares by state. If Pr(Obama wins) = 0.657, this corresponds to 657 out of 1000 simulations adding up to an Obama win. Now suppose there is a 1% shift in win probability, then this bumps 657 up to 667. What shift in the vote would carry just 10 out of 1000 simulations over the bar? Given that vote swings are largely national, it will come to approximately 0.04% (that is, 4 hundredths of a percentage point).
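Here is a rough sketch of that thought experiment. It is not Nate's simulation: instead of a 1000 x 51 state matrix it uses 1000 national vote shares, taken as evenly spaced quantiles of the assumed normal(50.55, 1.5) forecast so the result is reproducible, and it treats the swing as a uniform national shift. Under these assumptions the baseline is 643 wins rather than 657.

```python
from statistics import NormalDist

N_SIMS = 1000
forecast = NormalDist(mu=50.55, sigma=1.5)  # assumed national forecast

# Deterministic stand-in for 1000 simulation draws: evenly spaced quantiles.
draws = sorted(forecast.inv_cdf((i + 0.5) / N_SIMS) for i in range(N_SIMS))

# Simulations in which Obama's two-party share exceeds 50.
wins = sum(d > 50.0 for d in draws)

# A uniform national swing adds the same amount to every draw, so flipping
# 10 more simulations means lifting the 10th-closest losing draw up to 50.
losing = [d for d in draws if d <= 50.0]
needed_swing = 50.0 - losing[-10]

print(wins, round(needed_swing, 3))
```

The swing needed to carry 10 more simulations over the bar comes out near 0.04 percentage points, in line with the figure above.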

I don’t think you can bootstrap the national popular vote information into an election probability. Nate’s probability is based on state-by-state simulations in which the “swing” states play the dominant role, and there the state-by-state polls are the important ones, not the national polls.

Bill:

It goes like this: There’s a national popular vote forecast which can be broken down into state-by-state popular vote. Swings in opinion are largely happening on a national level, with all states tending to go up and down together. Indeed, the mapping of popular vote to electoral vote is not deterministic, but given that (a) if either candidate wins 51% or more of the popular vote, he will very likely win in the electoral college, and (b) any reasonable uncertainty in the popular vote share is more than 1 percentage point, I think my rough mapping of national vote to probability captures the essential uncertainty. I agree that the details are only approximate (I’ve asked Nate for his state-by-state simulations but haven’t heard back from him) but I can’t see the basic scenario changing much, and I think that any changes in win probabilities of 0.1% or even 1% necessarily arise from tiny, deep-in-the-noise changes in expected vote shares.

I understand your point, Andrew. But the question was about what is meant by Nate’s calculations.

Nate pointed out the same thing that I did in his blog today:

http://fivethirtyeight.blogs.nytimes.com/2012/10/22/oct-21-uncertainty-clouds-polling-but-obama-remains-electoral-college-favorite/

Somewhat similar to what Bill said, and re: this: “That’s right: a change in 0.1 of win probability corresponds to a 0.004 percentage point share of the two-party vote. I can’t see that it can possibly make sense to imagine an election forecast with that level of precision.” >> I get what you’re trying to do, but I think it doesn’t make sense when you’re talking about forecasts like FiveThirtyEight’s that are weighted towards the state polls. The probability of actually winning the election hinges on getting past a 50% threshold in a small number of states. If you just had national polling and had to extrapolate state vote shares (and thus who wins each state’s electors) from there, then your relationship between shifts in the national vote and the electoral college outcome would make sense.

Brett:

No, because shifts are happening in all the states. See my paper with Kari Lock for more discussion of this general point. Or, to put it another way, I don’t think it makes sense to talk about tiny fractions of a percentage point in Ohio, either.

There’s another way of looking at the forecasts. Suppose the election is a statistically dead heat. Then, a new poll comes out that favors Romney by 1% point, what effect does that have on his probability of winning? The way the news currently treats it, people may think that Romney’s chances will skyrocket. However, with Nate’s forecast, it changes slightly.

Another way of looking at this is that suppose we forced Nate to give estimates at the 5% level. One day his forecast is 65% in favor of Obama. This does not change much from week to week. Then suddenly, his forecast shoots up to 70%. That seems like a seismic shift in the probability of winning, even though the model may only show a slight shift (i.e. moving from 67.4% to 67.5%).

Further, if you click on his electoral vote graph, you will see that he has what looks to be a 95% confidence band around his estimates, which seem very large (currently stands at +/- 66 electoral votes). That is quite humbling.

The only issue I have with these very precise probability estimates would be if Nate is claiming a very high degree of confidence and is over-fitting his model. I do not know the model, so I cannot tell you, but +/- 66 electoral votes seems reasonable to me.

Jonathan:

I agree. I have no problem with Nate’s forecasts. It just seems silly to me for him to report meaningless precision such as “65.7%,” just as, for example, I would think it silly to report someone’s height as “5 feet 8.34 inches, plus or minus 0.3 inches.”

Every event is just as unique as this election. The only way you can fool yourself into believing in ‘repeated trials’ is if you ignore almost the entire Universe when specifying the trial. If you take the position of the moon into consideration for example, each roll of the dice is a unique event.

So to answer the question “What could it possibly mean, period, given that this election will never be repeated?” how about this for an answer:

There are many many states of the universe compatible with the information we have about the election. 65% of those states evolve into an Obama win.

I agree. When others confront me with this I simply say that 65 percent of the time that observable conditions have been like this before, the Obama-like character wins. Turned the other way, the Jeffrey Friedman approach means you can’t predict the probability of rain tomorrow. After all, tomorrow has never occurred before and will only occur once. What we mean is that given our meteorological understanding, our current information predicts rain, and it’s wrong 35 percent of the time. (This ignores the fact that weather forecasters never predict 50 percent and that an alternative explanation is rain over 65 percent of the area we’re broadcasting the forecast to.)

That’s not quite what I was saying. Since this gets at the core of the Frequentist fallacy it’s worth going over in detail.

Suppose I have knowledge “K” about the universe and there is a set of states “S” compatible with K. Saying that 65% of the states in S lead to outcome O is NOT the same as saying “65% of the time that we have knowledge K we get the outcome O”. The latter statement requires a much stronger assumption which is only occasionally true.

Each time we prepare a trial (i.e. each time we have knowledge “K” about the universe) then the universe will be in some true state s (an element of S). So the additional assumption needed is the following: Upon repeated trials s must evenly sprinkle about S. If this happens then saying “Outcome O occurs for 65% of the states in S” will then indeed translate into “Outcome O occurs with frequency 65%”.

But this is quite a strong assumption to make about the time evolution of the universe and in general it isn’t true. Typically what happens is that there are additional constraints which we don’t know about and which restrict s to a subset of S. Because of those constraints the actual frequency of outcomes O that you will observe will no longer be 65% even though 65% of the states compatible with K will lead to O.

That’s what I get for speaking loosely in a blog comment. I agree completely.

Sorry Jonathan, I didn’t mean to knickpick your comment. Just wanted to clarify for others.

I’m sorry, Entsophy, I have to nitpick your knickpicking. It’s about picking nits (out of someone’s hair, for example), not picking Knicks (a more doomed and Sisyphusian exercise, I am told).

I have a copy of some standardized test results taken in middle school. There were 15 categories all total. I was in top 96-99% of the population in every category except spelling which I scored a 30%.

I’m sorry, Jacob, but I have to nitpick your knickpick nitpick. The word is ‘sisyphean’.

Jonathan,

I think your point is interesting, but perhaps I’m lost in the abstruseness. As I interpret your assertion, the statistical claim that “65 percent of the time, K leads to O” only holds if we can also claim to have a sense of all states s ∈ S. Put another way, just because 65 percent of states ‘s’ produce outcome ‘O’, this statistic is meaningless if we are only drawing ‘s’ from a subset of S. If we only observe a subset, outcome ‘O’ may occur in 65 percent of the subset, but would occur some unknown percentage of the rest of the set. Is that approaching your point?

If not please correct me. If it’s somewhat close, can you suggest some background sources for the point. It’s interesting, but my statistics training, to this point, doesn’t approach this level of abstraction.

I think you were speaking to me. Let me do the best I can at an explanation: Saying “65% of the states compatible with knowledge K lead to outcome O” is not meaningless. One way to probe the meaning of this is to use a repeated trial. We create a set of situations where K is true and observe O. This is not the only way to examine this claim. Indeed it may not be possible to do at all (as in the case of this election) or if possible, it may not be the best way to examine it.

But if we do try to use a repeated trial we are not guaranteed to get the result “% of S that leads to O = % of times we get O in a repeated trial”. To make the left side of this equal the right side requires an additional very very strong assumption about how the universe evolves (repeated trials are really just samples of the universe at different points in its evolution).

Maybe to emphasize the difference, we can think about an extreme case. Suppose that each time we know K, there is an additional unknown constraint which forces the state to be in a very special tiny subset of S. An example of such a constraint might be “Energy levels can only take on certain values”. Or in the context of an election “There is massive voter fraud which guarantees a certain outcome”. For this particular tiny subset of S outcome O always occurs. Even though S might be a very large set (especially if our knowledge K doesn’t include much relevant information), when we actually do the repeated trial O will occur 100% of the time.
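A toy version of this extreme case, purely for illustration: the state space of 100 equally weighted states, the rule for which states lead to O, and the hidden constraint are all made up, chosen only so that 65% of S leads to O while every state in the constrained subset does.

```python
from fractions import Fraction

# Toy state space S: 100 equally weighted states compatible with knowledge K.
S = range(100)


def leads_to_O(s):
    """Hypothetical rule: states 0..64 evolve into outcome O."""
    return s < 65


# Share of all states in S that lead to O.
share_of_S = Fraction(sum(leads_to_O(s) for s in S), len(S))

# Unknown extra constraint: the true state always lands in a tiny subset of S.
constrained = [s for s in S if s < 5]

# Frequency of O when repeated trials only ever visit the constrained subset.
frequency_in_trials = Fraction(
    sum(leads_to_O(s) for s in constrained), len(constrained)
)

print(share_of_S, frequency_in_trials)  # 13/20 1
```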

Or we could turn this around. Suppose we just assume there are no such additional constraints present (we assume no massive electoral fraud for example). Or we assume that on repeated trials “s explores S” completely. We predict that in repeated trials we should get 65% of the outcomes to be O. Then we get 100% of the outcomes as O. The only way we could get such a wrong prediction is if our assumptions were wrong, i.e. there was some additional constraint we didn’t know about. So we’ve just learned something new. We can possibly use this information to update our model and get better predictions. Gelman might call this a posterior predictive check or something. This is also basically how quantum mechanics was discovered.

Actually, no. You are extrapolating from past elections to tell us about an unknown universe of future elections. Given Hume’s assumption of the uniformity of nature, that’s OK in natural science: we are assuming that water molecules and other meteorological factors will keep behaving the same way forever. But my point on the other blog was that political behavior isn’t like that, since it depends on each agent’s interpretation of the significance of a unique constellation of current events. I gave the example of whether voters will blame the incumbent for a bad economy, as they blamed Hoover, or whether they will credit the incumbent for doing his best, as they did FDR. That will depend on what information they get and how they interpret it; the interpretations will depend on what information they’ve gotten in the past and how they’ve interpreted it; etc. If we were omniscient we could read each other’s minds and predict each other’s interpretations and thus behavior, but we aren’t, so we can’t.

In short, I am making Popper’s argument in The Poverty of Historicism: human ideas are unpredictable. Only if you take the ideas out of political behavior would it become conceivable that you’d have the slightest idea whether the factors that go into Silver’s model will be applicable, and to the same quantitative degree, this year, let alone in a thousand years. That would be like forecasting the weather if water molecules could wake up tomorrow morning and change their minds about how to behave.

Jeffrey,

I think we’re talking about two different things here. We’re only talking about what the “65.7% could possibly mean”, not whether the 65% was accurate. Gelman’s point (I believe) is that the number can be meaningful at least within some model, but the chances the models themselves are in error are so high that it’s ridiculous to give that kind of precision. Another way to express it is “the known-unknowns may give an answer of 65.7% but the unknown-unknowns will almost certainly change this answer”. If that was Gelman’s point then I agree completely (and so do most people).

I was objecting to the notion that this number couldn’t possibly be meaningful just because some Frequentist couldn’t give it a meaning. Well I gave it a meaning clearly in one sentence. So Frequentists might say this meaning is misguided or useless or whatever, but at least they can stop claiming that it’s meaningless.

Entsophy–I think you’re right. If you go back to the MonkeyCage discussion, I was not disagreeing with Gelman, but with Silver. I didn’t mean to invoke frequentism as vs. Bayesianism, although I unwisely used balls in an urn to convey my feeling that there is no way to predict political behavior (with precision).

To me, the interesting question is whether statisticians should be in the business of trying to predict the future, and (through the precision of their forecasts) conveying the impression that they know what they’re doing. This enterprise is (in language you suggest) misguided and worse than useless.

Sure, people aren’t water molecules. But they behave in reasonably regular ways, and it’s much easier to predict aggregate behavior than individual behavior. (Come to think of it, we’re only predicting aggregate water drop behavior, not the behavior of each individual drop.) You can complain all you like about my model, but if 65 percent of my 0.65 predictions are right, and 35 percent of my 0.35 predictions are right, i.e., if functioning as a predicting machine, my error rate is what I predict it will be across a number of predictions, then it is not silly to call the metric “probability,” since, as near as we can tell, it obeys all of Kolmogorov’s axioms. Now we’ll never prove that I have derived a probability and that I’m not just a lucky guesser (and unlucky, in exactly the right proportions), but there’s nothing about people, as opposed to raindrops, that factors into that at all.
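The calibration idea in this comment can be sketched in a few lines. The function name and the forecast record below are made up for illustration (they are not real forecasts); the check simply compares each stated probability with how often the event actually happened.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to the observed frequency of the event."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for p, happened in zip(forecasts, outcomes):
        counts[p] += 1
        hits[p] += int(happened)
    return {p: hits[p] / counts[p] for p in sorted(counts)}


# Made-up toy record, not real data: 20 calls at 0.65 (13 came true)
# followed by 20 calls at 0.35 (7 came true).
forecasts = [0.65] * 20 + [0.35] * 20
outcomes = [True] * 13 + [False] * 7 + [True] * 7 + [False] * 13

table = calibration_table(forecasts, outcomes)
for stated, observed in table.items():
    print(stated, observed)
```

A forecaster is well calibrated in this sense when the two columns agree, which says nothing about whether the forecasts are informative.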

Football is played by human beings, and every game dynamically builds on the previous one, but you can make an accurate betting line (which represents a probability) without reading anybody’s mind.

Jonathan–No oddsmaker would presume to predict next year’s outcomes. The odds are constantly readjusted based on what has just happened–actual behavior, every week. Moreover, the rules of engagement are strict, and the role of interpretation in action on the football field (as opposed to instinctual behavior, muscular strength, and so on) is minuscule compared to the information-and-interpretation war that is politics. When a new coach introduces a new strategy, or a new player joins a team, all bets are off.

Thus, when you claim that people behave in “reasonably regular ways,” I would say you’re begging the question by conflating different types of behavior. As Popper pointed out, people regularly innovate–it’s predictable. But the content of the innovations is not predictable, or one would already have to have made the innovation. To the extent that human behavior is governed by ideas in which innovation occurs, the reliability of predictions plummets.

Recall that before the first presidential debate, it appeared certain that Obama would win. Now, not so much. Tomorrow morning, who knows? Yet the underlying weights that Silver’s model assigns to different pollsters and to his list of other “factors” remained the same. All that changed were people’s perceptions and their interpretations. We can insist that such things are irrelevant or freakish, but until someone identifies the underlying source of regularities that would override such things, I would say that the insistence is unscientific.

What we’re really talking about is philosophy of the social sciences, not statistics per se.

@Jeffrey Friedman: I agree we aren’t talking statistics, but instead, philosophy. And the counterfactuals are a little difficult to posit precisely, but when we say the probability is 65 percent today, what we mean (I posit) is that if nothing changes, that would be the odds in November. Now we can be supremely confident that things will change between now and then, but that doesn’t (again, IMO, philosophically) affect the meaning of the statement we’re making now. And in fact, you can bet on next year’s outcomes at a sports book, although of course the odds will adjust over time as more information comes in. Probabilities, in this case, are an information filtration: as the information changes, so do the probabilities, but that doesn’t make the previous probabilities any less probabilities. They are the collapsing of a current information set to a single number which is the aggregate of the current information in a relevant 0-1 dimension.

Silver is calculating sentiment, including propensity to vote and the mechanical rules of allotting the electoral college. As sentiment changes, so do the probabilities. Philosophically, I don’t see that affecting the concept of a probability of an electoral college win at all.

Jeffrey,

I’m sympathetic to your argument (as you know), but I wonder if you might get a bit more traction by drawing the distinction Keynes drew between 1) the probability we assign to a conclusion, which may or may not be a numerical probability; and 2) the degree of belief we have in the premises on the basis of which we’ve reached this conclusion. If the premises are weak, then we don’t attach much “weight” to the conclusion.

The factors you mention, which G.L.S. Shackle also stressed, suggest the “weight” we should attach to precise election probabilities is not very great.

I don’t personally see the problem with reporting such a precise-looking estimate if the level of uncertainty is described in an obvious way. Where it becomes more problematic is when it is a point estimate reported with nothing else, in which case we interpret 65.7% to be 65.7% +/- 0.05%, which is clearly wrong. In any case, I’d rather that uncertainty be discussed explicitly than to rely on convention such as significant figures to inform the discussion.

I understand and perhaps even agree with your cynicism about Silver’s motivation in reporting probabilities to the tenth of a percentage point (i.e., generating news and getting eyeballs on the webpage). I wonder, however, if you’d be okay with people attributing similar motivation to the way in which academics conduct and report their research (e.g., that they’re motivated to report findings that get them noticed and generate grants) or attributing motivation to the way in which the New York Times covers the news (e.g., that they’re motivated to report in ways that advance their political preferences). Once we start playing the motivation game, it’s Katy bar the door.

Rob:

I was not intending to be cynical. I think that generating news and getting eyeballs on the webpage are reasonable goals, especially if you’re writing for a newspaper. I don’t see it as cynical to think that somebody might be doing his job! As for my academic research, Columbia University probably doesn’t mind if I get more press and the university’s name gets in the news; that sort of thing might get them more applicants. And the university certainly doesn’t mind that I get research grants.

This morning I discovered that if you mouse-over Nate’s prediction charts it displays the estimate and standard error for each day, along with a shaded band that appears to be +/- SE.

Checking this morning’s estimate (Obama +38 ECV, SE=66) with a normal distribution, I come up with a slightly higher probability of Obama winning than the estimate Nate displays (~71% vs ~66%). This is consistent, I think, with the actual simulated distribution being non-normal.
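For anyone who wants to repeat this check, here is a minimal sketch. It assumes the electoral-vote margin is normally distributed with mean +38 and standard deviation 66, and that a positive margin means an Obama win; both are simplifications of whatever the real simulated distribution looks like.

```python
from statistics import NormalDist

# Assumed normal approximation to the electoral-vote margin (Obama minus Romney).
margin = NormalDist(mu=38, sigma=66)

# Probability that the margin is positive under this approximation.
p_normal = 1 - margin.cdf(0)

print(round(p_normal, 2))
```

Under these assumptions the probability lands a little above 0.7, noticeably higher than the displayed estimate.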

Of course you can give a probability to a non-repeatable event. There’s no such thing as a repeatable event. End of discussion.

It’s also helpful if numbers given have more precision. Sure, 65.7% and 66% are similar conclusions practically, but they’re not numerically equal.

I don’t see what all the fuss is about. The precision he gives is valuable to the broadest group of people possible, from Intraders to people who look at the tens digits and go, “Okay.” Furthermore, you can hover your mouse over the plots on the fivethirtyeight blog and see the width of confidence intervals visually and numerically.

Brash:

No big fuss. It’s just that, as a statistician, I’m sensitive to this sort of thing. One thing we teach our students in all our classes is to realize that numbers come with uncertainty and not to report too many significant figures.

I think I was misunderstanding you. After thinking about it for a while, I agree with you. I’m not sure if the nearest ten percent is right, or the nearest whole number. Perhaps a compromise would be for Nate Silver to round to the nearest whole number. That way there would be more news…but…he wouldn’t be pushing it quite as hard when it comes to the precision.

If the model predicted, say, a confidence interval for the popular vote, would it be meaningful to ascribe a probability based on the fraction of the confidence interval that’s above the 50% mark?

The discussion is more about meaningful representations for probabilities, but it is still curious that nobody has mentioned the concepts of “subjective probability” or “bayesian probability”. So I’ll do it! :)

Jeffrey F., you first referred to a frequential interpretation of probability, later to principal unpredictability of human behavior.

Obviously human (individual or aggregate) behavior is not unpredictable in the sense that there would be no information at all. You can get shot outside of your house, but that is kind of unlikely (in most places). If there is information, there is probability, for probability is not only frequencies, but also a way to express subjective knowledge. I do not see a principal reason why there would not be at least vague states of knowledge about human (political) behavior.

A way to tie the subjective probabilities to reality is through betting of a “rational” agent not losing money on purpose. With subjective probabilities, validation again requires repeated events, but they are not repeats of an experiment (“ball draws”) but repeated betting of an agent that then becomes validated by not losing all its/his/her money. (In practice this is much easier if the agent is a model that we know inside out and that is allowed to “bet” on a data set in our computer.)

It is important to understand that different models or agents can give different probabilities and still be internally consistent, and the predictions of both can be well calibrated in the long run. Someone could always predict 50% for the democrats, and that could indeed be a valid forecast, just not very informative.

Another issue is the believability of markets specifically as sources of subjective probabilities. It is mostly an empirical question, and for sure there is a lot of work on how to turn bets, spreads, etc. into calibrated probabilities. But on markets there is volatility due to random fluctuation in the balance of buying and selling, and spreads, and I guess even those alone turn third decimal place meaningless.

The discussion is more about meaningful representations for probabilities, but it is still curiouis that nobody has mentioned the concepts of “subjective probability” or “bayesian probability”. So I’ll do it! :)

Jeffrey F., you first referred to a frequential interpretation of probability, later to principal unpredictability of human behavior.

Obviously human (individual or aggregate) behavior is not unpredictable in the sense that there would be no information at all. You can get shot outside of your house, but that is kind of unlikely (in most places). If there is information, there is probability, for probability is not only frequencies, but also a way to express subjective knowledge. I do not see a principal reason why there would not be at least vague states of knowledge about human (political) behavior.

A way to tie the subjective probabilities to reality is through the betting of a “rational” agent who does not lose money on purpose. With subjective probabilities, validation again requires repeated events, but they are not repeats of an experiment (“ball draws”) but repeated betting by an agent that then becomes validated by not losing all its/his/her money. (In practice this is much easier if the agent is a model that we know inside out and that is allowed to “bet” on a data set in our computer.)

It is important to understand that different models or agents can give different probabilities and still be internally consistent, and the predictions of both can be well calibrated in the long run. Someone could always predict 50% for the Democrats, and that could indeed be a valid forecast, just not very informative.

Another issue is the believability of markets specifically as sources of subjective probabilities. It is mostly an empirical question, and for sure there is a lot of work on how to turn bets, spreads, etc. into calibrated probabilities. But on markets there is volatility due to random fluctuation in the balance of buying and selling, and there are spreads, and I guess even those alone render the third decimal place meaningless.

1. The probability of a unique (non-repeating) event. Perhaps I’m missing something, but isn’t this well covered by E.T. Jaynes and his theory of probability as extended logic? See his book.

2. I agree with Andrew that giving the electoral college win probability of 65.7% does seem to convey an unwarranted precision. On the other hand, it depends on what you want to do with the number. If you are trying to duplicate Nate’s calculation, then the extra precision is helpful because if you match it, then you gain confidence that you understand what he’s doing. If the idea is to convey a prediction, then I think an odds ratio might be better. Nate could say we have two-to-one odds of Obama winning the electoral college vote. The odds ratio is the cost of a fair bet, and anyone who plays the horses has a real feeling for odds.

3. On Nate’s work. I think he uses an excessive amount of machinery to get his numbers. Several times now, I have gotten very close to his “now cast” for an electoral college win using the method of generating functions and estimates of the state-by-state win probabilities from the RCP electoral map. I used 98% for “strong,” 70% for “likely,” 55% for “leans” and 50% for “toss up.” Where he gets 67% I get 70% and so forth. I can’t help thinking that I must have been lucky, so I tried drilling into his state probabilities. I used my own little multinomial model which took me almost no time to write in Mathematica. Using the latest polls from Pollster.com, I get 70% for Romney for Florida, and he gets 75.3%. For Virginia I get 62%, he gets 58%. And so forth. My generating function code, which is one pretty simple line of code in Mathematica, avoids all that simulation and those regressions he uses. Again I might just be lucky. My stuff is simple and crude, plus I’m not expert at this political polling game.

4. Finally I have reservations about the whole game of trying to predict an election using polls when things are close. These polls are all over the map in some cases, way beyond sampling variance. There are so many sources of error, I’m skeptical we can get a reliable prediction for the current presidential election. In other words, the problem isn’t the precision of the reported number, it’s the number itself.
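For readers who have not seen the generating-function approach mentioned in point 3, here is a minimal Python sketch of the idea (not the commenter’s Mathematica one-liner). Each state contributes a polynomial (1 - p) + p·x^votes, and the product of those polynomials has, as its coefficient on x^k, the probability of winning exactly k electoral votes. The sketch assumes the states are independent, and the function names, the toy map and its probabilities are all made up for illustration.

```python
# Sketch of the generating-function idea: multiply one polynomial per state,
# (1 - p) + p * x**votes, and read probabilities off the coefficients.
# Assumption: states are treated as independent. The toy data is hypothetical.

def electoral_vote_distribution(states):
    """Return dist where dist[k] = P(candidate wins exactly k electoral votes)."""
    dist = [1.0]  # the polynomial "1": zero votes with probability 1
    for votes, p_win in states:
        new = [0.0] * (len(dist) + votes)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p_win)   # state lost: total unchanged
            new[k + votes] += prob * p_win   # state won: add its votes
        dist = new
    return dist


def win_probability(states, needed):
    """P(candidate wins at least `needed` electoral votes)."""
    dist = electoral_vote_distribution(states)
    return sum(dist[needed:])


# Hypothetical toy map: (electoral votes, probability of winning the state)
toy_states = [(10, 0.98), (8, 0.70), (6, 0.55), (5, 0.50), (4, 0.02)]
total_votes = sum(votes for votes, _ in toy_states)
needed = total_votes // 2 + 1  # strict majority of the toy total
print(round(win_probability(toy_states, needed), 3))
```

With the real 51 contests the same loop runs over at most 539 coefficients, so no simulation is needed. The independence assumption is the crude part, since polling errors are likely to be correlated across states.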

Yes it is covered well in Jaynes book, but many here haven’t seen it and many of the Frequentists seem to be completely ignorant of it. So I’ll take the opportunity to say it again to stir up more Frequentist/Bayesian trouble: “There are many many states of the universe compatible with the information we have about the election. 65% of those states evolve into an Obama win”.

Counting up the numbers of states compatible with given information is usually difficult. But it can be done sometimes. Einstein for example got the thermodynamics of polymers by counting up the number of states (in his primitive model) compatible with a given value of energy.

Typically though, the states aren’t counted directly and we only deal with percentages (ratios). This is important because we can use symmetry arguments to get the values of ratios when it would be otherwise hopeless to count states directly. When we say the probability of a coin flip is .5 we are using such a symmetry argument (see Jaynes’s book for the glorious details). Most basic probability models are based on such symmetry arguments. This isn’t just true for discrete distributions. Maxwell for example derived the normal distribution (Maxwell’s distribution in kinetic theory) from a symmetry argument.

Such an interpretation of probability is objective, but there is also a subjective element to it. Given different states of knowledge K, you get different sets S of possible states compatible with them. On the other hand, whether the true state is in a given S is an objective fact (a given state of knowledge K can be objectively wrong, in other words). Sometimes the probabilities can be related to frequencies and sometimes this is useful, but it’s never a requirement. As others have said, there is no such thing as a true repeated trial: every probability is the probability of a unique event.

That should be enough to get the Frequentists to weigh in.

As a horse bettor and also pedant, I feel compelled to point out that Obama is 1 to 2, not 2 to 1.

The underlying issue was, is, and always will be the accuracy of the stratification. The numbers reported aren’t uncertain, they are absolutely real (barring corruption or data entry error). What is uncertain is the accuracy of the sample which generated the numbers. While some might find this to be semantic nitpicking, it’s not. The uncertainty is (likely) not measurement error, but identification error. The 1936 “Literary Digest” mess is the archetype. Gallup is getting heat for the stratification it’s been using. 1,000 (or thereabouts) data points for a 50 state election is just silly. The BLS’s employment/unemployment surveys have tens of thousands of data points. Read the whole report for a week or month, and you’ll find that the CIs are huge. Few in the media (well, none that I know of) bother to print anything but the headline number. I checked some (not all) of the pollster.com polls, and only the ABC/Post document had any information about sampling structure. “Results have a margin of sampling error of 3 points, including design effect.” Sample size of 1,376. That’s not much, but it’s more than the published difference. And not, to my mind, even closer to true.

Oh, and relating polling to odds in Vegas (or wherever); not. Bookies adjust odds on the basis of money flows, not physical facts at issue; horses, boxers, or politicians. The point is to guarantee the bookies’ profits regardless of the outcome. A horse of a different color.

Large exit polls are a whole different matter. I found the discrepancies in 2000 and 2004 persuasive that both elections were fiddled.

Of course the betting pools use a different method (and a different data set) but it’s just as much a probabilistic prediction as the psephological method is. Many would argue even better…. You say to guarantee profit… I say eliminate arbitrage, which is an expected value maximizer.

No, Vegas odds are not a probability of the *outcome*. The outcome doesn’t matter. Bookies manipulate the odds on horses X, Y, Z such that monies bet on each match the distribution they know is most profitable to them. Who wins the race is irrelevant. They don’t care; they just watch where the money flows (hmmm, may be a bit like PACs deciding which candidates to pay for) and change the odds if the money flows diverge from what they want. If the favorite is so obvious, the bookies stop taking bets, since they can’t move the money flows.

Robert: you are right about what bookies do, wrong about whether or not they create a probability. If equal amounts of money are bet on both sides, bookies are guaranteed a profit, so that’s what they do. But it’s a market just like any other market. The price of IBM stock is a prediction of the present discounted cash flows for IBM, but it’s created by an intersection of supply and demand by people with very different views on that number. There are numerous articles showing just how well bookmakers’ odds predict horse races. That’s a (multinomial) probability. There is of course the famous longshot bias, in which favorites are underbet to some extent (though not by enough to overcome the parimutuel spread) but these markets are prediction markets (of probabilities) and there is no evidence that they are any less accurate than “information-based” methods. (I use the scare quotes because these are information-based; it’s just not the bookies’ information.)

Suppose 100 fair coins. I win if more than 50 coins come up heads, lose otherwise.

How much do I have to tweak one coin to ensure I win with probability .678 say? Alternatively, how much do I have to tweak all coins to generate the same .678 probability of winning?

The first method is like predicting a change in one state so large it ought to be obvious. In this case the precision might be warranted. The second is like detecting a minute change across all states, and the precision might not be warranted.

In practice both effects are taking place. A slam dunk for partial pooling?
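The two coin questions above can be checked numerically. The following is a rough Python sketch under the rule exactly as stated (100 independent coins, a win means strictly more than 50 heads); the function names are made up for illustration. It computes the exact distribution of the number of heads and then bisects on the common bias needed when all coins are tweaked. One thing the calculation brings out: under this literal rule, a single tweaked coin cannot lift the win probability above 0.5, because even a coin that always lands heads still needs at least 50 heads from the other 99 fair coins. So the one-coin version of the question only works if the baseline is already tilted.

```python
# Exact win probabilities for the coin thought experiment.
# Assumptions: coins are independent; "win" means strictly more than half heads.

def prob_majority_heads(head_probs):
    """P(strictly more than half of the coins land heads)."""
    dist = [1.0]  # dist[k] = P(k heads among the coins processed so far)
    for p in head_probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)
            new[k + 1] += mass * p
        dist = new
    return sum(dist[len(head_probs) // 2 + 1:])


def one_coin_tweaked(q, n=100):
    """One coin has P(heads) = q, the other n - 1 coins are fair."""
    return prob_majority_heads([q] + [0.5] * (n - 1))


def all_coins_tweaked(p, n=100):
    """Every coin has the same P(heads) = p."""
    return prob_majority_heads([p] * n)


def solve_all_coins(target, n=100, tol=1e-9):
    """Bisect for the common bias p that gives P(win) = target."""
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if all_coins_tweaked(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(all_coins_tweaked(0.5))   # baseline with 100 fair coins
print(one_coin_tweaked(1.0))    # best case when only one coin is tweaked
print(solve_all_coins(0.678))   # common bias needed if all coins are tweaked
```

The all-coins answer comes out as a shift of only a few percentage points per coin, which fits the commenter’s point that a small change spread across all states can move the headline probability a lot.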

Andrew wrote: “Similarly, I might describe someone as being 5 feet 8 inches tall, or even 5 feet 8 1/2 inches tall, but it would be silly to call him 5 feet 8.34 inches tall, given that his height changes by a large fraction of an inch during the day.”

I think it depends on the purpose of what I’m doing. In the description of a person made for ID purposes ‘5 feet 8 inches’ will make sense, but ‘5 feet 8.34’ won’t. If, however, I’m trying to justify statements like ‘his height changes by a large fraction of an inch during the day’, or study just how the pattern of height changes ‘intraday’ is, it’s different.

Nate’s numbers are, at least in part, a study of how his model works ‘intra-race’. In this context the precision isn’t lost but necessary.

Harald,

Sure . . . but I don’t think that a change of 0.004 percentage points in the expected vote has any meaning at all. We’re not talking about 4 tenths of a percentage point, which is basically impossible to measure (given that polls have margins of error of +/- 3 percentage points) but could be consequential. We’re not even talking about 4 hundredths of a percentage point. We’re talking 4 one-thousandths of a percentage point. That’s 0.00004 of the vote. At this point, you might as well say someone is 5 feet 8.3437 inches. At some point you’re working with pure noise. If you’re setting the price in a liquid market, I can see it. As a probability forecast, though, no way.

I agree that 65.7% is overprecise, but I disagree that Nate might as well round to the nearest 10%. Suppose he runs his model one day, and gets a 73% chance of an Obama victory. He reports this as 70%. A few days later some new poll results come in, and they are all slightly less favorable to Obama. Now his model gives him a 67% chance of an Obama victory. If he reports this as 70%, it will look like nothing has changed. But in fact, the polls are less favorable to Obama than they were before. We may not know that 73% was right, or that 67% is right now, but we know that the probability now is lower than the probability a few days ago. It makes sense that this is reflected in the numbers.

I guess all I’m really saying is that rounding to the nearest 10% is too coarse; I think the predicted probability is more accurate than that. Maybe rounding to the nearest 5% or the nearest 2% would be about right.

I wouldn’t squawk if he reported to the nearest percentage point, even though it’s a bit overprecise. Nearest 0.1%, I agree, that doesn’t really mean anything.

Based on what Nate Silver said in his Daily Show appearance I surmise he’d be one of the first to agree that it’s completely silly to expect daily commentary on decimal-point shifts in model projections to be meaningful…. yet it seems to be that his deal with NYT requires producing such commentary.

I’m a little confused about how you’d even quantify any uncertainty for the “chance of winning” estimate; how do I distinguish “90% chance of winning with very high uncertainty” from “60% chance of winning with very low uncertainty?” The outcome is the same in either case. There’s no standard deviation bars on the chance-of-winning graph. There is a standard deviation when you mouse over the electoral-college-share, but the chance-of-winning is just a summary of the electoral college simulations.

Peter,

The win probability itself incorporates the uncertainty of the prediction. The greater the uncertainty, the closer to 1/2. You can’t get more uncertain than 1/2. Contrast this with estimating the probability of “heads” in coin tossing from past outcomes, call it p. Here p is actually a parameter in a model, so you will see statisticians provide a confidence interval (frequentists), or a credible interval (Bayesians) for p. Now when the Bayesians do a prediction for future outcomes, they give a probability with no credible interval. This probability now incorporates the uncertainty in estimating the value of the parameter for a non-informative prior. Thus we have multiple ways of expressing the uncertainty regarding p: give the whole posterior distribution of p, a credible interval, or make a prediction for the probability of heads in the next toss based on the posterior.

Anyway this is how I understand the probability of a win, others might differ. A frequentist would give you a win probability and a confidence interval for it. A Bayesian would give you the win probability, and his prior. Naturally with very little data, the prior will dominate the win probability, which is why we like non-informative priors.
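The coin-tossing case above has a simple closed form that shows how a single predictive probability absorbs the uncertainty about p. As a small sketch, assume a Beta prior on p (the uniform Beta(1, 1) is one common choice of non-informative prior, and is an assumption here, not something the comment specifies). After seeing some heads in a number of tosses, the probability that the next toss is heads is the posterior mean of p. The function name is made up for illustration.

```python
from fractions import Fraction


def predictive_prob_heads(heads, tosses, prior_a=1, prior_b=1):
    """P(next toss is heads) under a Beta(prior_a, prior_b) prior on p.

    The default Beta(1, 1) is the uniform prior (an assumption of this sketch).
    The result is the posterior mean of p, so it has no interval of its own.
    """
    return Fraction(heads + prior_a, tosses + prior_a + prior_b)


print(predictive_prob_heads(0, 0))     # 1/2: no data, maximal uncertainty
print(predictive_prob_heads(7, 10))    # 2/3: pulled from 7/10 toward 1/2
print(predictive_prob_heads(70, 100))  # 71/102: more data, less shrinkage
```

With no data the answer is 1/2, and it moves toward the observed frequency only as tosses accumulate, which is the sense in which greater uncertainty pushes the reported probability toward 1/2.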

The idea that we shouldn’t use precise numbers has always annoyed me. Why not? I can assure you that the reason Nate says it’s 65.7% is because that’s the number Nate’s computer spat out.

Why should 70% or 68% be any more accurate than 65.7%? In fact, it would be less accurate because you’ve fudged the results by rounding, and overestimated Obama’s probability of winning.

The only valid argument you have is that 65.7% makes it look like Nate is more sure of himself than he should be. But he does no such thing, that’s your reading of it. If the reader understands there’s uncertainty, then there’s no reason left not to publish 65.7%, because 65.7% IS a better estimate than 80%.

Kevin:

The reason for controlling the display of digits is that the viewer’s attention is finite. More digits on one number means less ability to see other numbers on the page. To put it another way, why not say the probability is 65.74326382364829790019279981238980080123%? The answer is that every number after the “65” is essentially meaningless. Just as it would be meaningless to say that someone’s height is 5 feet 8.34376382364829790019279981238980080123 inches.

i think you need more precision. it’s not a good enough estimate.

To (different) Jonathan: My corner of this discussion is simply to ask what is gained by saying that, as of yesterday, Obama had a 67% chance of winning, a number derived from weights attached to this year’s polls that are based on the accuracy of the models the pollsters used *four years ago,* along with a bunch of other factors that, four years ago, proved to be useful heuristics for [some unknown electorate-wide mechanism].

Much better, it seems to me, to say: “Given the following understanding of *this year’s* dynamics [e.g., GOP voters seem more enthusiastic than Democratic voters], I discount Ohio polls X and Y but favor poll Z.” In other words, the old-fashioned journalistic analysis of polling that, for all its faults, does not claim to be scientific in any teched-up sense.

What is gained by attaching a probability to such predictions–when this number has to be derived from the dynamics (i.e., voter interpretations of a very different situation) in 2008? I’m not disputing the logic of Bayesianism, I’m asking what its value added is in cases such as this.

I think Gelman’s example, on MonkeyCage, of business forecasting is much closer to what we’re dealing with in politics than the example of football games, because business forecasting entails the interpretation of what various statistics mean about economic actors’ interpretations of their situations. Because those interpretations are determined by pathways of ideas that are inaccessible to the business forecaster, business forecasts are well known to be quite unreliable.

A lesson of the financial crisis is (arguably) that when forecasters attach probabilities to their forecasts, all they gain is the spurious confidence supplied by assuming that the past is continuous with the future in the manner specified by their model. If we take unknown unknowns seriously, the problem is that we don’t know *which parts* of the past are continuous with the future–that is, which behaviors under given circumstances–because, I maintain, people’s interpretations of their circumstances are too unpredictable for that.

I wonder what people here think of Taleb, although I assume he has been much discussed before.

Nate’s model is effectively a codification of “journalistic” arguments for which there is available data. Much of his commentary also revolves around the things that aren’t captured or are limitations of his model. It’s fine to question the time-invariance of his training data and the calibration of the estimation intervals, but I don’t see how throwing around gut theories about “this year’s dynamics” for which there is no available data will make for more useful commentary.

Regardless of what anyone thinks of Taleb, invoking black swans when the outcome is binary probably means that you should re-read how Taleb defines his idea of the 4th quadrant.

I’m a huge fan of Taleb. I don’t see where Jeffrey was “invoking black swans” whatsoever. I think he was probably thinking more along the lines of model error, which is the focus of Taleb’s new (and I suspect much more important) work.

Taleb didn’t invent the idea that models can be wrong. By his own account, the model error that he rails about is in the domain where low probability events have large effects. I’m pretty sure he couldn’t care less about an error where Nate’s incorrectly-predicted-election tally increases by +1. Even within the model (or within any election prediction model for that matter), the probability that the unfavored candidate will win is fairly high anyway.

Personally, I’m looking forward to his new book as well, but I’m kind of tired of people invoking him when they don’t seem to understand what he’s talking about.

I like Taleb because he, like me, will be speaking at the Larchmont public library.

Well, no, actually he rails against any over-certainty in models. For instance, even in cases where the normal distribution seems safe, uncertainty about the variance (Taleb talks about the “variance of the variance of the variance”) can dramatically change the probability of events that aren’t even too far in the tails. Yes, he’s most known for problems in the “4th quadrant”, but that’s not so much the focus of his new book (he’s been sharing drafts of chapters on his FB page).

I’m familiar with his example of the normal distribution. This isn’t even a normal distribution – it’s a binary outcome, which is bounded by the problem definition in a way in which things like financial modeling predictions are not. The bounded outcomes here reflect the problem statement, not the modeling assumptions.

Anyway, how exactly is the prediction overly certain? Using a model that averages over polls has the effect of increasing the uncertainty in the outcome rather than decreasing it. Romney winning is about as surprising as rolling a 1 or a 2 on a 6-sided die, Obama winning is about as surprising as rolling a 3-6. That seems pretty uncertain to me. That and given how much the limitations of the model are discussed on the blog, it seems like you’re projecting a claim to over-certainty when Nate never made any such claim of certainty to begin with.

The folks at OrgTheory think there’s a divergence in electoral vs popular vote probabilities because of the south.

The Depression bottomed out in 1932. There was actually a pretty strong recovery that started once FDR got off gold and devalued the dollar. There were problems later including the “recession within a depression”, but if voters continued to judge based on how the economy had changed since his election, that would explain FDR being re-elected.

If attaching probabilities to forecasts just gives spurious confidence, you should go into business with a strategy of not doing that and eat the lunch of those other suckers. I don’t expect you’ll do very well, not that you’ll care what probability I assign to that statement! Under Nassim Taleb’s theories, selling insurance should be the most sure fire way to fail, yet plenty of insurance companies have been as profitable and long lived as the average in any other industry.

At Worthwhile Canadian Initiative they ranked presidents according to the economic performance under their administration. Hoover is dead last, FDR is in the top ten.

The precision point is a very old one (Aristotle?) and not extremely interesting. What stood out to me more with the 538 model was that only one State was in the 40-60% band in early July, and there are still only two in that band. Especially when we were back in July with four months to go, such calibration looked pretty odd and essentially over-confident. I mean I’m sure that the calibration and parameters work with the historical data, but the July output just didn’t quite seem reasonable to me.

With Intrade, because of capital constraints and the margining system (it takes $6 to buy at 60% but only $4 to short there) I think prices are biased towards 50%, beyond the usual fav/longshot bias near extremes. On the other hand, I feel like the 538 model is calibrated such that probabilities are pushed away from 50% towards extremes more than they ought to be. I might be persuaded otherwise though.

That 65.7% is a point estimate, right? I think it would be useful to see a highest probability density region (assuming the estimates are Bayesian) or a confidence interval (if not) around that number.

I suspect that probabilities are not the most quantitatively meaningful measures to work with–to use some of Nate’s numbers, there’s a big difference between the 99.5% probability that Indiana will vote Romney and the 100% probability that Texas will vote Romney. (I suspect that Nate’s model really says that the probability for Texas is something like 99.999%.)

Yes, it’s a bit much to report the probabilities to three places. But if the model metric is, say, ln-odds, then there really IS a difference between a 65.5% ( = .64 in ln odds) and a 66.0% ( = .66 ln odds). Rather than attempt to communicate to the readers of the paper of record what log-odds are, maybe Nate reports probabilities.
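The conversion between probabilities and log-odds used above is short enough to spell out. This is a small Python sketch; the function names are made up for illustration. It maps a probability p to ln(p / (1 - p)) and back, and prints the two probabilities quoted in the comment next to the Indiana-style 99.5% and a Texas-style 99.999%, to show how differently the two scales treat values near 100%.

```python
import math


def log_odds(p):
    """Natural-log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def prob_from_log_odds(z):
    """Inverse of log_odds: the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.655, 0.660, 0.995, 0.99999):
    print(f"p = {p:.5f}  ->  ln-odds = {log_odds(p):.2f}")
```

On the probability scale 99.5% and 99.999% look nearly identical, while on the log-odds scale they are far apart, which is one reason a model might work in log-odds internally and convert to probabilities only for display.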

The practical significance of reporting a probability to such a high precision might be meaningless in most instances, I agree. Even if another measure on the same object of interest were sensitive relative to another, and those two measures are reasonably well related the interpretative value of base percentage points are not high unless it is actually a critical component in driving the process that generates the outcome we are trying to predict. I guess this is why the debate has polarised between supporting the idea that the outcome is sensitive to changes in the current state and discounting the value of percentage points. Either way if it was the aim of Nate’s analysis to build a model that gives probabilities to the nearest nth decimal point then it is probably warranted to call it silly. However I am pretty sure that calculating the probabilities to that degree of accuracy was only consistent with the analysis and trying to get a feel for the variation surrounding that number, rather than the aim of the analysis itself.

It (the probability) is simply an exogenous (to one’s belief) measure of what the more likely event is – generated by what is known (the data) and what the possible routes forward are. Whether you are for or against reporting at such high precision – that doesn’t change the message that the analysis is trying to get across, given the data.

In scientific papers p-values are usually reported to three decimal places.

For example, if I had 50 and 30 successes in group A and B, and 100 not-successes in both, p=0.0581.

If I had one more success in A, I would have p=0.0444.

If I had one more success in B, I would have p=0.0808.

Here, by moving one unit, I could almost double the result. Nevertheless, p-values continue to be reported to three decimal places.
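The comment does not say which test produced these p-values. As a hedged reconstruction, the figures appear to be consistent with Pearson’s chi-square test on a 2×2 table without continuity correction, with “one more success” read as one observation switching from not-success to success (so group sizes stay at 150 and 130). Both of those are assumptions of this sketch, not statements by the commenter, and the function name is made up for illustration.

```python
import math


def chi2_pvalue_2x2(success_a, fail_a, success_b, fail_b):
    """P-value of Pearson's chi-square test for a 2x2 table.

    One degree of freedom, no continuity correction (an assumption about
    which test the comment used).
    """
    n = success_a + fail_a + success_b + fail_b
    row_a, row_b = success_a + fail_a, success_b + fail_b
    col_s, col_f = success_a + success_b, fail_a + fail_b
    chi2 = n * (success_a * fail_b - fail_a * success_b) ** 2 / (
        row_a * row_b * col_s * col_f
    )
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(chi2_pvalue_2x2(50, 100, 30, 100))  # original table
print(chi2_pvalue_2x2(51, 99, 30, 100))   # one more success in A
print(chi2_pvalue_2x2(50, 100, 31, 99))   # one more success in B
```

Under those assumptions a single observation changing category moves the p-value in the second decimal place, which is the commenter’s point about how fragile the later digits are.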
