
Leicester City and Donald Trump: How to think about predictions and longshot victories?

Leicester City was a 5000-to-1 shot to win the championship—and they did it.

Donald Trump wasn’t supposed to win the Republican nomination—last summer Nate gave him a 2% chance—and it looks like he will win.

For that matter, Nate only gave Bernie Sanders a 7% chance, and he came pretty close.

Soccer

There’s been a lot of discussion in the sports and political media about what happened here. Lots of sports stories just treated the Leicester win as a happy miracle, but more thoughtful reports questioned the naive interpretation of the 5000:1 odds as a probability statement. See, for example, this from Paul Campos:

Now hindsight is 20/20, but when a 5000 to 1 shot comes in that strongly suggests those odds were, ex ante, completely out of wack. . . . Leicester was the 14th-best team in the league last year in terms of points (and they were better than that in terms of goal differential, which is probably a better indicator of underlying quality). Anyway, the idea that it’s a 5000 to 1 shot for the 14th best team in one year to win the league in the next is obviously absurd on its face.

The 14th best team in the EPL is roughly equivalent to the 20th best team in MLB or the NBA or the NFL, in terms of distance from the top. Now obviously a whole bunch of things have to break right for a 75-87 team to have the best record in baseball the next year. It’s quite unlikely — but quite unlikely as in 50-1 or maybe even 100-1.

That sounds about right to me. Odds or no odds, the Leicester story is inspiring.

Primary elections

There are a bunch of quotes out there from various pundits last year saying that Trump had zero chance of winning the Republican primary. To be fair to Nate, he said 2% not 0%, and there’s a big difference between those two numbers.

But even when Nate was saying 2%, the betting markets were giving him, what, 10%? Something like that? In retrospect, 10% odds last fall seems reasonable enough: things did break Trump’s way. If Trump was given a 10% chance and he won, that doesn’t represent the failure of prediction markets.

I’d also say that, whatever Trump’s unique characteristics as a political candidate, his road to the nomination does not seem so different from that of other candidates. Yes, he’s running as an anti-Washington outsider who’s willing to say what other candidates won’t say, but that’s not such an unusual strategy.

My own forecasts

I’ve avoided making forecasts during this primary election campaign. Why? In some ways, my incentives are the opposite of those of political pundits. Nate’s supposed to come up with a forecast—that’s his job—and he’s also expected to come up with some value added, above and beyond the betting markets. If Nate’s just following the betting markets, who needs Nate? Indeed, one might think that the bettors are listening to Nate’s recommendations when deciding how to bet. So Nate’s gotta make predictions, and he gets some credit for making distinctive predictions and being one step ahead of the crowd.

In contrast, if I make accurate predictions, ok, fine, I’m supposed to be able to do that. But if I make a prediction that’s way off, it’s bad for my reputation. The pluses of making a good forecast are outweighed by the minuses of screwing up.

Also, I haven’t been following the polls, the delegate race, the fundraising race, etc., very carefully. I don’t have extra information, and if I tried to beat the experts I’d probably just be guessing, doing little more than fooling myself (and some number of gullible blog readers).

Here’s a story for you. A few months ago I was cruising by the Dilbert blog and I came across some really over-the-top posts where Scott Adams was positively worshipping Donald Trump, calling him a “master persuader,” giving Trump the full Charlie Sheen treatment. Kinda creepy, really, and I was all set to write a post mocking Adams for being so sure that Trump knew what he was doing, it just showed how clueless Adams was, everybody knew Trump didn’t have a serious chance . . .

And then I remembered why primary elections are hard to predict. “Why Are Primaries Hard to Predict?”—that’s the title of my 2011 online NYT article. I guess I should’ve reposted it earlier this year. But now, after a season of Trump and Sanders, I guess the “primaries are hard to predict” lesson will be remembered for a while.

Anyway, yeh, primaries are hard to predict. So, sure, I didn’t think Trump had much of a chance—but what did I know? If primaries are hard to predict in general, they’re hard for me to predict, too.

Basically, I applied a bit of auto-peer-review to my own hypothetical blog post on Adams and Trump, and I rejected it! I didn’t run the post: I rightly did not criticize Adams for making what was, in retrospect, a perfectly fine prediction (even if I don’t buy Adams’s characterization of Trump as a “master persuader”).

The only thing I did say in any public capacity about the primary election was when an interviewer asked me if I thought Sanders could stand up against Donald Trump if they were to run against each other in a general election, and I replied:

I think the chance of a Sanders-Trump matchup is so low that we don’t have to think too hard about this one!

It almost happened! But it didn’t. Here I was taking advantage of the fact that the probability of two unlikely events is typically much smaller than the probability of either of them alone. OK, the two parties’ nominations are not statistically independent—it could be that Trump’s success fueled that of Sanders, and vice-versa—but, still, it’s much safer to assign a low probability to “A and B” than to A or B individually.
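
To put rough numbers on this, here is a minimal sketch in Python. It uses the 2% and 7% figures quoted above for Trump and Sanders as inputs. The function names are made up for illustration, and the independence case is only one reference point, since, as noted, the two nominations are probably not independent. Whatever the dependence, the joint probability can never exceed the smaller of the two individual probabilities.

```python
def joint_bounds(p_a, p_b):
    """Bounds on P(A and B) when only P(A) and P(B) are known.

    The upper bound is min(P(A), P(B)); the lower bound is
    max(0, P(A) + P(B) - 1). These hold for any dependence structure.
    """
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower, upper


def joint_if_independent(p_a, p_b):
    """P(A and B) under the (assumed, probably wrong) independence case."""
    return p_a * p_b


# Inputs taken from the post: early forecasts for each nomination.
p_trump = 0.02
p_sanders = 0.07

lower, upper = joint_bounds(p_trump, p_sanders)
indep = joint_if_independent(p_trump, p_sanders)

print(f"P(both) is between {lower:.4f} and {upper:.4f}")
print(f"P(both) if independent: {indep:.4f}")
```

Under independence the matchup comes out at about 0.14%, and even with perfect positive dependence it is capped at 2%, which is why a low probability for “A and B” is the safer call.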

But, yeah, primaries are hard to predict. And Leicester was no 5000:1 shot, even prospectively.

P.S. Some interesting discussion in comments, including this exchange:

Anon:

A big difference between EPL and Tyson-Douglas is that there were only two potential winners of the boxing match. The bookies aren’t giving odds on SOME team with a (equivalent to) 75-87 record or worse winning the most games next year – but for one specific team. 5000-1 may be low, but 50-1 is absurdly high given there actually are quality differences between teams, and there are quite a few of them.

My response:

Yes, I agree. Douglas’s win was surprising because he was assumed to be completely outclassed by Tyson, and then this stunning thing happened. Leicester is a pro soccer team and nobody thought they were outclassed by the other teams—on “any given Sunday” anyone can win—but it was thought they were doomed by the law of large numbers. One way to think about Leicester’s odds in this context would be to say that, if they really are the 14th-best team, then maybe there are about 10 teams with roughly similar odds as theirs, and one could use historical data to get a sense of what’s the probability of any of the bottom 10 teams winning the championship. If the combined probability for this “field” is, say, 1/20, then that would suggest something like a 1/200 chance for Leicester. Again, just a quick calculation. Here I’m using the principles explained in our 1998 paper, “Estimating the probability of events that have never occurred.”
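
The quick calculation can be written out as a short Python sketch. The 1/20 for the “field” and the count of 10 teams are the illustrative numbers from the response above, not estimates from data; the equal split across teams and the helper names are assumptions made for the example, and the conversion from odds ignores any bookmaker margin.

```python
from fractions import Fraction


def odds_against_to_prob(odds_against):
    """Convert 'N-to-1 against' odds to a probability (no bookmaker margin)."""
    return Fraction(1, odds_against + 1)


def split_field(field_prob, n_teams):
    """Share a combined 'field' probability equally among n_teams (assumed)."""
    return field_prob / n_teams


quoted = odds_against_to_prob(5000)        # probability implied by 5000-1
per_team = split_field(Fraction(1, 20), 10)  # illustrative field of 10 teams
ratio = per_team / quoted                  # how far apart the two numbers are

print(f"implied by 5000-1: {quoted} = {float(quoted):.5f}")
print(f"field-based guess: {per_team} = {float(per_team):.5f}")
print(f"ratio: about {float(ratio):.0f}x")
```

With these inputs the field-based guess is roughly 25 times the probability implied by the quoted odds.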

Also a comment from fraac! More on that later.

P.P.S. There still seems to be some confusion so let me also pass along this which I posted in comments:

N=1 does give us some information but not much. Beyond that I think it makes sense to look at “precursor data”: near misses and the like. For example if there’s not much data on the frequency of longshots winning the championship, we could get data on longshots placing in the top three, or the top five. There’s a continuity to the task of winning the championship, so it should be possible to extrapolate this probability from the probabilities of precursor events. Again, this is discussed in our 1998 paper.

The key to solving this problem—as with many other statistics problems—is to step back and look at more data.

Just by analogy: what’s the probability that a pro golfer sinks a 10-foot putt. We have some data (see this presentation and scroll thru to the slides on golf putting; see also in my Teaching Statistics book with Nolan, the data come from Don Berry’s textbook from 1995 and there’s more on the model in this article from 2002) which shows a success rate of 67/200, ok, that’s a probability of 33.5% which is a reasonable estimate. But we can do better by looking at data from 7-foot putts, 8-foot putts, 9-foot putts, and so on. The sparser the data the more it will make sense to model. This idea is commonplace in statistics but it often seems to be forgotten when discussing rare events. It’s easy enough to break down a soccer championship into its component pieces, and so there should be no fundamental difficulty in assigning prospective probabilities. In short, you can get leverage by thinking of this championship as part of a larger set of possibilities rather than as a one-of-a-kind event.
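
As a small illustration of why the single-distance estimate leaves room for improvement, here is a Python sketch that computes the 67/200 estimate together with a standard error. The normal-approximation standard error is a choice made for this example, not something taken from the sources cited above, and the function name is hypothetical.

```python
import math


def binomial_estimate(successes, trials):
    """Raw success rate and its normal-approximation standard error."""
    p_hat = successes / trials
    se = math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, se


# The 10-foot putting counts quoted in the paragraph above.
p_hat, se = binomial_estimate(67, 200)
print(f"estimate = {p_hat:.3f}, standard error = {se:.3f}")
```

An uncertainty of about three percentage points from 200 putts is the gap that data from nearby distances, combined through a model, can help narrow. The same logic applies with more force to rare events, where the direct data are far sparser.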

94 Comments

  1. John Crosswhite says:

    Nate got 2% by making up six coinflips that could knock Trump out at various stages. It’s a bogus figure.

  2. fraac says:

    That’s totally clueless about soccer. First, goal difference is not a better measure of quality. Where did that come from? Leicester were the rank outsiders, not the 14th best, regardless of where they finished the previous season – bookies aren’t stupid. They had a squad that should have finished between 15th and 20th (last), yet they won the league because Chelsea and Man City gave up early and replaced their managers with guys under contract to other clubs until next season, they avoided injuries, and they played a simple style that suited their limited skill. Realistically it would have been about 1000-1.

    • Andrew says:

      Fraac:

      If what you say is correct, that Leicester City “should have finished between 15th and 20th, yet they won the league,” this suggests that “winning the league” has a bit of a random component, hence the absurdity of 5000-1 odds even prospectively. Buster Douglas should’ve been destroyed by Mike Tyson, I assume, but Douglas was a 42-1 shot, not 5000-1.

      The fact that Leicester won because of an unusual string of events plus good strategy: sure, that’s how an underdog can win the championship. Unusual strings of events happen, and some teams have good strategy. Again, the possibility of this happening should be part of the odds.

      Anyway, I’m no expert on soccer. I found Campos’s argument convincing, also, yes, goal difference should provide information beyond what you get from wins and losses. I say this based on my understanding of other sports, where point differential is meaningful. You score goals one at a time, and better teams score more goals and have fewer goals scored on them.

      • anon says:

        A big difference between EPL and Tyson-Douglas is that there were only two potential winners of the boxing match. The bookies aren’t giving odds on SOME team with a (equivalent to) 75-87 record or worse winning the most games next year – but for one specific team. 5000-1 may be low, but 50-1 is absurdly high given there actually are quality differences between teams, and there are quite a few of them.

        • Andrew says:

          Anon:

          Yes, I agree. Douglas’s win was surprising because he was assumed to be completely outclassed by Tyson, and then this stunning thing happened. Leicester is a pro soccer team and nobody thought they were outclassed by the other teams—on “any given Sunday” anyone can win—but it was thought they were doomed by the law of large numbers. One way to think about Leicester’s odds in this context would be to say that, if they really are the 14th-best team, then maybe there are about 10 teams with roughly similar odds as theirs, and one could use historical data to get a sense of what’s the probability of any of the bottom 10 teams winning the championship. If the combined probability for this “field” is, say, 1/20, then that would suggest something like a 1/200 chance for Leicester. Again, just a quick calculation. Here I’m using the principles explained in our 1998 paper, “Estimating the probability of events that have never occurred.”

      • fraac says:

        Okay but there is no information – goal difference, variance, whatever you want – that isn’t available to the bookmakers. I agree 5000-1 is an advertising price rather than a probability, it’s a made up number after they’ve done the work of considering teams who might actually win. If you go into a bookies and ask the odds of your unborn son winning Wimbledon, or Christmas day being the warmest day of the year, they’ll quote 5000-1. 1000-1 would have been realistic. It’s the most bizarre result I’ve ever seen in sport. I love boxing too, and in a one-off fight between heavyweights you won’t see odds longer than about 20-1 because there is a belief that the worst 200 lb professional boxer can knock out any human with a clean punch. Leicester are asking us to believe that any Prem team can sign a few unknown talents, avoid injuries, use a low-possession, counterattacking style (that was nothing revolutionary), and top the most stratified league in the world after 38 games. It’s preposterous.

    • KKnight says:

      The 5000-1 odds as well as the predictions by most experts that Leicester City would finish dead last are in no small part due to the fact that the appointment of Claudio Ranieri as manager was perceived as absurd at the time. (Not sure why as his track record was really quite good.) Smaller “provincial” teams have won the top division before – for example, Ipswich Town in 1962 and Nottingham Forest in 1978. Yes, things were different back then (much more parity in the top-flight) but probably odds of 100-1 or so would have been more realistic.

  3. Rahul says:

    To claim *ex post* that “those odds were, ex ante, completely out of wack” has very little credibility or content.

    Just admit yes, we got the predictions completely wrong.

    • Andrew says:

      Rahul:

      I disagree. This claim is of course ex post, hence any claims are inherently model-based. But we make ex-post judgments of ex-ante decisions all the time.

      For example, suppose you tell your friend that you missed the bus to work in the morning. Your friend says: You should get to the bus stop on time, fool! And you reply: No, I showed up on time, the damn bus left early. And your friend says: You fool, don’t you know that buses leave early sometimes? Your bad for not taking this into account so that you wouldn’t get to work late. And you reply: No, I checked the stats on this bus line: it leaves early only 1% of the time, and I was willing to take the 1% risk of being late for work. And your friend says: But today was different, the usual driver was sick and was replaced by an eager driver who likes to leave early! And you say: I had no idea, so even if my decision was wrong retrospectively, it was correct prospectively. And your friend says: No, you fool, your prospective decision should take into account that bus drivers get sick sometimes. Etc.

      OK, the bus example is contrived, but the point is that it is possible to make arguments about the prospective wisdom of decisions in retrospect. You just have to be clear about what information you’re conditioning on.

      My other point is quantitative. It seems completely reasonable to me to say that Trump’s 10% odds were fair at the time, and that he did better than expected in order to win the nomination, that he won the equivalent of three successive coin flips. It does not seem so reasonable to think that Leicester were legitimate 5000-1 odds. Based on what I’ve seen, 100-1 seems more reasonable as prospective odds that this longshot would win the title.

      • Keith O'Rourke says:

        > is possible to make arguments about the prospective wisdom of decisions in retrospect. You just have to be clear about what information you’re conditioning on.

        Agree, but would clarify/add “not limited to the case at hand but to unlimited (reference) class of cases.”
        (Blocking the – you can’t argue with success – argument.)

      • Victor Ordu says:

        This is an ‘aside’ comment: I’ve not studied this quantitatively, but among all Republican candidates it’s only Trump that I recall seeing people urging to run for President as far back as 2008. I kept seeing this on Twitter and elsewhere on social media. I’m not an American, but when he declared to run, I told my friends he would likely win the nomination. I wasn’t betting – the facts speak for themselves. A comparative sentiment analysis on social media over the past 10 years could yield some interesting results. Unbeknownst to many, it’s likely the man started working on this way before the others and came better prepared.

        Ditto Leicester.

  4. Rahul says:

    I didn’t get the bit about Nate being “expected to come up with some value added, above and beyond the betting markets” or “credit for making distinctive predictions”.

    Are we saying that Nate intentionally takes up contrarian positions just to make a splash? Irrespective of what he really thinks is the most likely outcome.

    • Andrew says:

      Rahul:

      I don’t know how Nate makes his decisions. But I do think he’s under some pressure to come up with his own unique statements, not merely to post each day saying that he agrees with the betting markets.

      • Rahul says:

        I’d like to think Nate uses whatever he thinks will predict best. Sometimes (mostly?) those predictions will agree with the betting markets. Sometimes they won’t.

        He may very well be under the pressure you mention, but I doubt that causes him to make intentionally contrarian predictions.

  5. Paul Alper says:

    The WSJ is not an overly big fan of Trump so they compared him recently to another larger than life figure, Rodrigo Duterte, the newly elected President of the Philippines.

    “During the campaign Mr. Duterte promised to dump bodies into Manila Bay to ‘fatten all the fish there…forget the laws on human rights’called himself a ‘dictator’…admitted to receiving millions in ‘gifts’ from friends.”

    However, the strangest part of this editorial (The Philippines Elects its Trump) is the following where Duterte boasts that

    “he never gives public funds to his mistresses.”

    But to return to Leicester City: the bookies made the mistake of being greedy. No “punter” was interested in betting on Leicester City at the ordinary, customary outsider odds of let us say 100 to 1. So the bookies juiced things up to 5000 to 1 just to tempt people to put some spare skin in the game and add some easy profit to the bookies. There is a lesson somewhere in all of this unusual behavior/outcomes.

  6. Alex says:

    Interestingly, the 20th team listed on bovada’s ‘odds to win the 2016 World Series’ page right now is the Yankees. They are listed at +5000 (tied with the Phillies). Sort of a coincidence.

    According to a story from ESPN in February, the Yankees were 14 to 1 to win according to the Westgate in Vegas. That could be inflated since the Yankees are a pretty popular team. The Phillies, however, were 500 to 1 (tied for lowest odds). Apparently you learn a lot from ~30 games of baseball. Also for reference, that ESPN article has the 20th team as 40 to 1.

  7. Z says:

    I think what the extreme Leicester and Trump forecasts have in common is that they were generated by the following process:

    1) Something doesn’t happen for a long time
    2) Experts hypothesize some mechanistic explanations for why it can never happen
    3) These explanations are supported by some more instances where the thing doesn’t happen and people start to believe them
    4) The mechanistic explanations supplant past data as the basis for future predictions

    The mistake seems to be believing the mechanistic explanations (e.g. the party decides or vast disparities in payroll reliably produce vast disparities in quality)

  8. Chris J says:

    In the 2012 Republican Presidential Primary polls, between Aug. 25, 2011 and Feb. 8, 2012,

    Romney was only in the lead for short stints as Perry, Cain, Gingrich, and Santorum each led, sometimes by large margins. Romney led continually before and after those dates, so leads by the other candidates were just a flirtation, not a yearning. Using this as a partially defined model, the predictions for 2016 generally assumed that Trump support would fall as he “misspoke” or voters got to know him better just as these other candidates had fallen in 2012. An alternative (better) interpretation of 2012 would be that in 2016 a combination of (1) a candidate with just the right kind of outsider appeal and persuasive skills to win Republican primaries, (2) a weak enough establishment candidate (Mr. Low-Tea) and (3) an otherwise weak field, could result in the major upset that has occurred in Trump. Is that really a long shot or a failure of interpretation of how the system of Republican primaries has evolved in recent cycles? The fact that the “reasonable” candidates, i.e. the favorites of Democrats, like Huntsman and Kasich never get off the ground should tell us something. The better sports analogies might be the Golden State Warriors with Stephen Curry and the 3-point shot or the New England Patriots unorthodox team building strategies. That is, we need to understand what is happening to the model. Are there any random variables that matter much here or did Trump just go out and execute on a winning strategy that no one could quite see beforehand?

  9. Gotta say here, because no one else is… You Go Buster… take the 30M and get out quickly before you wind up with a reputation for “amazing” boxing, and serious brain damage. Fishing in Florida with an accountant who keeps you on an even keel and no real signs of traumatic brain injury should be an example for us all not to give in to what other people want from us just because we step into some role they want us to play.

  10. Marco says:

    It’s also the question if betting odds can be directly translated to the probability that the underlying event occurs.

    From the bookmakers’ perspective the odds were fixed and it was no real Dutch book (https://en.wikipedia.org/wiki/Dutch_book), but the bookmakers did not go “all long”. Every bet on another team winning the EPL made them money. One has to factor this in, along with the capital distribution of these bets. So 5000:1 could be sensible odds for a bookmaker to sell, but he is not (very) interested in the true probability – and does not need to be, thanks to hedging!

    If one could somehow determine that the probability of Tyson beating Douglas is 80%, the odds in a betting market could be very different – even if the vast majority of bettors know or believe this probability to be true. It’s kind of a (lacking) ergodicity property in boxing: Even if one boxer is just a little better every round, if he manages to win 10 or 12 rounds (and does not get knocked out) the result will be very clear cut. The counterargument is that if two boxers are nearly equal, very lopsided victories should occur rarely.

    If a majority of 90% of people would bet on Tyson winning his betting odds would change and be worse than 80%. In finance terms, the odds are set by bookmakers in relation to supply and demand. They can be interpreted as a probability measure, but the connection to the true, underlying measure is very indirect, if there is any. The odds would be the so-called “risk-neutral” probability measure, which most times differs from the true one.

    This is no real answer to extreme 5000:1 odds, but much more applicable to betting markets and 10% winning probability for Trump.

    As a side note, one can also think about whether the probability of Trump winning the nomination as a singular event is well-defined. As a subjective measure of certainty yes, but in a stricter sense? But I guess this is a recurring conversation on this blog.

    • Rahul says:

      Here’s a naive question: The 5000-to-1 odds are fixed-odds bets offered by the bookmakers, right?

      Was there a parallel pari mutuel betting on the event too? It’d be interesting to see what the odds people placed on this outcome there.

      Or is there no difference, in practice? i.e. Do the fixed-odds offered by bookies match the autogenously arising pari mutuel betting odds approximately?

      • Marco says:

        Yes, the 5000:1 odds are fixed odds set before the season starts. I don’t know about pari mutuel betting on the Premier League, but for instance betfair (https://www.betfair.com/sport/football) lets you bet on either of win, draw or lose. I think the odds are being adjusted when new (big) bets are added.

        My main point was that those bookmakers odds do not necessarily represent a good or accurate estimate of the real probability of the event in question occurring! And they should not be interpreted like this.

  11. Anon says:

    Is there a principled way of distinguishing between ‘was out of wack’ and ‘was totally right but unlikely events do occasionally occur, you know’ based on 1 data point?

    • Andrew says:

      Anon:

      N=1 does give us some information but not much. Beyond that I think it makes sense to look at “precursor data”: near misses and the like. For example if there’s not much data on the frequency of longshots winning the championship, we could get data on longshots placing in the top three, or the top five. There’s a continuity to the task of winning the championship, so it should be possible to extrapolate this probability from the probabilities of precursor events. Again, this is discussed in our 1998 paper.

  12. Christian Hennig says:

    An issue here is that it’s not so clear what is meant by “the right odds”. At the end Leicester win the PL or not. Nowhere “out there” is a “true probability” for them to win before they do it.
    There are a number of ways how we can define such “true odds” and I’d guess that they could lead to a quite diverse set of numbers.

    First thing, good bookmakers will do better predicting how bets will be distributed rather than what will happen, so that they can win regardless of what happens. I haven’t heard anywhere that 5000-1 for Leicester was bad in this respect. British media found some oddballs who had indeed placed bets on Leicester at the beginning of the season but their number was too low that I’d think that if anything bets were distributed even more extremely against Leicester than 5000-1, which means that the bookies probably didn’t get predicting the betting distribution all too wrong.

    Second thing, I’d not expect these odds to align with subjective probabilities. If everyone was a Bayesian, they probably should, but chances are the majority of people just bet on what they believe (or on the team they support if they at least have a bit of hope), but they wouldn’t bet on an outsider just because the odds are 5000-1 and their personal probability is 0.1% which is larger than 1/5000.
    This means that if 5000-1 reflects the distribution of bets correctly, it will be biased (too extreme) as an estimator of, say, average subjective probability.

    Third thing, if we try to not be fully subjectivist on this, it is damn complicated and involves problematic decisions. One can ask, of course, how likely it was in the past that the team placed 14th wins the league. This could be taken as an estimator of a “true” frequentist probability. But what’s the reference class? The British premier League? All kinds of soccer leagues? All kinds of leagues of all kinds of team sports? Also, because the number of cases in the PL alone may not be that big, one would rather want to have a model that models not only the chances of the team on rank 14 but rather on all ranks with some monotonicity constraint (rank 14 may just look better or worse than it should be because of random variation which could be corrected by this).

    Next thing, what to do about additional knowledge? In the PL, there is a huge financial gap these days between a handful of top teams and the rest; actually in terms of budget and the ability to attract top players and managers, the situation is very skew, even the gap between the sixth and seventh most wealthy team will probably be much larger than between the ranks 14 and 15. This has become more and more extreme recently (which was reflected in the outcome of quite a number of seasons before the current one), it wasn’t like this in the past. So actually all historical PLs or sports leagues don’t make a particularly convincing reference class, but what to use instead?

    Furthermore, instead of taking Leicester as “the team that ended up 14th last season” one may actually want to use some information available about Leicester in the beginning of the season, such as their actual budget, player performance data from previous years, the record of their manager etc., and of course of their environment, meaning all other teams.

    Quite involved indeed.

    The Bayesians may sneer at the frequentists for losing their frequencies but actually they are in no better position unless they are happy to be boldly subjective, in which case we won’t have “true odds” either but only many personal ones. Which is what we started with.

    5000-1 is the wrong odds – for whom, on what grounds?

    (Personally I’d have thought 5000-1 is too extreme but 200-1 is not enough by the way… don’t ask why.)

    • Andrew says:

      Christian:

      I think I read somewhere that the bookies did lose a bunch of money on this one, that they didn’t actually get a balanced set of bets, but I’m not sure.

      Regarding your other question: the reasonableness of the 5000-1 odds would be assessed by comparison to other, similar events. As I’ve discussed in chapter 1 of BDA and elsewhere, such a comparison can be thought of as Bayesian (in that there is a model, explicit or implicit, of these probabilities, for example based on the idea that you win the championship by winning a series of games, which requires scoring a series of goals, etc.) or as frequentist, in that the event in question is taken as a sample from a reference set (hence the comparison to previous soccer seasons, Super Bowl odds, and so on).

      • Christian Hennig says:

        Indeed some papers reported that bookies had to pay out record sums but when I read the articles it wasn’t clear whether overall more was paid out than would’ve been for any other champion. Certainly one company had a record pay out but then they had more Leicester fans among their customer base than others. Overall I have read that bookies had to pay out £25m for this, which doesn’t seem a lot to me compared with how much I’d think is bet in the UK on the PL champion. But I may be wrong.

        Regarding the other stuff, sure… but all will depend on how exactly the reference set is defined, which may make a big difference.

        • Andrew says:

          Christian:

          Regarding your second paragraph: Yes, I think we’re in agreement here. It’s hard to imagine an airtight calculation of the prospective probability that a team will win the soccer championship. Any estimate will be conditional on a model. And to do this for an election, that’s even tougher.

    • Laplace says:

      Hennig, you kill me with stuff. If you haven’t figured out the secret by now you never will, but here goes for the younger generation:

      Suppose you have a model of the world. “Model” doesn’t mean “statistical model”, but rather a faithful reflection of that part of the universe relevant to answer the question at hand (who will win in this case). Given a “state” of the model, you can determine who wins for that state.

      Unfortunately, we don’t know the true state, which leaves us in a quandary, but the situation isn’t hopeless because we can at least count. When we do so, it turns out that for every possible state where the team wins, there are 5000 other possible states where it loses.

      Now this doesn’t mean the team will lose. We don’t know that. And it DEFINITELY DOESN’T mean the team will only win 1 time out of 5000 replications (whatever that might mean). We have even less knowledge about the frequency of wins in imaginary repetitions than we do about the one real win we’re trying to predict.

      I imagine that to a Frequentist this count (1 winning state to every 5000 losing states) must thus seem like completely useless information. I can hear Frequentists shrieking now, “what good could it possibly be if it’s not the frequency of something?!?”

      Well, it’s not much admittedly, but it does tell us one thing, namely:

      The prediction of “loss” is the outcome least sensitive to the unknown actual state, and to make a better prediction would require knowing more about the real state than we currently do.

      If the prediction turns out wrong, then either the Model/state setup was factually incorrect to begin with, or it was correct, but we needed to know more about the true state.

      • Christian Hennig says:

        Laplace: I’d guess that despite all your righteous fervour when actually setting up a model in the beginning of the season you’re in no better position than I am (unless of course you know more about the PL than me), regardless of whether I’d use imagining repetitions for this or not.

        • Laplace says:

          Well, I know that the “5000 to 1 odds” is a quantitative measure of the sensitivity I described and not a frequency. Clearly knowing what you’re trying to do is something. So at least I got that going for me:

          https://www.youtube.com/watch?v=X48G7Y0VWW4

          • Christian Hennig says:

            “Unfortunately, we don’t know the true state, which leaves us in a quandary, but the situation isn’t hopeless because we can at least count. When we do so, it turns out that for every possible state where the team wins, there are 5000 other possible states where it loses.” So you’re talking about FREQUENCIES of states. Uh-oh. It doesn’t seem useless to me, rather it seems almost frequentist to me. You’d just need to call your states hypothetical repetitions (which they are, in a sense; you could imagine your model generating possible worlds).

            • Laplace says:

              So to avoid the simple and common sense view that there’s one real state, but we don’t know what it is, you’re prepared to invent multiple universes or speculate on repetitions that don’t/can’t/will-never exist?

              My response to that is twofold. First, please don’t ever speak of Frequentism being about objective reality. It’s as big a fairy tale as Cinderella.

              Second, in simple enough situations this does no harm. In anything slightly more complicated, those unnecessary and irrelevant fantasies bring progress to a screeching halt. At some point no one’s intuition can tell what the correct “frequency” distribution means let alone what it should be, so they stop. If they understood what they were really doing, and what the purpose of it all was, they could continue confidently.

              • Christian Hennig says:

                “First, please don’t ever speak of Frequentism being about objective reality.”
                I don’t, anyway.

              • ojm says:

                “So you’re talking about FREQUENCIES of states”
                Exactly.

                “speculate on repetitions that don’t/can’t/will-never exist?”
                Hasn’t frequentism always been about *hypothetical* repetitions? Whatever the pros/cons of this approach, and whatever the circumstances under which its assumptions are testable and to what degree, this seems to be an important point to understand.

                What is your definition of frequentism? You do realise that frequentists distinguish sample and population, right?

              • Rahul says:

                Laplace, I’m curious: You seem to be a master of the relevant philosophy and very well read about the area and its scientific history.

                But has this led you to fit better models? “Better” in the sense of predictive accuracy perhaps. In fact, do you or have you dirtied your hands with applied modelling or is this merely a philosophical / pedagogic crusade for you.

                Do you have any examples of models you have fit / published to show? I’d love to read. I’m not being sarcastic. Just trying to figure how much of this sort of debate has any pragmatic relevance. e.g. If I let you vs Christian model a practical problem with whatever be your differences would your model-predictions differ much?

              • Laplace says:

                No, fractions are not frequencies usually. If your accountant says “10% of your wealth is in cash”, you could try impressing them by saying,

                “So if I took my wealth and magically divided it up into dollar units and magically put it in a bowl and magically picked a series of things from the bowl, then approximately 10% of the time it would be cash.”

                While Frequentists might find this profound and wise, your accountant will think you’re an idiot if not insane.

                That “5000 to 1” fraction has both meaning and purpose. It has this even if no replication is meaningful. It has this even if replications yielded a very different frequency number. Its meaning and purpose have nothing to do with frequencies.

              • “so you’re talking about FREQUENCIES of states”

                Actually, no, it’s about plausibilities of states; a state is more plausible to the extent that it predicts the data. The fact that many unobserved detailed-states lead to a single macro-state is one reason why we might assign high probability to a state; another is that it’s the only state that does predict the data and all the others predict something else… Those two ideas sort of map to the prior and the likelihood. There are many reasons why a given state might wind up with high probability, and generally it’s that the product of prior and likelihood is large.

                ojm: “hasn’t frequentism always been about *hypothetical* repetitions?”

                Hypothetical repetitions of actions in the real world, instead of hypothetical repetitions of predictions from models.

                Hypothetical repetitions of actions in the real world fundamentally place a hypothetical structure on the way the universe works. That hypothetical structure is objectively false right from the start in VAST SWATHS of applied statistics.

                Hypothetical repetitions of predictions from models only require that the structure of the mathematical model be well defined, so that a computer can run it over and over with different possible values (i.e., Stan or BUGS or JAGS etc.); they don’t require anything from the universe.
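A minimal sketch of what "hypothetical repetitions of predictions from a model" can look like in code. Everything in it is an illustrative assumption rather than anyone's actual model: the toy 38-match season, the Beta prior on a team's per-match win probability, and the 27-win title threshold are all made up. The point is only that the repetitions happen inside the model, by redrawing the unknown values, with nothing demanded of the real world.

```python
import random


def simulate_title_probability(n_draws=20000, n_matches=38, wins_needed=27,
                               prior_a=2.0, prior_b=4.0, seed=1):
    """Monte Carlo over model states (all settings are hypothetical).

    Each draw picks an unknown per-match win probability from a Beta prior,
    simulates one season of independent matches, and records whether the
    team reaches the assumed title threshold.
    """
    rng = random.Random(seed)
    titles = 0
    for _ in range(n_draws):
        p_win = rng.betavariate(prior_a, prior_b)  # one plausible "state"
        wins = sum(rng.random() < p_win for _ in range(n_matches))
        if wins >= wins_needed:
            titles += 1
    return titles / n_draws


prob = simulate_title_probability()
print(f"Fraction of simulated model runs ending in a title: {prob:.4f}")
```

The returned fraction is a property of the model and its prior. Changing the assumed prior or threshold changes the number, which is one way to see that it is conditional on the model rather than a measured frequency of real seasons.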

              • Laplace says:

                Rahul, if statistics was forever doomed to the shitty performance and lame capabilities it currently has (or marginal improvements thereon) then no one would care about any of this. Least of all me.

              • Rahul.

                I can’t say much about Laplace’s models because I’ve never seen anything he’s modeled. But you can look at my dissertation here:

                http://models.street-artists.org/wp-content/uploads/2014/03/dissertation.pdf

                In it, I have a model for how waves travel along a “1D” bar of molecules. The computation is done using molecular dynamics. Using some modeling techniques based on ideas from Nonstandard Analysis, I derive an ODE for the wave propagation, that is, the propagation of statistical averages over the molecules. It looks like the wave equation, but with an added “momentum diffusion” term whose intensity scales like a certain function of the length of the bar as a fraction of the molecular length scale, the temperature, and so forth.

                If given a coefficient which is specific to each bar and related to the effective momentum diffusivity you can run the ODE and then get predictions for the measurements that are output from the molecular dynamics.

                To find the coefficients for each bar, I run a Bayesian calculation in which I write a likelihood for the timeseries of the logarithm of total kinetic wave energy. This likelihood is a transformed non-stationary Gaussian process. It views the whole timeseries as a point in a high dimensional space. I had to do the sampling in the “mcmc” package in R, because Stan didn’t have an ODE solver at that point.

                Writing down that likelihood was directly a result of changing how I viewed statistics. Instead of thinking in terms of repeated sampling of errors through time, I thought in terms of plausibilities of different observed functions based on what errors I knew would be reasonable to expect, and what errors were unreasonable. The choice of covariance function was directly based on my knowledge of this physical process, not of any concept of repeated sampling.

                So, yes I think it has practical significance, it opens up modeling vastly.

              • Rahul says:

                But is there evidence that with the right *philosophical* indoctrination we can improve the “shitty performance”?

                To fit me a useful model I’d rather choose a wise Bayesian over a stupid frequentist & a wise frequentist over a stupid Bayesian. Because it seems to me that, so long as they are both smart, they end up producing almost equally good models. Though they will call things by different names and passionately hate the other side’s model’s philosophical groundings.

                On the other hand, if you get Cuddy or Fiske to write a model for you, it’s probably crap anyway, but that has little to do with Bayesian or Frequentist foundations.

              • ojm says:

                So ‘the plausibility of my wealth being cash is 0.1’ is better?

                A given frequentist probability is a single number which may be considered (at least as a first cut) as a fraction representing a property of an ideal (hypothetical) population.

                A given fraction may be interpreted as a frequentist probability using a well defined algorithm/model for generating samples given this probability. This may or may not be relevant to the world, just like assigning a ‘plausibility’ to something may or may not be relevant to the world.

                What’s the plausibility God exists, Leicester City wins, etc.? Andrew seems to return to the question of defining a relevant reference population to answer this, which seems reasonable to me.

                Both are probably best viewed as some summary of some property of (i.e., within) a model, e.g., how many states of type 1 there are vs. type 2 or, if I simulate the model ‘forever’ (i.e., a large number of times), how often I would expect states of type 1 or 2 to occur.
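The two summaries described here, counting states inside a model versus simulating the model many times, can be compared directly in a toy setting. The sketch below assumes a deliberately artificial model with 5001 equally weighted states, exactly one of which is a win; the names `N_STATES` and `WIN_STATES` are hypothetical and chosen only to echo the 5000-to-1 figure.

```python
import random
from fractions import Fraction

# Hypothetical toy model: 5001 equally weighted states, one of them a "win".
N_STATES = 5001
WIN_STATES = {0}


def counted_fraction():
    """Exact ratio of winning states to all states, obtained by counting."""
    return Fraction(len(WIN_STATES), N_STATES)


def simulated_frequency(n_runs, seed=0):
    """Frequency of winning states when the model is sampled n_runs times."""
    rng = random.Random(seed)
    hits = sum(rng.randrange(N_STATES) in WIN_STATES for _ in range(n_runs))
    return hits / n_runs


exact = counted_fraction()
approx = simulated_frequency(1_000_000)
print(f"counted: {float(exact):.6f}   simulated: {approx:.6f}")
```

In this equal-weight case the simulated frequency approaches the counted fraction as the number of runs grows. Both are statements about the model; neither says anything by itself about how often a real team would win in real replications.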

              • Christian Hennig says:

                I come back to the connection between Laplace’s “counting of states” and frequentism. Later Laplace himself and Daniel write that it’s not all counting, one could have plausibility weights and then it’s more general ratios, not necessarily counts. Fair enough.

                However, I think that the fact that Laplace initially wrote about counting states (i.e., frequencies) illustrates very nicely that there is some “gravity” to frequencies as a standard conceptualisation of probabilities. If probabilities are just abstract measurements of plausibility that cannot be given a “closer to life” interpretation than that they fulfill Cox’s axioms (a major building block of Jaynes-style Bayesianism), it is hard to connect them to what we are interested in in life.

                Frequencies that stem from hypothetical repetitions of the world are very abstract and far from life, too, and to that extent they’re just a crutch for thinking, and a very weak one at that; except that any crutch is probably helpful (even if only a little bit) to make sense of Cox’s axioms. Which, I’d assume, made Laplace write about counting states.

                The bookies will actually be interested in frequencies (although these are weighted, too, by the money that is bet); they ultimately need to predict distributions of bets (they may have a stab at predicting distributions of match results, too, although this is of secondary importance to their jobs), and any probability computation helpful to them should have an implication regarding such to-be-observed distributions. An abstract plausibility measurement won’t serve them.

              • Corey says:

                the fact that Laplace initially wrote about counting states (i.e., frequencies)

                Christian, in your opinion, were Venn’s and Boole’s frequentist criticisms of Bernoulli’s and Laplace’s classical definition of probability actually just a case of distinction without a difference?

              • Christian Hennig says:

                Corey: “Christian, in your opinion, were Venn’s and Boole’s frequentist criticisms of Bernoulli’s and Laplace’s classical definition of probability actually just a case of distinction without a difference?”

                Absolutely not, no. I’m not saying that it’s all the same really. It does make a difference indeed. What I think is that the concepts of probability that we’re discussing now still have traces of older ideas and intuitions in them, and that understanding this contributes to the understanding of the concepts. I have this view that may seem paradoxical at first sight, namely that a) it is important and helpful to be clear about the differences between the different concepts of probability (on which basis it can be seen that different concepts can be appropriate for different applications) and b) that it is also important to understand to what extent all these different concepts trace back to certain common intuitions and original ideas – although this has to be qualified, because things were not monolithic in the past either and different older strands of thought went into different later developments, not all with the same weight.

                So for example I’d suspect, making reference to an old discussion we had on your blog, that Cox was quite keen on coming up with technical conditions that enabled him to show that the kind of measurement of evidence he was interested in is equivalent to already existing probabilities. This connected him to a huge existing culture and made him relevant. At the same time it meant that what he came up with cannot be completely and cleanly separated from what the rest of that culture did. To me it seems therefore funny when “Laplace” emphasizes (a little bit here and much more elsewhere) that the frequentists need to understand that whenever what they do works, it is “actually really” (Jaynesian/Coxian) Bayes. Historically the frequentists were not first but they came before Cox and Jaynes, and Jaynes and Cox did what they did partly in order to accommodate what they thought worked well before they came. I think it’s an illusion to think that at the same time they can be totally free of what caused issues with the not so well working stuff before.

            • ojm: The cash example is neither a frequency nor a probability. It’s just the relative size of one kind of asset to the size of all assets put together. A thing doesn’t magically transform into a plausibility simply because it’s a number between 0 and 1. That’s what Laplace was saying.

              • ojm says:

                If that’s the point then great, I agree. But it doesn’t demonstrate that, when the number is a probability, it can’t have both a reasonable frequentist and a reasonable plausibility interpretation.

                I.e., just as you can derive the same ODE models with standard analysis as you can with nonstandard analysis, you can usually give a probability model both a frequentist and a plausibility interpretation. You may prefer one to the other, fine, but some people are equally ok with epsilons and deltas.

              • ojm says:

                Speaking of probability interpretations, here’s a nice post Terence Tao just made:

                https://terrytao.wordpress.com/2016/05/13/visualising-random-variables/

              • Andrew says:

                Ojm:

                Yes, what Tao writes is related to probabilistic programming, which is what’s done in Stan, in particular in the generated quantities block. But I disagree with Tao when he writes, “When teaching mathematics, the traditional method of lecturing in front of a blackboard is still hard to improve upon.” All the evidence I’ve seen, both formal and anecdotal, suggests that lecturing in front of a blackboard may be a great way for the instructor to learn the material, but it’s not such a good way to teach. Don’t get me wrong—it’s great to have a blackboard—but it’s my impression that blackboard derivations go in one ear and out the other.

              • ojm says:

                Hi Andrew,

                yup I actually agree. Personally I’ve been using both a pen-enabled tablet for derivations and a laptop with code/animations in lectures. Incidentally my Dad has finally decided to do his PhD and the topic is the use of pen-enabled tablets in mathematics education.

              • The thing I keep saying that I don’t think has gotten through is that a Frequentist model assumes restrictions on *what the universe can do* that are just false. People are then shocked, shocked I tell you when the hypothetical stuff predicted from hypothetical re-samplings fails to hold in the real world.

                What do you mean power poses don’t mean anything? We have frequentist guarantees on the size of the effects!

              • Andrew says:

                Daniel:

                To channel Mayo, it’s incorrect frequentist calculations that lead to these mistakes. Power-pose researchers not realizing that the p-value depends on what you would’ve done had the data been different, Monty-Hall-style. Had they done freq methods correctly (via preregistration, as in the 50 Shades of Gray paper), they most likely wouldn’t have made those mistakes.

                To channel me, it’s a problem with statistics in general (including Bayesian statistics) that it is sold as a way to get effective certainty from noisy data. Everybody’s textbooks are all too full of triumphant examples with uncertainty estimates that happily exclude zero. Students learn the lesson all too well.

              • ojm says:

                Daniel – I’m just not convinced that your or Laplace’s interpretations of frequentist methods are correct. If you’ve found other methods you like, cool, I’m not gonna stop you using them. Like I said the other day, I like hierarchical Bayes a lot myself.

                Like Andrew says, the biggest issue is using methods/models incorrectly, usually in the misguided pursuit of certainty or definitive answers from noisy situations.

              • Keith O'Rourke says:

                Andrew: “sold as a way to get effective certainty from noisy data. Everybody’s textbooks are all too full of triumphant examples”

                +10

              • Martha (Smith) says:

                I gotta agree with Keith. Here’s my own polemic from a few years ago on “expecting too much certainty” as a common mistake in using statistics: http://www.ma.utexas.edu/users/mks/statmistakes/uncertainty.html

  13. Paul Alper says:

    A little history and the money involved, courtesy of Wikipedia:

    The competition formed as the FA Premier League on 20 February 1992 following the decision of clubs in the Football League First Division to break away from the Football League, which was originally founded in 1888, and take advantage of a lucrative television rights deal.[2] The deal was worth £1 billion a year domestically as of 2013–14, with BSkyB and BT Group securing the domestic rights to broadcast 116 and 38 games respectively.[3] The league generates €2.2 billion per year in domestic and international television rights.[4] In 2014/15, teams were apportioned revenues of £1.6 billion.

  14. Olly Johnson says:

    One more complication; I’m not sure you can directly compare the competitiveness of US and European sports (“14th best team in the EPL is roughly equivalent to the 20th best team in MLB or the NBA or the NFL”). My impression is that US sports have mechanisms (drafts where weak teams pick first, salary caps) to keep competitions more random, whereas the EPL is the opposite (the rich tend to get richer, unless serious outside money – e.g. Chelsea, Man City – intervenes).

    So, for example, Leicester were only the 6th team to win the Premier League in the last 20+ years (with other European leagues similar), whereas it looks to me like there have been 6 different World Series champions and 8 different Super Bowl winners in 8 years.

  15. Jonathan says:

    Two reactions:

    1. As noted, the odds were the amount necessary to attract money to bet on Leicester, and that was before the season. This suggests the crowd under-estimated Leicester’s chances at the start and that the stories miss the point that of course the odds changed with each game.

    2. I think it’s absolutely correct to say the pre-season odds were out of whack and specifically for 1 or 2 reasons, both referring to the model being used. That is, assuming the odds are set by the need to attract cash to a team, the crowd seems to have used a traditional big money Premiership model in which the only contenders for the actual title come from a short list. The second reason is that same model may have been used by the experts, but I don’t know if that’s true or to what extent. The model for the Premiership has changed as money has flowed to each team. For example, it was traditional that lower teams would sell their best players to the big boys as though that were a law. One of the contenders this year, Tottenham, has refused to sell Harry Kane and now teams are all rich enough that they don’t need to sell and certainly not within their own league. The change in transfers has been noted over the past few years. Another effect of more money is that even the worst team has solid players from all over the world; the average salary is now almost $3M, much more than any other league. The model used to be that ordinary teams hoped to generate talent they could sell and now the model is that each team has enough money to put some of the world’s best players on the pitch. The Premier League differs in this way from other top leagues. Consider Spain: a small number of top teams, meaning Barca, Real and Atletico, are essentially all-star teams that routinely dominate the rest. I’d say the changes in the Premier League were evident in prior seasons and that this year merely brought them out. Note that while we talk about Leicester, Tottenham was next and they also have never won. Under the prior model, ManU, Chelsea, Liverpool, etc. would have continued at the top because they would have scooped up the top players from lower teams and the other teams would not have the resources. The new model is closer to the NFL model of general parity.

  16. Shravan says:

    Andrew, can you elaborate a bit on why rare events should be modeled? In psycholinguistics, data representing rare events are simply deleted as irrelevant. I don’t know anything about the need to model rare events (esp. given the sparsity of data, how do we evaluate a model of a rare event?).

    • Andrew says:

      Shravan:

      You don’t have to model the probability that Leicester is going to win the football championship, but modeling it seems to be a good idea if you want to place bets on the event!

      • hgfalling says:

        If only there were some mechanism whereby retrospective modelers could place bets on their newly updated priors AFTER the event has taken place, so they could demonstrate how wrong the oddsmakers were at the time.

        • Andrew says:

          Hgfalling:

          See the P.S. and P.P.S. above. The short answer is that, sure, you can label these as one-of-a-kind events and say that it is impossible to evaluate the probabilities retrospectively. But to me that’s just giving up too early. These are not one-of-a-kind events; there are other soccer games and other elections. Oddsmakers can make errors, and I find Campos’s argument above to be persuasive. Maybe better odds would’ve been 200-1 or whatever, but 5000-1 does seem extreme. Sure, it took the unexpected event to make us realize this, but that doesn’t mean that there weren’t problems there all along.

          Consider this analogy: Hurricanes Katrina and Sandy made us realize that, in retrospect, New Orleans and New York had problems in their disaster preparedness. Yes, it took these hurricanes to make policymakers aware of the problem, but the problem was there, nonetheless.

          • hgfalling says:

            I’m not really convinced by Campos’ argument, because it basically seems to be an analogy to a different American sport and handwaving and calling prices “absurd.” Now this doesn’t mean that 5k-1 was or wasn’t a reasonable price. But prices like these are common enough across sports that if what he is saying is right, you could make a lot of money by just betting longshots on these futures bets. Even now after the Leicester disaster, you can bet on eight teams at 500/1 or better to win the EPL next year. And you could probably find scores of such opportunities across all sports, where models like “I don’t know anything about this sport, but that’s absurd!” would suggest that you would be a big winner with manageable variance.

            Good luck with that.

            • Andrew says:

              Hgfalling:

              It’s hard to beat the vig, so the fact that it’s not easy to go out and “be a big winner” does not mean the odds are alright. And you say, “you can bet on eight teams at 500/1 or better.” But there’s a big difference between 500-1 and 5000-1!

              Regarding your specific idea: In horseracing they talk about longshot bias which means that it’s the long odds which tend to be the bets to avoid. So it will depend on context. So I don’t recommend betting longshots as a general strategy.

              Regarding Campos: He wasn’t arguing that all longshot bets are good, he was talking about this particular case. And, the fact that Leicester won is indeed some evidence that these 5000-1 odds were off. Yes his argument was “handwaving”—that is, not quantitative—but I think a quantitative argument (what above I called a model for rare events using precursors, an idea that you appeared to be mocking in your comment above) could make this argument more precise.
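One way to put a number on "some evidence" is a simple two-hypothesis Bayes update. The sketch below is an illustration under strong assumptions, not a model from the post: it pretends there are only two candidate fair prices (5000-1 and 200-1, the figures used in this thread), treats N-to-1 against as a win probability of 1/(N+1), and asks how a single observed win shifts belief between them. The function names and the prior values are hypothetical.

```python
def win_probability(odds_against):
    """Win probability implied by fair odds of N-to-1 against: 1 / (N + 1)."""
    return 1.0 / (odds_against + 1.0)


def posterior_long_odds_fair(prior_long, long_odds=5000, short_odds=200):
    """Posterior probability that the long odds were the fair ones,
    given that the team won (two-hypothesis Bayes update)."""
    like_long = win_probability(long_odds)    # P(win | long odds fair)
    like_short = win_probability(short_odds)  # P(win | short odds fair)
    numerator = prior_long * like_long
    return numerator / (numerator + (1.0 - prior_long) * like_short)


# Likelihood ratio in favour of the shorter odds after one win: 5001 / 201.
bayes_factor = win_probability(200) / win_probability(5000)
print(f"Bayes factor for 200-1 over 5000-1: {bayes_factor:.1f}")
print(f"Posterior for 5000-1, even prior:   {posterior_long_odds_fair(0.5):.3f}")
print(f"Posterior for 5000-1, 99% prior:    {posterior_long_odds_fair(0.99):.3f}")
```

A single win favours the shorter odds by a factor of about 25, but how far that moves the conclusion depends on the prior: someone who was 99% sure the bookmakers had it right would still lean towards 5000-1 afterwards, which is roughly where the two sides of this exchange part ways.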

              • hgfalling says:

                “He wasn’t arguing that all longshot bets are good, he was talking about this particular case.”

                The main idea is that the exact same argument can be made retrospectively about any actual longshot winner; that Leicester City is not a particularly “special” extreme longshot in this regard (in a way that could be determined before the season).

                So if it’s true that the fair price before the season was like 200-1 for them, I think the same argument, if you accept it, establishes that the fair price for all the other longshot teams was similar (or better, since Leicester was generally judged to be one of the worst teams before the season). See how accepting this kind of retrospective argument leads to no team having really long odds and therefore longshots greater than 200-1 being good bets in general?

                As an aside, there isn’t any vig in cases like these: if they offer you a better price than fair, you have positive expectancy from the bet; they don’t charge you something else if you win or something. The vig in markets like these comes from the fact that the entire market adds up to >100% probability. So if you think all these teams are 200/1 instead of 1000/1, no problem, get the money.
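The two claims in this aside (the margin lives in the market as a whole, and any single price longer than fair has positive expectation) can be checked with a few lines of arithmetic. This is a sketch with made-up numbers: the four-outcome `book` is hypothetical and not real EPL pricing, and N-to-1 against is treated as an implied probability of 1/(N+1).

```python
def implied_probability(odds_against):
    """Fractional odds of N-to-1 against imply probability 1 / (N + 1)."""
    return 1.0 / (odds_against + 1.0)


def overround(book):
    """Sum of implied probabilities minus 1; positive means a built-in margin."""
    return sum(implied_probability(odds) for odds in book.values()) - 1.0


def expected_profit(true_prob, odds_against, stake=1.0):
    """Expected profit of one bet: win stake * odds, or lose the stake."""
    return true_prob * odds_against * stake - (1.0 - true_prob) * stake


# Hypothetical four-outcome market (made-up prices).
book = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 9.0}
margin = overround(book)  # 1/2 + 1/3 + 1/5 + 1/10 - 1

print(f"Overround of the toy book: {margin:.3f}")
print(f"Bet at 1000-1 if fair is 200-1:  {expected_profit(1 / 201, 1000):+.3f}")
print(f"Bet at 1000-1 if fair is 5000-1: {expected_profit(1 / 5001, 1000):+.3f}")
```

In the toy book the implied probabilities sum to about 1.13, so the margin is roughly 13% spread across all outcomes. The same 1000-1 ticket is a good bet if the fair price is 200-1 and a bad one if the fair price is 5000-1, so the disagreement is about the fair price, not about the mechanics.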

              • Andrew says:

                Hgfalling:

                Indeed, you can make an argument that 5000-1 is ridiculous odds in a 20-team league; see the article referred to in Ivan’s comment here in which everyone is in agreement that those 5000-1 odds were nuts.

                Here you’re using 200-1 as a baseline. Again, there’s a big big difference between 200-1 and 5000-1: it’s a factor of 25! If the odds for Leicester had been 200-1 or even 500-1, I doubt Campos would’ve written that post.

                You do raise a good question, though, which was why there were no savvy bettors scooping up this juicy lottery ticket ahead of time. That I don’t know. I wouldn’t’ve even thought of trawling through obscure sports betting opportunities to find this one.

                My guess is that savvy bettors know about the longshot bias, and so the sorts of bettors who were looking for good-value, positive-expectation bets didn’t even think of looking for good-value longshots. The Leicester odds were hiding in plain sight. And of course if word had gotten out about these juicy odds and some people had started betting real money on it, the bookies would’ve adjusted accordingly, hence limiting their losses.

                In future, bettors will be aware of the possibility of good-value longshots, but of course the bookies will be aware also, so now it’s probably too late to follow your “no problem, get the money” strategy.

  17. Shravan says:

    I wanted to add to Rahul’s discussion with Laplace and Lakeland, but we are out of response depth in the comments. I don’t have Laplace and Lakeland’s encyclopedic (heck I don’t even know how to spell it, according to Chrome) knowledge of philosophy. I’m just using statistics to answer questions that I have in my area of research.

    I moved away from frequentist to Bayesian modeling in psycholinguistics some years ago (2012?), and my students and I have published maybe a dozen or more papers since then using Bayesian methods (Stan or JAGS). We still have the problem that we cannot replicate most of our and most of other people’s results. A lot of the stuff my field produces is just one-time hole-in-one lucky shots, never to be repeated, and it has nothing to do with the sophistication or philosophy one brings to the problem. Psychology is probably the same. (If you submit a paper with replications, editors and reviewers ask you: what’s the point of wasting journal space on replications?)

    The problem lies with the kinds of questions we ask and the methods we use to answer them; we also don’t learn from experience. Even the assumption that mu has some fixed prior distribution is pure fiction (as Andrew has noted, I think). Right now I see a tendency among people to blithely ask even more subtle questions about language than we used to; we have learnt nothing from our embarrassing failure to even understand the drosophila of psycholinguistics, relative clauses.

    • Rahul says:

      Most of the crappy stats papers that Andrew links to exploit statistics as a tool for “putting lipstick on a pig”.

      Getting fancier, expensive lipstick doesn’t really help much.

      PS. We need “Lipstick on a Pig” in the Lexicon. :)

    • Shravan. I certainly don’t want to give an impression of encyclopedic knowledge. I have a lot of opinions, and a desire to understand what I’m doing, and I’ve spent time trying to write about my journey towards understanding what I’m doing.

      Planning to put up a blog post directly responding to your points, ie. the points that your Bayesian models are not reproducible either. I consider this a feature! (specifically, it’s a feature that you are aware that they don’t reproduce).

      • Rahul says:

        I look forward to that post.

        To view “non-reproducibility” as a feature is sort of upsetting my whole mental model of doing science. :)

      • Corey says:

        I think a distinction needs to be made here between “being able to model non-reproducible scenarios” on the one hand and “being unable to reliably cause a postulated effect” on the other.

      • Ok, I put up something this morning regarding Bayesian non-reproducibility and some of the ways I believe that Bayesian models both help us detect nonreproducibility, and also help us avoid overly precise inference that would lead to non-reproducibility (of course, not automatically, but at least provides a way to impose some knowledge that we can’t impose under IID sampling assumptions).

        http://models.street-artists.org/2016/05/17/bayesian-models-are-also-non-reproducible/

        • Keith O'Rourke says:

          Daniel:

          > When you collect more data and it fails to concentrate your parameter estimates, it indicates that there is no one parameter estimate that can explain the data sufficiently.

          This requires the likelihood to have specified a random (common in distribution) parameter rather than a common parameter – with a common parameter, likelihood always concentrates with more independent data – see Review of likelihood and some of its properties in my old post here http://andrewgelman.com/wp-content/uploads/2011/05/plot13.pdf
          (If I am wrong I would really like to know!)

          So it’s primarily getting the likelihood less wrong rather than any underlying Bayesian non-reproducibility.

          • Well, I guess it will concentrate, but the posterior of N+M observations will concentrate on a different place than the posterior of the first N observations, if there’s a systematic change in the parameters.

            For a simple example, suppose that y = normal(t,1) and you model it as normal(mu,sigma) and take 10 observations at time t=0, and then 10 observations at time t=10… After putting the first 10 observations into the Bayesian machinery your posterior concentrates around mu = 0, sigma = 1. Very nice, the world is a consistent place that we all love, think of how wonderful “random” sampling is!

            Now you add the next ten observations and it concentrates around mu = 5, sigma = 5 or so !!

            It always concentrates, but it need not always concentrate on the same place after each round of updating, and this inconsistency is helpful in detecting problems with the likelihood. The Bayesian machinery lets you detect this by giving you a clear way to update information from one dataset to the next, taking into account an assumption of a consistent model (which you can then detect is an incorrect assumption).

            Of course we can check things in a frequentist analysis as well and see that our model has inconsistency, but having a consistent logical framework that does this automatically is certainly helpful.
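The two-batch example above can be sketched numerically. This is only a rough illustration, not the Bayesian fit itself: it assumes weak priors, so that the sample mean and sample standard deviation stand in for where the posterior of mu and sigma would concentrate. The function names, the seed, and the batch size are all made up for the sketch.

```python
import random
import statistics


def simulate(seed=0, n=10):
    """Draw n observations of y ~ normal(t, 1) at t = 0 and n more at t = 10."""
    rng = random.Random(seed)
    first = [rng.gauss(0.0, 1.0) for _ in range(n)]
    second = [rng.gauss(10.0, 1.0) for _ in range(n)]
    return first, second


def summarize(ys):
    """Sample mean and standard deviation, used here as a stand-in for the
    values a normal(mu, sigma) posterior concentrates on under weak priors."""
    return statistics.mean(ys), statistics.stdev(ys)


first, second = simulate()
mu_first, sigma_first = summarize(first)
mu_all, sigma_all = summarize(first + second)

print(f"first 10 observations: mu ~ {mu_first:.2f}, sigma ~ {sigma_first:.2f}")
print(f"all 20 observations:   mu ~ {mu_all:.2f}, sigma ~ {sigma_all:.2f}")
```

The first batch looks like a tidy normal(0, 1) world; once the second batch is pooled in, the estimate of mu jumps to the midpoint and sigma inflates to absorb the shift, which is the kind of inconsistency between rounds of updating that the comment describes.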

            • Keith O'Rourke says:

              Daniel:

              Sorry but rather than “helpful in detecting problems with the likelihood” it is a failing of Normality assumptions when it is not obvious that something is wrong with the model as the variance expands to hide the problem.

              Work to make Bayesian approaches robust to data contamination and errors in model misspecification is not finished but rather just starting.

              If you have not yet read this one effort by David Dunson http://arxiv.org/pdf/1506.06101.pdf I think you would really enjoy it.

              • There’s certainly no way to detect a problem with your model unless you look for the problem, and there are no doubt lots of ways to look for problems that haven’t been discovered yet. I’ll take a look at Dunson’s paper, I think I’ve seen it before but not read it in depth.

                The thing about Bayesian methods that you have to remember as a person using them is that they all ask the question “if my model is correct, what do I know about the parameters?”

                So, keeping in mind the dependence on the model being correct, and understanding what it means to be “correct” (ie. what does the Stan statement my_data ~ my_distrib(my_params) mean?) those are key things, and the meanings are different in a Frequentist and a Bayesian analysis and that’s also key for people moving to Bayesian analysis to understand.

    • Andrew says:

      Ivan:

      Thanks for the link. Here’s the key bit:

      William Hill said 25 customers took the 5,000-1 odds with the largest stake £20 from a customer in Manchester and the smallest 5p from a woman in Edinburgh.

      So they didn’t have much of a motivation to get the odds right on this one.

  18. Adrian says:

    The comparison between EPL and MLB isn’t great because MLB has mechanisms designed to create long-term parity. EPL has no such mechanisms. You get the players you pay for, there is no system to reward failure like a player draft. Thus the winner of the EPL was the winner of the prior season seven out of 24 seasons. Man United has won over half (13) of EPL titles; Chelsea 4, Arsenal 3, Man City 2, Blackburn 1 (after a huge cash infusion), and now Leicester 1. So looking before the season started, it was 13, 4, 3, 2, 1. This implies that the underlying odds decrease exponentially as teams move down the standings. And Leicester didn’t get any sort of giant cash infusion like Blackburn had with a new owner a couple years before they won their title, plus Blackburn had finished 2nd the prior season. There has never been one of these “out of nowhere” EPL winning campaigns ever in 23 prior seasons. What’s more, nobody has ever come close (runners up). Also of note, Leicester won the league with a comparatively low point total (would have been 4th in ’14, ’09, ’08, ’06).

    If you look at Leicester’s prior season – they were last in the league at Christmas and went on a good run to avoid relegation, then fired their coach after his son was in a sex tape using racist slurs, and hired a manager who had been out of work since being fired 8 months prior. Their main striker had scored all of 5 EPL goals, had better odds of being suspended for assault than of scoring 20, had just seen his own video using racist slurs surface, and had a rep for showing up to training drunk.

    The 5000-1 odds were there for good reasons. Maybe 1000-1 would have been better odds (how can you even calculate the difference?), but it was still absolutely incredible that Leicester did what they did. 50-1 odds for the 14th-ranked team (West Bromwich Albion) to win next year are absolutely awful. I wouldn’t be surprised if a team that has finished in the bottom half of the league the prior season never wins the EPL ever again (barring a giant cash infusion).

    • Andrew says:

      Adrian:

      If reasonable odds were 500-1 or even 200-1, it was still very impressive that Leicester won. Buster Douglas’s odds were only 42-1 and people are still talking about that one! I think we’re all in agreement that, prospectively, Leicester deserved long odds of winning the championship. Just not 5000-1.

      You offer data from 23 prior seasons, ok, that’s fine, that gives us a factor of 1/23 right there. Still a long way from 1/23 to 1/5000. For example, if you want to call it 1/50 that any team in the bottom half will win the championship in any given year, and then say that Leicester was one of these 10 teams, that gives you 1/500. Obviously there are lots of other ways of doing these calculations (as I’ve said before, I think it makes sense to model based on precursor events), but, again, it takes a lot to get to 1/5000.
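The back-of-the-envelope calculation above can be written out as follows. It is a sketch of that one illustrative route only: the 1/50 and the "one of 10 equally likely teams" figures are the comment's own rough assumptions, the variable names are invented here, and the bookmaker's 5000-1 is treated as a probability of 1/5000, as in the comment.

```python
from fractions import Fraction

# Assumed chance that some bottom-half team wins the title in a given year.
p_bottom_half_wins = Fraction(1, 50)

# Assumed chance that the winner is Leicester, given a bottom-half winner:
# one of 10 teams, treated as equally likely.
p_leicester_given_bottom_half = Fraction(1, 10)

p_leicester = p_bottom_half_wins * p_leicester_given_bottom_half

# Odds against an event with probability p are (1 - p) / p to 1.
odds_against = (1 - p_leicester) / p_leicester

# The 5000-1 price, read as a probability of 1/5000 as in the comment.
p_bookmaker = Fraction(1, 5000)
ratio = p_leicester / p_bookmaker

print(f"P(Leicester wins) = {p_leicester}")
print(f"odds against      = {odds_against}-1")
print(f"that is {ratio} times the probability implied by 5000-1")
```

Under these assumptions the rough estimate is ten times the probability implied by the bookmakers' price, which is the gap the comment is pointing at.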
