Comments on: Leicester City and Donald Trump: How to think about predictions and longshot victories?

By: Daniel Lakeland

Daniel Lakeland — Wed, 18 May 2016 15:36:53 +0000

There’s certainly no way to detect a problem with your model unless you look for the problem, and there are no doubt lots of ways to look for problems that haven’t been discovered yet. I’ll take a look at Dunson’s paper, I think I’ve seen it before but not read it in depth.

The thing about Bayesian methods that you have to remember as a person using them is that they all ask the question “if my model is correct, what do I know about the parameters?”

So, keeping in mind the dependence on the model being correct, and understanding what it means to be “correct” (ie. what does the Stan statement my_data ~ my_distrib(my_params) mean?) those are key things, and the meanings are different in a Frequentist and a Bayesian analysis and that’s also key for people moving to Bayesian analysis to understand.

By: Keith O'Rourke

Keith O'Rourke — Wed, 18 May 2016 12:42:20 +0000

In reply to Daniel Lakeland.

Daniel:

Sorry but rather than “helpful in detecting problems with the likelihood” it is a failing of Normality assumptions when it is not obvious that something is wrong with the model as the variance expands to hide the problem.

Work to make Bayesian approaches robust to data contamination and errors in model misspecification is not finished but rather just starting.

If you have not yet read this one effort by David Dunson http://arxiv.org/pdf/1506.06101.pdf I think you would really enjoy it.

By: Victor Ordu

Victor Ordu — Wed, 18 May 2016 08:52:16 +0000

In reply to Andrew.

This is an ‘aside’ comment: I’ve not studied this quantitatively, but among all Republican candidates it’s only Trump that I recall seeing people urging him to run for President as far back as 2008. I kept seeing this on Twitter and elsewhere on social media. I’m not an American, but when he declared to run, I told my friends he would likely win the nomination. I wasn’t betting – the facts speak for themselves. A comparative sentiment analysis on social media over the past 10 years could yield some interesting results. Unbeknownst to many, the it’s likely the man started working on this way before the others and came better prepared.

Ditto Leicester.

By: Daniel Lakeland

Daniel Lakeland — Tue, 17 May 2016 21:58:02 +0000

In reply to Keith O’Rourke.

Well, I guess it will concentrate, but the posterior of N+M observations will concentrate on a different place than the posterior of the first N observations, if there’s a systematic change in the parameters.

For a simple example, suppose that y = normal(t,1) and you model it as normal(mu,sigma) and take 10 observations at time t=0, and then 10 observations at time t=10… After putting the first 10 observations into the Bayesian machinery your posterior concentrates around mu = 0, sigma = 1. Very nice, the world is a consistent place that we all love, think of how wonderful “random” sampling is!

Now you add the next ten observations and it concentrates around mu = 5, sigma = 10 or so !!

It always concentrates, but it need not always concentrate on the same place after each round of updating, and this inconsistency is helpful in detecting problems with the likelihood. The Bayesian machinery lets you detect this by giving you a clear way to update information from one dataset to the next, taking into account an assumption of a consistent model (which you can then detect is an incorrect assumption).

Of course we can check things in a frequentist analysis as well and see that our model has inconsistency, but having a consistent logical framework that does this automatically is certainly helpful.

By: Keith O'Rourke

Keith O'Rourke — Tue, 17 May 2016 20:12:58 +0000

In reply to Daniel Lakeland.

Daniel:

> When you collect more data and it fails to concentrate your parameter estimates, it indicates that there is no one parameter estimate that can explain the data sufficiently.

This requires the likelihood to have specified a random (common in distribution) parameter rather than a common parameter – with a common parameter, likelihood always concentrates with more independent data – see Review of likelihood and some of its properties in my old post here http://statmodeling.stat.columbia.edu/wp-content/uploads/2011/05/plot13.pdf
(If I am wrong I would really like to know!)

So its primarily getting the likelihood less wrong rather than any underlying Bayesian non-reproducibility.

By: Daniel Lakeland

Daniel Lakeland — Tue, 17 May 2016 17:16:43 +0000

In reply to Daniel Lakeland.

Ok, I put up something this morning regarding Bayesian non-reproducibility and some of the ways I believe that Bayesian models both help us detect nonreproducibility, and also help us avoid overly precise inference that would lead to non-reproducibility (of course, not automatically, but at least provides a way to impose some knowledge that we can’t impose under IID sampling assumptions).

http://models.street-artists.org/2016/05/17/bayesian-models-are-also-non-reproducible/

By: Iggy

Iggy — Tue, 17 May 2016 14:54:05 +0000

In reply to Andrew.

The following two articles written in regards to this event by a London bookmaker, may be of interest:

https://gemsandrhinestones.com/2016/05/04/why-did-bookmakers-lose-on-leicester/

https://gemsandrhinestones.com/2016/05/05/what-price-should-leicester-have-been/

By: Adrian

Adrian — Tue, 17 May 2016 12:39:15 +0000

In reply to Andrew.

Although that’d be the Nassim Taleb approach to sports betting, wouldn’t it? 80% of your money on the favorites and 20% on ‘underpriced’ longshots…

By: Adrian

Adrian — Tue, 17 May 2016 12:35:00 +0000

In reply to Andrew. Nice. I know very little about betting (but quite a lot about football!).

By: Andrew

Andrew — Mon, 16 May 2016 20:59:52 +0000

In reply to Adrian. Adrian: See this comment. The evidence from horse racing, at least, is that longshots are already overbet, so in general I don't recommend longshot betting as a way of winning money. Especially since, if you want to make money reliably using this strategy, you'll have to find a lot of these undervalued longshot bets. And I don't think anyone suggests that there are many of these possibilities.

By: Adrian

Adrian — Mon, 16 May 2016 20:47:00 +0000

In reply to Andrew.

23 prior seasons, and I think nobody outside the top 7 of the prior season had ever won. Assuming the newly promoted teams are interchangeable with the relegated teams, that’s 13 * 23, and we’re 0 for 299. And I think a reasonable observer would have given the ’14-’15 Leicester City a lower-than-average chance of improving their position.
It’s going to be incredibly difficult to distinguish between 1:100, 1:500, 1:1000, and 1:5000. But of course that’s where some real money can be made. Find a bunch of longshots that are priced at 1:5000 but you think are actually 1:100, and you could make some real money. I bet (ha) that lots of people will be trying that this coming August.

By: Andrew

Andrew — Mon, 16 May 2016 20:25:42 +0000

In reply to Adrian.

Adrian:

If reasonable odds were 500-1 or even 200-1, it was still very impressive that Leicester won. Buster Douglas’s odds were only 42-1 and people are still talking about that one! I think we’re all in agreement that, prospectively, Leicester deserved long odds of winning the championship. Just not 5000-1.

You offer data from 23 prior seasons, ok, that’s fine, that gives us a factor of 1/23 right there. Still a long way from 1/23 to 1/5000. For example, if you want to call it 1/50 that any team in the bottom half will win the championship in any given years, and then say that Leicester was one of these 10 teams, that gives you 1/500. Obviously there are lots of other ways of doing these calculations (as I’ve said before, I think it makes sense to model based on precursor events), but, again, it takes a lot to get to 1/5000.

By: Adrian

Adrian — Mon, 16 May 2016 20:09:09 +0000

The comparison between EPL and MLB isn’t great because MLB has mechanisms designed to create long-term parity. EPL has no such mechanisms. You get the players you pay for, there is no system to reward failure like a player draft. Thus the winner of the EPL was the winner of the prior season seven out of 24 seasons. Man United has won over half (13) of EPL titles; Chelsea 4, Man Arsenal 3, City 2, Blackburn 1(after a huge cash infusion), and now Leicester 1. So looking before the season started, it was 13, 4, 3, 2, 1. Implies that the underlying odds decrease exponentially as teams move down the standings. And Leicester didn’t get any sort of giant cash infusion like Blackburn had with a new owner a couple years before they won their title, plus Blackburn had finished 2nd the prior season. There has never been one of these “out of nowhere” EPL winning campaigns ever in 23 prior seasons. What’s more, nobody has ever come close (runners up). Also of note, Leicester won the league with a comparatively low point total (would have been 4th in ’14, ’09, ’08, ’06).

If you look at Leicester’s prior season – they were last in the league at Christmas and went on a good run to avoid relegation, then fired their coach after his son was in a sex tape using racist slurs, and hired a manager who had been out of work since being fired 8 months prior. Their main striker had scored all of 5 EPL goals, better odds of being suspended for assault rather than scoring 20, his own video using racist slurs had just surfaced, and had a rep for showing up to training drunk.

The 5000-1 odds were there for good reasons. Maybe 1000-1 would have been better odds (how can you even calculate the difference?), but it was still absolutely incredible that Leicester did what they did. 50-1 odds for the 14th-ranked team (West Bromwich Albion) to win next year are absolutely awful. I wouldn’t be surprised if a team that has finished in the bottom half of the league the prior season never wins the EPL ever again (barring a giant cash infusion).

By: Andrew

Andrew — Mon, 16 May 2016 19:54:19 +0000

In reply to Andrew.

Hgfalling:

Indeed, you can make an argument that 5000-1 is ridiculous odds in a 20-team league; see the article referred to in Ivan’s comment here in which everyone is in agreement that those 5000-1 odds were nuts.

Here you’re using 200-1 as a baseline. Again, there’s a big big difference between 200-1 and 5000-1: it’s a factor of 25! If the odds for Leicester had been 200-1 or even 500-1, I doubt Campos would’ve written that post.

You do raise a good question, though, which was why there were no savvy betters scooping up this juicy lottery ticket ahead of time. That I don’t know. I wouldn’t’ve even thought of trawling through obscure sports betting opportunities to find this one.

My guess is that savvy bettors know about the longshot bias, and so the sorts of bettors who were looking for good-value, positive-expectation bets, didn’t even think of looking for good-value longshots. The Leicester odds were hiding in plain sight. And of course if word had gotten out about these juicy odds and some people had started betting real money on it, the bookies would’ve adjusted occasionally, hence limiting their losses.

In future, bettors will be aware of the possibility of good-value longshots, but of course the bookies will be aware also, so now it’s probably too late to follow your “no problem, get the money” strategy.

By: hgfalling

hgfalling — Mon, 16 May 2016 19:40:05 +0000

In reply to Andrew.

“He wasn’t arguing that all longshot bets are good, he was talking about this particular case.”

The main idea is that the exact same argument can be made retrospectively about any actual longshot winner; that Leicester City is not a particularly “special” extreme longshot in this regard (in a way that could be determined before the season).

So if it’s true that the fair price before the season was like 200-1 for them, I think the same argument, if you accept it, establishes that the fair price for all the other longshot teams were similar (or better, since Leicester was generally judged to be one of the worst teams before the season). See how accepting this kind of retrospective argument leads to no team having really long odds and therefore longshots greater than 200-1 being good bets in general?

As an aside, there isn’t any vig in cases like these: if they offer you a better price than fair, you have positive expectancy from the bet; they don’t charge you something else if you win or something. The vig in markets like these comes from the fact that the entire market adds up to >100% probability. So if you think all these teams are 200/1 instead of 1000/1, no problem, get the money.

By: Andrew

Andrew — Mon, 16 May 2016 19:39:13 +0000

In reply to Ivan. Ivan: Thanks for the link. Here's the key bit:

William Hill said 25 customers took the 5,000-1 odds with the largest stake £20 from a customer in Manchester and the smallest 5p from a woman in Edinburgh.

So they didn't have much of a motivation to get the odds right on this one.

By: Ivan

Ivan — Mon, 16 May 2016 19:24:25 +0000

Here’s a good article on how the bookies chose 5000-1 for Leicester title win.

http://www.thisismoney.co.uk/money/news/article-3571428/Why-did-bookies-offer-crazily-high-5-000-1-odds-Leicester-City.html

By: Andrew

Andrew — Mon, 16 May 2016 17:38:15 +0000

In reply to hgfalling.

Hgfalling:

It’s hard to beat the vig, so that fact that it’s not easy to go out and “be a big winner” does not mean the odds are alright. And you say, “you can bet on eight teams at 500/1 or better.” But there’s a big difference between 500-1 and 5000-1!

Regarding your specific idea: In horseracing they talk about longshot bias which means that it’s the long odds which tend to be the bets to avoid. So it will depend on context. So I don’t recommend betting longshots as a general strategy.

Regarding Campos: He wasn’t arguing that all longshot bets are good, he was talking about this particular case. And, the fact that Leicester won is indeed some evidence that these 5000-1 odds were off. Yes his argument was “handwaving”—that is, not quantitiative—but I think a quantitative argument (what above I called a model for rare events using precursors, an idea that you appeared to be mocking in your comment above) could make this argument more precise.

By: hgfalling

hgfalling — Mon, 16 May 2016 17:19:39 +0000

In reply to Andrew.

I’m not really convinced by Campos’ argument, because it basically seems to be an analogy to a different American sport and handwaving and calling prices “absurd.” Now this doesn’t mean that 5k-1 was or wasn’t a reasonable price. But prices like these are common enough across sports that if what he is saying is right, you could make a lot of money by just betting longshots on these futures bets. Even now after the Leicester disaster, you can bet on eight teams at 500/1 or better to win the EPL next year. And you could probably find scores of such opportunities across all sports, where models like “i don’t know anything about this sport, but that’s absurd!” would suggest that you would be a big winner with manageable variance.

Good luck with that.

By: Andrew

Andrew — Mon, 16 May 2016 16:12:32 +0000

In reply to hgfalling. Hgfalling: See the P.S. and P.P.S. above. The short answer is that, sure, you can label these as one-of-a-kind events and say that it is impossible to evaluate the probabilities retrospectively. But to me that's just giving up too early. These are not one-of-a-kind events; there are other soccer games and other elections. Oddsmakers can make errors, and I find Campos's argument above to be persuasive. Maybe better odds would've been 200-1 or whatever, but 5000-1 does seem extreme. Sure, it took the unexpected event to make us realize this, but that doesn't mean that there weren't problems there all along. Consider this analogy: Hurricanes Katrina and Sandy made us realize that, in retrospect, New Orleans and New York had problems in their disaster preparedness. Yes, it took these hurricanes to make policymakers aware of the problem, but the problem was there, nonetheless.

By: hgfalling

hgfalling — Mon, 16 May 2016 15:02:06 +0000

In reply to Andrew. If only there were some mechanism whereby retrospective modelers could place bets on their newly updated priors AFTER the event has taken place, so they could demonstrate how wrong the oddsmakers were at the time.

By: Martha (Smith)

Martha (Smith) — Sun, 15 May 2016 20:27:31 +0000

In reply to Daniel Lakeland.

I gotta agree with Keith. Here’s my own polemic from a few years ago on “expecting too much certainty” as a common mistake in using statistics: http://www.ma.utexas.edu/users/mks/statmistakes/uncertainty.html

By: Daniel Lakeland

Daniel Lakeland — Sun, 15 May 2016 15:21:20 +0000

In reply to Rahul. to be fair, its the fact that Shravan realizes that things are nonreproducible that's the feature.

By: Christian Hennig

Christian Hennig — Sun, 15 May 2016 12:23:17 +0000

In reply to Laplace.

Corey: “Christian, in your opinion, were Venn’s and Boole’s frequentist criticisms of Bernoulli’s and Laplace’s classical definition of probability actually just a case of distinction without a difference?”

Absolutely not, no. I’m not saying that it’s all the same really. I does make a difference indeed. What I think is that the concepts of probability that we’re discussing now still have traces of older ideas and intuitions in them, and that understanding this contributes to the understanding of the concepts. I have this view that may seem paradox at first sight, namely that a) it is important and helpful to be clear about the differences between the different concepts of probability (on which basis it can be seen that different concepts can be appropriate for different applications) and b) that it is also important to understand to what extent all these different concepts trace back to certain common intuitions and original ideas – although this has to be qualified, because things were not monolithic in the past either and different older strands of though went into different later developments not with all the same weight.

So for example I’d suspect, making reference to an old discussion we had on your blog, that Cox was quite keen on coming up with technical conditions that enabled him to show that the kind of measurement of evidence he was interested in is equivalent to already existing probabilities. This connected him to a huge existing culture and made him relevant. At the same time it meant that what he came up with cannot be completely and cleanly separated from what the rest of that culture did. To me it seems therefore funny when “Laplace” emphasizes (a little bit here and much more elsewhere) that the frequentists need to understand that whenever it works what they do it is “actually really” (Jaynesian/Coxian) Bayes. Historically the frequentists were not first but they came before Cox and Jaynes, and Jaynes and Cox did what they do partly in order to accommodate what they thought worked well before they came. I think it’s an illusion to think that at the same time they can be totally free of what caused issues with the not so well working stuff before.

By: Marco

Marco — Sun, 15 May 2016 08:51:45 +0000

In reply to Rahul.

Yes, the 5000:1 odds are fixed-odds before the season starts. I don’t know about pari mutuel betting on the Premiere League, but for instance betfair (https://www.betfair.com/sport/football) lets you bet on either of win, draw or lose. I think the odds are being adjusted when new (big) bets are added.

My main point was that those bookmakers odds do not necessarily represent a good or accurate estimate of the real probability of the event in question occurring! And they should not be interpreted like this.

By: Corey

Corey — Sun, 15 May 2016 01:29:21 +0000

In reply to Daniel Lakeland. I think a distinction needs to be made here between "being able to model non-reproducible scenarios" on the one hand and "being unable to reliably cause a postulated effect" on the other.

By: Corey

Corey — Sun, 15 May 2016 01:20:49 +0000

In reply to Laplace.

the fact that Laplace initially wrote about counting states (i.e., frequencies)

Christian, in your opinion, were Venn's and Boole's frequentist criticisms of Bernoulli's and Laplace's classical definition of probability actually just a case of distinction without a difference?

By: Keith O'Rourke

Keith O'Rourke — Sat, 14 May 2016 21:44:07 +0000

In reply to Daniel Lakeland. Andrew: "sold as a way to get effective certainty from noisy data. Everybody’s textbooks are all too full of triumphant examples" +10

By: Christian Hennig

Christian Hennig — Sat, 14 May 2016 18:02:05 +0000

In reply to Laplace.

I come back to the connection between Laplace’s “counting of states” and frequentism. Later Laplace himself and Daniel write that it’s not all counting, one could have plausibility weights and then it’s more general ratios, not necessarily counts. Fair enough.

However, I think that the fact that Laplace initially wrote about counting states (i.e., frequencies) illustrates very nicely that there is some “gravity” to frequencies as a standard conceptualisation of probabilities. If probabilities are just abstract measurements of plausibility that cannot be given a “closer to life” interpretation than that they fulfill Cox’s axioms (a major building block of Jaynes-style Bayesianism), it is hard to connect them to what we are interested in in life.

Frequencies that stem from hypothetical repetitions of the world are very abstract and far from life, too, and insofar they’re just a crutch for thinking, and a very weak one at that; except that any crutch is probably helpful (even if only a little bit) to make sense of Cox’s axioms. Which, I’d assume, made Laplace write about counting states.

The bookies will actually be interested in frequencies (although these are weighted, too, by the money that is bet); they ultimately need to predict distributions of bets (they may have a stab at predicting distributions of match results, too, although this is of secondary importance to their jobs), and any probability computation helpful to them should have an implication regarding such to be observed distributions. An abstract plausibility measurement won’t serve them.

By: Rahul

Rahul — Sat, 14 May 2016 17:59:48 +0000

In reply to Daniel Lakeland. I look forward to that post. To view "non reproduciblity" as a feature is sort of upsetting my whole mental model of doing science. :)

By: Daniel Lakeland

Daniel Lakeland — Sat, 14 May 2016 16:40:44 +0000

In reply to Shravan.

Shravan. I certainly don’t want to give an impression of encyclopedic knowledge. I have a lot of opinons, and a desire to understand what I’m doing, and I’ve spent time trying to write about my journey towards understanding what I’m doing.

Planning to put up a blog post directly responding to your points, ie. the points that your Bayesian models are not reproducible either. I consider this a feature! (specifically, it’s a feature that you are aware that they don’t reproduce).

By: Martha (Smith)

Martha (Smith) — Sat, 14 May 2016 16:37:14 +0000

In reply to Rahul. +1

By: Rahul

Rahul — Sat, 14 May 2016 15:45:15 +0000

In reply to Shravan.

Most of the crappy stats papers that Andrew links to exploit statistics as a tool to “putting lipstick on a pig”.

Getting fancier, expensive lipstick doesn’t really help much.

PS. We need “Lipstick on a Pig” in the Lexicon. :)

By: Rahul

Rahul — Sat, 14 May 2016 15:15:35 +0000

In reply to Marco.

Here’s a naive question: The 5000-to-1 odds are fixed-odds bets offered by the bookmakers, right?

Was there a parallel pari mutuel betting on the event too? It’d be interesting to see what the odds people placed on this outcome there.

Or is there no difference, in practice? i.e. Do the fixed-odds offered by bookies match the autogenously arising pari mutuel betting odds approximately?

By: Andrew

Andrew — Sat, 14 May 2016 14:14:35 +0000

In reply to Shravan. Shravan: You don't have to model the probability that Leicester is going to win the football championship, but modeling it seems to be a good idea if you want to place bets on the event!

By: Shravan

Shravan — Sat, 14 May 2016 05:38:45 +0000

I wanted to add to Rahul’s discussion with Laplace and Lakeland, but we are out of response depth in the comments. I don’t have Laplace and Lakeland’s encyclopedic (heck I don’t even know how to spell it, according to Chrome) knowledge of philosophy. I’m just using statistics to answer questions that I have in my area of research.

I moved away from frequentist to Bayesian modeling in psycholinguistics some years ago (2012?), and my students and I have published maybe a dozen or more papers since then using Bayesian methods (Stan or JAGS). We still have the problem that we cannot replicate most of our and most of other people’s results. A lot of the stuff my field produces is just one-time hole-in-one lucky shots, never to be repeated, and it nothing to do with the sophistication or philosophy one brings to the problem. Psychology is probably the same. (If you submit a paper with replications, editors and reviewers ask you: what’s the point of wasting journal space on replications?)

The problem lies with the kinds of questions we ask and the methods we use to answer them; we also don’t learn from experience. Even the assumption that mu has some fixed prior distribution, pure fiction (as Andrew has noted, I think). Right now I see a tendency among people to blithely ask even more subtle questions about language than we used to; we have learnt nothing from our embarrassing failure to even understand the drosophila of psycholinguistics, relative clauses.

By: ojm

ojm — Sat, 14 May 2016 05:30:47 +0000

In reply to Daniel Lakeland.

Daniel – I’m just not convinced that your or Laplace’s interpretations of frequentist methods are correct. If you’ve found other methods you like, cool, I’m not gonna stop you using them. Like I said the other day, I like hierarchical Bayes a lot myself.

Like Andrew says, the biggest issue is using methods/models incorrectly, usually in the misguided pursuit of certainty or definitive answers from noisy situations.

By: Shravan

Shravan — Sat, 14 May 2016 05:25:25 +0000

Andrew, can you elaborate a bit on why rare events should be modeled? In psycholinguistics, data representing rare events are simply deleted as irrelevant. I don’t know anything about the need to model rare events (esp. given the sparsity of data, how do we evaluate a model of a rare event?).

By: Andrew

Andrew — Sat, 14 May 2016 04:29:29 +0000

In reply to Daniel Lakeland.

Daniel:

To channel Mayo, it’s incorrect frequentist calculations that lead to these mistakes. Power-pose researchers not realizing that the p-value depends on what you would’ve done had the data been different, Monty-Hall-style. Had they done freq methods correctly (via preregistration, as in the 50 Shades of Gray paper), they most likely wouldn’t have made those mistakes.

To channel me, it’s a problem with statistics in general (including Bayesian statistics) that it is sold as a way to get effective certainty from noisy data. Everybody’s textbooks are all too full of triumphant examples with uncertainty estimates that happily exclude zero. Students learn the lesson all too well.

By: Daniel Lakeland

Daniel Lakeland — Sat, 14 May 2016 03:45:28 +0000

In reply to Daniel Lakeland. The thing I keep saying that I don't think has gotten through is that a Frequentist model assumes restrictions on *what the universe can do* that are just false. People are then shocked, shocked I tell you when the hypothetical stuff predicted from hypothetical re-samplings fails to hold in the real world. What do you mean power poses don't mean anything? we have frequentist guarantees on the size of the effects!

By: ojm

ojm — Sat, 14 May 2016 03:28:50 +0000

In reply to Daniel Lakeland. Hi Andrew, yup I actually agree. Personally I've been using both a pen-enabled tablet for derivations and a laptop with code/animations in lectures. Incidentally my Dad has finally decided to do his PhD and the topic is the use of pen-enabled tablets in mathematics education.

By: Andrew

Andrew — Sat, 14 May 2016 03:22:27 +0000

In reply to Daniel Lakeland.

Ojm:

Yes, what Tao writes is related to probabilistic programming, which is what’s done in Stan, in particular in the generated quantities block. But I disagree with Tao when he writes, “When teaching mathematics, the traditional method of lecturing in front of a blackboard is still hard to improve upon.” All the evidence I’ve seen, both formal and anecdotal, suggests that lecturing in front of a blackboard may be a great way for the instructor to learn the material, but it’s not such a good way to teach. Don’t get me wrong—it’s great to have a blackboard—but it’s my impression that blackboard derivations go in one ear and out the other.

By: ojm

ojm — Sat, 14 May 2016 03:00:52 +0000

In reply to Daniel Lakeland.

Speaking of probability interpretations, here’s a nice post Terrence Tao just made:

https://terrytao.wordpress.com/2016/05/13/visualising-random-variables/

By: ojm

ojm — Sat, 14 May 2016 02:38:20 +0000

In reply to Daniel Lakeland.

If that’s the point then great, I agree. It doesn’t demonstrate that when it is a probability it doesn’t have both a reasonable frequentist and plausibility interpretation.

Ie – just as you can derive the same ODE models with standard analysis as you can with nonstandard analysis, you can usually give a probability model both a frequentist and plausibility interpretation. You may prefer one to the other, fine, but some people are equally ok with epsilons and deltas.

By: Daniel Lakeland

Daniel Lakeland — Sat, 14 May 2016 02:25:17 +0000

In reply to Christian Hennig. ojm: the cash example is neither a frequency, nor a probability. it's just the relative size of one kind of asset to the size of all assets put together. A thing doesn't magically transform into a plausibility simply because it's a number between 0 and 1. That's what Laplace was saying.

By: Jonathan

Jonathan — Sat, 14 May 2016 02:19:26 +0000

Two reactions:

1. As noted, the odds were the amount necessary to attract money to bet on Leicester and that before the season. This suggests the crowd under-estimated Leicester’s chances at the start and that the stories miss the point that of course the odds changed with each game.

2. I think it’s absolutely correct to say the pre-season odds were out of whack and specifically for 1or 2 reasons, both referring to the model being used. That is, assuming the odds are set by the need to attract cash to a team, the crowd seems to have used a traditional big money Premiership model in which the only contenders for the actual title come from a short list. The second reason is that same model may have been used by the experts, but I don’t know if that’s true or to what extent. The model for the Premiership has changed as money has flowed to each team. For example, it was traditional that lower teams would sell their best players to the big boys as though that were a law. One of the contenders this year, Tottenham, has refused to sell Harry Kane and now teams are all rich enough that they don’t need to sell and certainly not within their own league. The change in transfers has been noted over the past few years. Another effect of more money is that even the worst team has solid players from all over the world; the average salary is now almost $3M, much more than any other league. The model used to be that ordinary teams hoped to generate talent they could sell and now the model is that each team has enough money to put some of the world’s best players on the pitch. The Premier League differs in this way from other top leagues. Consider Spain: a small number of top teams, meaning Barca, Real and Atletico, are essentially all-star teams that routinely dominate the rest. I’d say the changes in the Premier League were evident in prior seasons and that this year merely brought them out. Note that while we talk about Leicester, Tottenham was next and they also have never won. Under the prior model, ManU, Chelsea, Liverpool, etc. would have continued at the top because they would have scooped up the top players from lower teams and the other teams would not have the resources. The new model is closer to the NFL model of general parity.

By: ojm

ojm — Sat, 14 May 2016 02:13:28 +0000

In reply to Laplace.

So ‘the plausibility of my wealth being cash is 0.1’ is better?

A given frequentist probability is a single number which may be considered (at least as a first cut) as a fraction representing a property of an ideal (hypothetical population).

A given fraction may be interpreted as a frequentist probability using a well defined algorithm/model for generating samples given this probability. This may or may not be relevant to the world, just like assigning a ‘plausibility’ to something may or may not be relevant to the world.

What’s the plausibility God exists, Leicester city wins etc? Andrew seems to return to the question of defining a relevant reference population to answer this, which seems reasonable to me.

Both are probably best viewed as some summary of some property of (ie within) a model eg how many states of type 1 are there vs type 2 or, if I simulate the model ‘forever’ (ie a large number of times) how often would I expect states of type 1 or 2 to occur.

By: Martha (Smith)

Martha (Smith) — Sat, 14 May 2016 02:08:01 +0000

In reply to Jack. "the sport of meticulously plotted domination" is depressing.

By: Rahul

Rahul — Sat, 14 May 2016 01:41:43 +0000

In reply to Laplace.

But is there evidence that with the right *philosophical* indoctrination we can improve the “shitty performance”?

To fit me a useful model I’d rather choose a wise Bayesian over a stupid frequentist & a wise frequentist over a stupid Bayesian. Because it seems to me that, so long as they are both smart, they end up producing almost equally good models. Though they will call things by different names and passionately hate the other side’s model’s philosophical groundings.

On the other hand, you get Cuddy or Fiske write a model for you its probably crap anyways but that has little to do with Bayesian or Frequentist foundations.

By: Daniel Lakeland

Daniel Lakeland — Sat, 14 May 2016 01:37:02 +0000

In reply to Laplace.

Rahul.

I can’t say much about Laplace’s models because I’ve never seen anything he’s modeled. But you can look at my dissertation here:

http://models.street-artists.org/wp-content/uploads/2014/03/dissertation.pdf

In it, I have a model for how waves travel along a “1D” bar of molecules. The computation is done using molecular dynamics. Using some modeling techniques based on ideas from Nonstandard Analysis, I derive an ODE for the wave propagation, that is, the propagation of statistical averages over the molecules. It looks like the wave equation, but with an added “momentum diffusion” term whose intensity scales like a certain function of the length of the bar as a fraction of the molecular length scale, the temperature, and soforth.

If given a coefficient which is specific to each bar and related to the effective momentum diffusivity you can run the ODE and then get predictions for the measurements that are output from the molecular dynamics.

To find the coefficients for each bar, I run a Bayesian calculation in which I write a likelihood for the timeseries of the logarithm of total kinetic wave energy. This likelihood is a transformed non-stationary gaussian process. it views the whole timeseries as a point in a high dimensional space. I had to do the sampling in the “mcmc” package in R, because Stan didn’t have an ODE solver at that point.

Writing down that likelihood was directly a result of changing how I viewed statistics. Instead of thinking in terms of repeated sampling of errors through time, I thought in terms of plausibilities of different observed functions based on what errors I knew would be reasonable to expect, and what errors were unreasonable. The choice of covariance function was directly based on my knowledge of this physical process, not of any concept of repeated sampling.

So, yes I think it has practical significance, it opens up modeling vastly.