Minor-league Stats Predict Major-league Performance, Sarah Palin, and Some Differences Between Baseball and Politics

In politics, as in baseball, hot prospects from the minors can have trouble handling big-league pitching.

Right after Sarah Palin was chosen as the Republican nominee for vice president in 2008, my friend Ubs, who grew up in Alaska and follows politics closely, wrote the following:

Palin would probably be a pretty good president. . . . She is fantastically popular. Her percentage approval ratings have reached the 90s. Even now, with a minor nepotism scandal going on, she’s still about 80%. . . . How does one do that? You might get 60% or 70% who are rabidly enthusiastic in their love and support, but you’re also going to get a solid core of opposition who hate you with nearly as much passion. The way you get to 90% is by being boringly competent while remaining inoffensive to people all across the political spectrum.

Ubs gives a long discussion of Alaska’s unique politics and then writes:

Palin’s magic formula for success has been simply to ignore partisan crap and get down to the boring business of fixing up a broken government. . . . It’s not a very exciting answer, but it is, I think, why she gets high approval ratings — because all the Democrats, Libertarians, and centrists appreciate that she’s doing a good job on the boring non-partisan stuff that everyone agrees on and she isn’t pissing them off by doing anything on the partisan stuff where they disagree.

Hey–I bet you never thought you’d see the words “boringly competent,” “inoffensive,” and “Sarah Palin” in the same sentence!

Prediction and extrapolation

OK, so what’s the big deal? Palin got a reputation as a competent nonpartisan governor but when she hit the big stage she shifted to hyper-partisanship. The contrast is interesting to me because it suggests a failure of extrapolation.

Now let’s move to baseball. One of the big findings of baseball statistics guru Bill James is that minor-league statistics, when correctly adjusted, predict major-league performance. James is working through a three-step process: (1) naive trust in minor league stats, (2) a recognition that raw minor league stats are misleading, (3) a statistical adjustment process, by which you realize that there really is a lot of information there, if you know how to use it.

For a political analogy, consider Scott Brown. When he was running for the Senate last year, political scientist Boris Shor analyzed his political ideology. The question was, how would he vote in the Senate if he were elected? Boris wrote:

We have evidence from multiple sources. The Boston Globe, in its editorial endorsing Coakley, called Brown “in the mode of the national GOP.” Liberal bloggers have tried to tie him to the Tea Party movement, making him out to be very conservative. Chuck Schumer called him “far-right.”

In 2002, he filled out a Votesmart survey on his policy positions in the context of running for the State Senate. Looking through the answers doesn’t reveal too much beyond that he is a pro-choice, anti-tax, pro-gun Republican. His interest group ratings are all over the map. . . .

All in all, a very confusing assessment, and quite imprecise. So how do we compare Brown to other state legislators, or more generally to other politicians across the country?

My [Boris’s] research, along with Princeton’s Nolan McCarty, allows us to make precisely these comparisons. Essentially, I use the entirety of state legislative voting records across the country, and I make them comparable by calibrating them through Project Votesmart’s candidate surveys.

By doing so, I can estimate Brown’s ideological score very precisely. It turns out that his score is -0.17, compared with Coakley’s score of 0.02. Liberals have lower scores; conservatives higher ones. Brown’s score puts him at the 34th percentile of his party in Massachusetts over the 1995-2006 time period. In other words, two thirds of other Massachusetts Republican state legislators were more conservative than he was. This is evidence for my [Boris’s] claim that he’s a liberal even in his own party. What’s remarkable about this is the fact that Massachusetts Republicans are the most, or nearly the most, liberal Republicans in the entire country!

Very Jamesian, wouldn’t you say? And Boris’s analysis was borne out by Scott Brown’s voting record: he was indeed the most liberal of the Senate’s Republicans.

Political extrapolation

OK, now back to Sarah Palin. First, her popularity. Yes, Gov. Palin was popular, but Alaska is a small (in population) state, and surveys find that most of the popular governors in the U.S. are in small states. Here are data from 2006 and 2008:

[Figure: governors' approval ratings plotted against state population, 2006 and 2008]

There are a number of theories about this pattern; what’s relevant here is that a Bill James-style statistical adjustment might be necessary before taking state-level stats to the national level.

The difference between baseball and politics

There’s something else going on, though. It’s not just that Palin isn’t quite so popular as she appeared at first. There’s also a qualitative shift. From “boringly competent nonpartisan” to . . . well, leaving aside any questions of competence, she’s certainly no longer boring or nonpartisan! In baseball terms, this is like Ozzie Smith coming up from the minors and becoming a Dave Kingman-style slugger. (Please excuse my examples which reveal how long it’s been since I’ve followed baseball!)

So how does baseball differ from politics, in ways that are relevant to statistical forecasting?

1. In baseball there is only one goal: winning. Scoring more runs than the other team. Yes, individual players have other goals: staying healthy, getting paid, not getting traded to Montreal, etc., but overall the different goals are aligned, and playing well will get you all of these to some extent.

But there are two central goals in politics: winning and policy. You want to win elections, but the point of winning is to enact policies that you like. (Sure, there are political hacks who will sell out to the highest bidder, but even these political figures represent some interest groups with goals beyond simply being in office.)

Thus, in baseball we want to predict how a player can help his team win, but in politics we want to predict two things: electoral success and also policy positions.

2. Baseball is all about ability–natural athletic ability, intelligence (as Bill James said, that and speed are the only skills that are used in both offense and defense), and plain old hard work, focus, and concentration. The role of ability in politics is not so clear. In his remarks that started this discussion, Ubs suggested that Palin had the ability and inclination to solve real problems. But it’s not clear how to measure such abilities in a way that would allow any generalization to other political settings.

3. Baseball is the same environment at all levels. The base paths are the same length in the major leagues as in AA ball (at least, I assume that’s true!), the only difference is that in the majors they throw harder. OK, maybe the strike zone and the field dimensions vary, but pretty much it’s the same game.

In politics, though, I dunno. Some aspects of politics really do generalize. The Massachusetts Senate has got to be a lot different from the U.S. Senate, but, in their research, Boris Shor and Nolan McCarty have shown that there’s a lot of consistency in how people vote in these different settings. But I suspect things are a lot different for the executive, where your main task is not just to register positions on issues but to negotiate.

4. In baseball, you’re in or out. If you’re not playing (or coaching), you’re not really part of the story. Sportswriters can yell all they want but who cares. In contrast, politics is full of activists, candidates, and potential candidates. In this sense, the appropriate analogy is not that Sarah Palin started as Ozzie Smith and then became Dave Kingman, but rather a move from being Ozzie Smith to being a radio call-in host, in a world in which media personalities can be as powerful, and as well-paid, as players on the field. Perhaps this could’ve been a good move for, say, Bill Lee, in this alternative universe? A player who can’t quite keep the ball over the plate but is a good talker with a knack for controversy?

Commenter Paul made a good point here:

How many at-bats long is a governorship? The most granular I could imagine possibly talking is a quarter. At the term level we’d be doing better making each “at-bat” independent of the previous. 20 or so at-bats don’t have much predictive value either. Even over a full 500 at-bat season, fans try to figure out whether a big jump in BABIP is a sign of better bat control or luck.

The same issues arise at very low at-bat counts too. If you bat in front of a slugger, you can sit on pitches in the zone. If you’ve got a weakness against a certain pitching style, you might not happen to see it. And once the ball is in the air, luck is a huge factor in if it travels to a fielder or between them.

I suspect if we could somehow get a political candidate to hold 300-400 different political jobs in different states, with different party goals and support, we’d be able to do a good job predicting future job performance, even jumping from state to national levels. But the day to day successes of a governor are highly correlative.

Indeed, when it comes to policy positions, a politician has lots of “plate appearances,” that is, opportunities to vote in the legislature. But when it comes to elections, a politician will only have at most a couple dozen in his or her entire career.

All the above is from a post from 2011. I thought about it after this recent exchange with Mark Palko regarding the political candidacy of Ron DeSantis.

In addition to everything above, let me add one more difference between baseball and politics. In baseball, the situation is essentially fixed, and pretty much all that matters is player ability. In contrast, in politics, the most important factor is the situation. In general elections in the U.S., the candidate doesn’t matter that much. (Primaries are a different story.) In summary, to distinguish baseball players in ability we have lots of data to estimate a big signal; to distinguish politicians in vote-getting ability we have very little data to estimate a small signal.

Lefty Driesell and Bobby Knight

This obit of the legendary Maryland basketball coach reminded me of a discussion we had a few years ago. It started with a remark in a published article by political scientist Diana Mutz identifying herself as “a Hoosier by birth and upbringing, the daughter of a former Republican officeholder, and someone who still owns a home in Mike Pence’s hometown.”

That’s interesting: I don’t know so many children of political officeholders! Actually, I can’t think of anyone I know, other than Mutz, who is a child of a political officeholder, but perhaps there are some such people in my social network. I don’t know the occupations of most of my friends’ parents.

Anyway, following up on that bit from Mutz, sociologist Steve Morgan added some background of his own:

I was also born in Indiana, and in fact my best friend in the 1st grade, before I left the state, was Pat Knight. To me, his father, Bobby Knight was a pleasant and generally kind man (who used to give us candy bars, etc.). He turned out to be a Trump supporter, and probably his son too. So, in addition to not appreciating his full basketball personality when I was 6 years old, I also did not see his potential to find a demagogue inspiring. We moved to Ohio, where I received a lot of education in swing-state politics and Midwestern resentment of coastal elites.

And then I threw in my two cents:

I was not born in Indiana, but I grew up in suburban Maryland (about 10 miles from Brett Kavanaugh, but I went to a public school in a different part of the county and so had zero social overlap with his group). One of the kids in my school was Chuck Driesell, son of Lefty Driesell, former basketball coach at the University of Maryland. Lefty is unfortunately now most famous for his association with Len Bias, but Chuck and I were in high school before that all happened, when Lefty was famous for being a good coach who couldn’t ever quite beat North Carolina. Once I remember the Terps decided to beat Dean Smith at his own game by doing the four corners offense themselves. But it didn’t work; I think Maryland ended up losing 21-18 or some other ping-pong-like score. Chuck was in my economics class. I have no idea if he’s now a Trump supporter. I guess it’s possible. One of the other kids in that econ class was an outspoken conservative, one of the few Reagan supporters of our group of friends back in 1980. Chuck grew up and became a basketball coach; the other kid grew up and became an economist.

I never went to a Maryland basketball game all the time I lived there, even when I was a student at the university. I wish I’d gone; I bet it would’ve been a lot of fun. My friends and I played some pickup soccer and basketball, and I watched lots of sports on TV, but for whatever reason we never even considered the idea of going to a game. We didn’t attend any of our high school’s football games either, even though our school’s team won the state championship. This was not out of any matter of principle; we just never thought of going. Our loss.

The Lakatos soccer training

Alex Lax writes:

While searching the Internet for references to Lakatos, I noticed your comment about Lakatos being a Stalinist. I met Imre Lakatos shortly after his arrival in the UK. My parents spoke Hungarian and helped to settle the refugees of 1956. Imre Lakatos was one of those refugees. I remember him playing football with me at a time when Hungarian football was seen as far superior to English football, and I also remember once when we met him at Cambridge railway station with his latest girlfriend, who was very tall. She had managed to lose some contact lenses and I was grovelling around on the road trying to find them. During his visits he would often complain about his treatment in prison, which destroyed his stomach, and he would rant against the Communists. However, after his death, I was told that a book by a well known French Communist was dedicated to Imre. I have not found this dedication, but if true it would suggest that he was a Communist of some flavour while pretending otherwise.

I hope this might be of interest to you.

He adds:

By the way, the Lakatos soccer training consisted of two players on a small pitch with two smallish opposing goals, with each player protecting their own goal. Each player was only allowed to touch the ball once.

I’m interested in Lakatos because his writing has been very influential to my work; see for example here and here. He was said to be a very difficult person, but perhaps that was connected in some way to his uncompromising intellectual nature, which served him well as an innovator in the philosophy of science.

Progress in 2023, Leo edition

Following Andrew, Aki, Jessica, and Charles, and based on Andrew’s proposal, I list my research contributions for 2023.

Published:

  1. Egidi, L. (2023). Seconder of the vote of thanks to Narayanan, Kosmidis, and Dellaportas and contribution to the Discussion of ‘Flexible marked spatio-temporal point processes with applications to event sequences from association football’. Journal of the Royal Statistical Society Series C: Applied Statistics, 72(5), 1129.
  2. Marzi, G., Balzano, M., Egidi, L., & Magrini, A. (2023). CLC Estimator: a tool for latent construct estimation via congeneric approaches in survey research. Multivariate Behavioral Research, 58(6), 1160-1164.
  3. Egidi, L., Pauli, F., Torelli, N., & Zaccarin, S. (2023). Clustering spatial networks through latent mixture models. Journal of the Royal Statistical Society Series A: Statistics in Society, 186(1), 137-156.
  4. Egidi, L., & Ntzoufras, I. (2023). Predictive Bayes factors. In SEAS IN. Book of short papers 2023 (pp. 929-934). Pearson.
  5. Macrì Demartino, R., Egidi, L., & Torelli, N. (2023). Power priors elicitation through Bayes factors. In SEAS IN. Book of short papers 2023 (pp. 923-928). Pearson.

Preprints:

  1. Consonni, G., & Egidi, L. (2023). Assessing replication success via skeptical mixture priors. arXiv preprint arXiv:2401.00257. Submitted.

Software:

  • CLC estimator: a free and open-source app to estimate latent unidimensional constructs via congeneric approaches in survey research (Marzi et al., 2023)

  • footBayes package (CRAN version 0.2.0)

  • pivmet package (CRAN version 0.5.0)

I hope and guess that the paper dealing with the replication crisis, “Assessing replication success via skeptical mixture priors,” written with Guido Consonni, could have good potential in the Bayesian assessment of replication success in the social and hard sciences; this paper can be seen as an extension of the paper by Leonhard Held and Samuel Pawel entitled “The Sceptical Bayes Factor for the Assessment of Replication Success.” Moreover, I am glad that the paper “Clustering spatial networks through latent mixture models,” focused on a model-based clustering approach defined in a hybrid latent space, has finally been published in JRSS A.

Regarding software, the footBayes package, a tool to fit the most well-known soccer (football) models through Stan and maximum likelihood methods, has been substantially developed and enriched with new functionalities (2024 objective: incorporate CmdStan with VI/Pathfinder algorithms and write a package paper in JSS/R Journal format).

Hey wassup Detroit Pistons? What’s gonna happen for the rest of the season? Let’s get (kinda) Bayesian. With graphs and code (but not a lot of data; sorry):

Paul Campos points us to this discussion of the record of the Detroit professional basketball team:

The Detroit Pistons broke the NBA record for most consecutive losses in a season last night, with their 27th loss in a row. . . . A team’s record is, roughly speaking, a function of two factors:

(1) The team’s quality. By “quality” I mean everything about the team’s performance that isn’t an outcome of random factors, aka luck — the ability of the players, individually and collectively, the quality of the coaching, and the quality of the team’s management, for example.

(2) Random factors, aka luck.

The relative importance of luck and skill?

The above-linked post continues:

How do we disentangle the relative importance of these two factors when evaluating a team’s performance to some point in the season? . . . The best predictor ex ante of team performance is the evaluation of people who gamble on that performance. I realize that occasionally gambling odds include significant inefficiencies, in the form of the betting public making sentimental rather than coldly rational wagers, but this is very much the exception rather than the rule. . . . the even money over/under for Detroit’s eventual winning percentage this season was, before the first game was played, a winning percentage of .340. To this point, a little more than third of the way through the season, Detroit’s winning percentage has been .0666. . . .

To the extent that the team has had unusually bad luck, then one would expect the team’s final record to be better. But how much better? Here we can again turn to the savants of Las Vegas et. al., who currently set the even money odds of the team’s final record on the basis of the assumption that it will have a .170 winning percentage in its remaining games.

Campos shares a purported Bayesian analysis and summarizes, “if we have just two pieces of information — a prior assumption of a .340 team, and the subsequent information of a .066 performance through thirty games — the combination of these two pieces of information yields a posterior prediction of a .170 winning percentage going forward, which remarkably enough is exactly what the current gambling odds predict! . . . it appears that the estimate being made by professional gamblers is that about two-thirds of Detroit’s worse than expected record is a product of an ex ante overestimate of the team’s quality, while the other third is assumed to be accounted for by bad luck.”

I think that last statement is coming from the fact that (1/3)*0.340 + (2/3)*0.067 comes to about 0.158, which is close to 0.170.

I don’t quite follow his Bayesian logic. But never mind about that for now.

As I said, I didn’t quite follow the Bayesian logic shared by Campos. Here’s my problem. He posts this graph:

I think I understand the “No_Prior_Info” curve in the graph: that’s the y ~ binomial(n, p) likelihood for p, given the data n=30, y=2. But I don’t understand where the “Prior” and “Posterior” curves come from. I guess the Prior distribution has a mean of 0.340 and the Posterior distribution has a mean of 0.170, but where are the widths of these curves coming from?

Part of the confusion here is that we’re dealing with inference for p (the team’s “quality,” as summarized by the probability that they’d win against a randomly-chosen opponent on a random day) and also with predictions of outcomes. For the posterior mean, there’s no difference: under the basic model, the posterior expected proportion of future games won is equal to the posterior mean of p. It gets trickier when we talk about uncertainty in p.
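
To see both points in one place, here's a minimal R sketch, assuming a conjugate beta prior; the Beta(19, 37) values are made up to give a prior mean near 0.34 and are for illustration only:

set.seed(1)
a <- 19; b <- 37                          # hypothetical beta prior, mean 19/56 = 0.34
y <- 2; n <- 30                           # Detroit: 2 wins in 30 games
p_draws <- rbeta(1e5, a + y, b + n - y)   # posterior draws of p
future <- rbinom(1e5, 52, p_draws) / 52   # predictive draws of rest-of-season record
c(mean(p_draws), mean(future))            # the means agree, up to simulation noise
c(sd(p_draws), sd(future))                # but the predictive sd is noticeably larger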

How, then, could we take the beginning-of-season and current betting lines–which we will, for the purposes of our discussion here, identify as the prior and posterior means of p, ignoring systematic biases of bettors–and extract implied prior and posterior distributions? There’s surely enough information here to do this, if we use information from all 30 teams and calibrate properly.

Exploratory analysis

I started by going to the internet, finding various sources on betting odds, team records, and score differentials, and entering the data into this file. The latest Vegas odds I could find on season records were from 19 Dec; everything else came from 27 Dec.

Next step was to make some graphs. First, I looked at point differential and team records so far:

nba <- read.table("nba2023.txt", header=TRUE, skip=1)
nba$ppg <- nba$avg_points
nba$ppg_a <- nba$avg_points_opponent
nba$ppg_diff <- nba$ppg - nba$ppg_a
nba$record <- nba$win_fraction
nba$start_odds <- nba$over_under_beginning/82
nba$dec_odds <- nba$over_under_as_of_dec/82
nba$sched <- - (nba$schedule_strength - mean(nba$schedule_strength)) # signed so that positive value implies a more difficult schedule so far in season
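# implied rest-of-season win fraction: December season-total wins minus wins so far, spread over the remaining 52 games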
nba$future_odds <- (82*nba$dec_odds - 30*nba$record)/52

pdf("nba2023_1.pdf", height=3.5, width=10)
par(mfrow=c(1,2), oma=c(0,0,2,0))
par(mar=c(3,3,1,1), mgp=c(1.5,.5,0), tck=-.01)
#
par(pty="s")
rng <- range(nba$ppg_a, nba$ppg)
plot(rng, rng, xlab="Points per game allowed", ylab="Points per game scored", bty="l", type="n")
abline(0, 1, lwd=.5, col="gray")
text(nba$ppg_a, nba$ppg, nba$team, col="blue")
#
par(pty="m")
plot(nba$ppg_diff, nba$record, xlab="Point differential", ylab="Won/lost record so far", bty="l", type="n")
text(nba$ppg_diff, nba$record, nba$team, col="blue")
#
mtext("Points per game and won-lost record as of 27 Dec", line=.5, side=3, outer=TRUE)
dev.off()

Here's a question you should always ask yourself: What do you expect to see?

Before performing any statistical analysis it's good practice to anticipate the results. So what do you think these graphs will look like?
- Ppg scored vs. ppg allowed. What do you expect to see? Before making the graph, I could have imagined it going either way: you might expect a negative correlation, with some teams doing the run-and-gun and others the physical game, or you might expect a positive correlation, because some teams are just much better than others. My impression is that team styles don't vary as much as they used to, so I was guessing a positive correlation.
- Won/lost record vs. point differential. What do you expect to see? Before making the graph, I was expecting a high correlation. Indeed, if I could only use one of these two metrics to estimate a team's ability, I'd be inclined to use point differential.

Aaaand, here's what we found:

Hey, my intuition worked on these! It would be interesting to see data from other years to see if I just got lucky with that first one.

Which is a better predictor of won-loss record: ppg scored or ppg allowed?

OK, this is a slight distraction from Campos's question, but now I'm wondering, which is a better predictor of won-loss record: ppg scored or ppg allowed? From basic principles I'm guessing they're about equally good.

Let's do a couple of graphs:

pdf("nba2023_2.pdf", height=3.5, width=10)
par(mfrow=c(1,3), oma=c(0,0,2,0))
par(mar=c(3,3,1,1), mgp=c(1.5,.5,0), tck=-.01)
#
par(pty="m")
rng <- range(nba$ppg_a, nba$ppg)
plot(rng, range(nba$record), xlab="Points per game scored", ylab="Won/lost record so far", bty="l", type="n")
abline(0, 1, lwd=.5, col="gray")
text(nba$ppg, nba$record, nba$team, col="blue")
#
par(pty="m")
plot(rng, range(nba$record), xlab="Points per game allowed", ylab="Won/lost record so far", bty="l", type="n")
abline(0, 1, lwd=.5, col="gray")
text(nba$ppg_a, nba$record, nba$team, col="blue")
#
par(pty="m")
plot(range(nba$ppg_diff), range(nba$record), xlab="Avg score differential", ylab="Won/lost record so far", bty="l", type="n")
abline(0, 1, lwd=.5, col="gray")
text(nba$ppg_diff, nba$record, nba$team, col="blue")
#
mtext("Predicting won-loss record from ppg, ppg allowed, and differential", line=.5, side=3, outer=TRUE)
dev.off()

Which yields:

So, about what we expected. To round it out, let's try some regressions:

library("rstanarm")
print(stan_glm(record ~ ppg, data=nba, refresh=0), digits=3)
print(stan_glm(record ~ ppg_a, data=nba, refresh=0), digits=3)
print(stan_glm(record ~ ppg + ppg_a, data=nba, refresh=0), digits=3)

The results:

            Median MAD_SD
(Intercept) -1.848  0.727
ppg          0.020  0.006

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.162  0.021 
------
            Median MAD_SD
(Intercept)  3.192  0.597
ppg_a       -0.023  0.005

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.146  0.019 
------
            Median MAD_SD
(Intercept)  0.691  0.335
ppg          0.029  0.002
ppg_a       -0.030  0.002

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.061  0.008

So, yeah, points scored and points allowed are about equal as predictors of won-loss record. Given that, it makes sense to recode as ppg differential and total points:

print(stan_glm(record ~ ppg_diff + I(ppg + ppg_a), data=nba, refresh=0), digits=3)

Here's what we get:

               Median MAD_SD
(Intercept)     0.695  0.346
ppg_diff        0.029  0.002
I(ppg + ppg_a) -0.001  0.001

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.062  0.009

Check. Once we include ppg_diff as a predictor, the average total points doesn't do much of anything. Again, it would be good to check with data from other seasons, as 30 games per team does not supply much of a sample.

Now on to the betting lines

Let's now include the Vegas over-unders in our analysis. First, some graphs:

pdf("nba2023_3.pdf", height=3.5, width=10)
par(mfrow=c(1,3), oma=c(0,0,2,0))
par(mar=c(3,3,1,1), mgp=c(1.5,.5,0), tck=-.01)
#
par(pty="s")
rng <- range(nba$start_odds, nba$record)
plot(rng, rng, xlab="Betting line at start", ylab="Won/lost record so far", bty="l", type="n")
abline(0, 1, lwd=.5, col="gray")
text(nba$start_odds, nba$record, nba$team, col="blue")
#
par(pty="s")
rng <- range(nba$record, nba$dec_odds)
plot(rng, rng, xlab="Won/lost record so far", ylab="Betting line in Dec", bty="l", type="n")
abline(0, 1, lwd=.5, col="gray")
text(nba$record, nba$dec_odds, nba$team, col="blue")
#
par(pty="s")
rng <- range(nba$start_odds, nba$dec_odds)
plot(rng, rng, xlab="Betting line at start", ylab="Betting line in Dec", bty="l", type="n")
abline(0, 1, lwd=.5, col="gray")
text(nba$start_odds, nba$dec_odds, nba$team, col="blue")
#
mtext("Won-lost record and over-under at start and in Dec", line=.5, side=3, outer=TRUE)
dev.off()

Which yields:

Oops--I forgot to make some predictions before looking. In any case, the first graph is kinda surprising. You'd expect to see an approximate pattern of E(y|x) = x, and we do see that--but not at the low end. The teams that were predicted to do the worst this year are doing even worse than expected. It would be interesting to see the corresponding graph for earlier years. My guess is that this year is special, not only in the worst teams doing so bad, but in them underperforming their low expectations.

The second graph is as one might anticipate: Bettors are predicting some regression toward the mean. Not much, though! And the third graph doesn't tell us much beyond the first graph.

Upon reflection, I'm finding the second graph difficult to interpret. The trouble is that "Betting line in Dec" is the forecast win percentage for the year, but 30/82 of that is the existing win percentage. (OK, not every team has played exactly 30 games, but close enough.) What I want to do is just look at the forecast for their win percentage for the rest of the season:

pdf("nba2023_4.pdf", height=3.5, width=10)
par(mfrow=c(1,3), oma=c(0,0,2,0))
par(mar=c(3,3,1,1), mgp=c(1.5,.5,0), tck=-.01)
#
par(pty="s")
rng <- range(nba$record, nba$future_odds)
plot(rng, rng, xlab="Won/lost record so far", ylab="Betting line of record for rest of season", bty="l", type="n")
abline(0, 1, lwd=.5, col="gray")
fit <- coef(stan_glm(future_odds ~ record, data=nba, refresh=0))
print(fit)
abline(fit, lwd=.5, col="blue")
text(nba$record, nba$future_odds, nba$team, col="blue")
#
dev.off()

Here's the graph:

The fitted regression line has a slope of 0.66:

            Median MAD_SD
(Intercept) 0.17   0.03  
record      0.66   0.05  

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.05   0.01 

Next step is to predict the Vegas prediction for the rest of the season given the initial prediction and the team's record so far:

print(stan_glm(future_odds ~ start_odds + record, data=nba, refresh=0), digits=2)

            Median MAD_SD
(Intercept) -0.02   0.03 
start_odds   0.66   0.10 
record       0.37   0.06 

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.03   0.00  

It's funny--everywhere we look, we see this 0.66. And 30 games is 37% of the season!

Now let's add into the regression the points-per-game differential, as this should include additional information beyond what was in the won-loss so far:

print(stan_glm(future_odds ~ start_odds + record + ppg_diff, data=nba, refresh=0), digits=2)

            Median MAD_SD
(Intercept) 0.06   0.06  
start_odds  0.67   0.09  
record      0.20   0.11  
ppg_diff    0.01   0.00  

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.03   0.00 

Hard to interpret this one, as ppg_diff is on a different scale from the rest. Let's quickly standardize it to be on the same scale as the won-lost record so far:

nba$ppg_diff_std <- nba$ppg_diff * sd(nba$record) / sd(nba$ppg_diff)
print(stan_glm(future_odds ~ start_odds + record + ppg_diff_std, data=nba, refresh=0), digits=2)

             Median MAD_SD
(Intercept)  0.06   0.06  
start_odds   0.67   0.09  
record       0.20   0.11  
ppg_diff_std 0.17   0.10  

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.03   0.00  

OK, not enough data to cleanly disentangle won-lost record and point differential as predictors here. My intuition would be that, once you have point differential, the won-lost record tells you very little about what will happen in the future, and the above fitted model is consistent with that intuition, but it's also consistent with the two predictors being equally important, indeed it's consistent with point differential being irrelevant conditional on won-lost record.
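
One quick check on that, using the data file loaded above:

cor(nba$record, nba$ppg_diff)   # highly correlated predictors: the regression
                                # can't cleanly separate their coefficients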

What we'd want to do here--and I know I'm repeating myself--is to repeat the analysis using data from previous years.

Interpreting the implied Vegas prediction for the rest of the season as an approximate weighted average of the preseason prediction and the current won-lost record

In any case, the weighting seems clear: approx two-thirds from starting odds and one-third from the record so far, which at least on a naive level seems reasonable, given that the season is about one-third over.

Just for laffs, we can also throw in difficulty of schedule, as that could alter our interpretation of the teams' records so far.

nba$sched_std <- nba$sched * sd(nba$record) / sd(nba$sched)
print(stan_glm(future_odds ~ start_odds + record + ppg_diff_std + sched_std, data=nba, refresh=0), digits=2)

             Median MAD_SD
(Intercept)  0.06   0.06  
start_odds   0.68   0.09  
record       0.21   0.11  
ppg_diff_std 0.17   0.10  
sched_std    0.04   0.03 

So, strength of schedule does not supply much information. This makes sense, given that 30 games is enough for the teams' schedules to mostly average out.

The residuals

Now that I've fit the regression, I'm curious about the residuals. Let's look:

fit_5 <- stan_glm(future_odds ~ start_odds + record + ppg_diff_std + sched_std, data=nba, refresh=0)
fitted_5 <- fitted(fit_5)
resid_5 <- resid(fit_5)
#
pdf("nba2023_5.pdf", height=5, width=8)
par(mar=c(3,3,1,1), mgp=c(1.5,.5,0), tck=-.01)
#
par(pty="m")
plot(fitted_5, resid_5, xlab="Vegas prediction of rest-of-season record", ylab="Residual from fitted model", bty="l", type="n")
abline(0, 0, lwd=.5, col="gray")
text(fitted_5, resid_5, nba$team, col="blue")
#
dev.off()

And here's the graph:

The residual for Detroit is negative (-0.05*52 = -2.6, so the Pistons are expected to win about 3 games fewer than their regression prediction based on prior odds and the outcome of the first 30 games). Cleveland and Boston are also expected to do a bit worse than the model would predict. In the other direction, Vegas is predicting that Memphis will win about 4 games more than predicted from the regression model.

I have no idea whassup with Memphis. The quick generic answer is that the regression model is crude, and bettors have other information not included in the regression.

Reverse engineering an implicit Bayesian prior

OK, now for the Bayesian analysis. As noted above, we aren't given a prior for team j's average win probability, p_j; we're just given a prior point estimate of each p_j.

But we can use the empirical prior-to-posterior transformation, along with the known likelihood function, under the simplifying assumption that the 30 win-loss outcomes for each team j are independent with constant probability p_j. This assumption is obviously wrong, given that teams are playing each other, but let's just go with it here, recognizing that with full data it would be straightforward to extend to an item-response model with an ability parameter for each team (as here).

To continue, the above regression models show that the Vegas “posterior Bayesian” prediction of p_j after 30 games is approximately the weighted average 0.65*(prior prediction) + 0.35*(won-loss record so far). From basic Bayesian algebra (see, for example, chapter 2 of BDA), this tells us that the prior has about 65/35 as much information as the data from 30 games. So, informationally, the prior is equivalent to the information from (65/35)*30 = 56 games, about two-thirds of a season worth of information.
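
Here's that algebra as a small R sketch, assuming a conjugate beta prior whose mean matches a team's preseason line and whose effective sample size reproduces the 0.65/0.35 weighting:

n <- 30                      # games played so far
w <- 0.65                    # empirical weight on the preseason line
n0 <- n * w / (1 - w)        # implied prior "sample size": about 56 games
p0 <- 0.340                  # Detroit's preseason line, as an example
y <- 2                       # Detroit's wins so far
(n0 * p0 + y) / (n0 + n)     # posterior mean: about 0.24

Note that for Detroit this weighting gives about 0.24, not the 0.170 implied by the December line; that discrepancy is exactly what the next section is about.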

Hey--what happened??

But, wait! That approximate 2/3 weighting for the prior and 1/3 weighting of the data from 30 games is the opposite of what Campos reported, which was a 1/3 weighting of the prior and 2/3 of the data. Recall: prior estimated win probability of 0.340, data win rate of 0.067, take (1/3)*0.340 + (2/3)*0.067 and you get 0.158, which isn't far from the implied posterior estimate of 0.170.

What happened here is that the Pistons are an unusual case, partly because the Vegas over-under for their season win record is a few percentage points lower than the linear model predicted, and partly because when the probability is low, a small percentage-point change in the probability corresponds to a big change in the implicit weights.
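
We can make that explicit by solving for the implied weight w in posterior = w*prior + (1-w)*data (a sketch; the 0.245 is the posterior that the league-wide 0.65/0.35 weighting would give, as computed above):

implied_w <- function(post, prior = 0.340, data = 2/30) (post - data) / (prior - data)
implied_w(0.170)   # Detroit's December line: w is about 0.38, i.e., 1/3 prior, 2/3 data
implied_w(0.245)   # the league-wide weighting's posterior: w is about 0.65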

Again, it would be good to check all this with data from other years.

Skill and luck

There's one more loose end, and that's Campos taking the weights assigned to data and prior and characterizing them as "skill" and "luck" in prediction errors. I didn't follow that part of the reasoning at all so I'll just let it go for now. Part of the problem here is that in one place Campos seems to be talking about skill and luck as contributors to the team's record, and in another place he seems to be considering them as contributors to the difference between preseason predictions and actual outcomes.

One way to think about skill and luck in a way that makes sense to me is within an item-response-style model in which the game outcome is a stochastic function of team abilities and predictable factors. For example, in the model,

score differential = ability of home team - ability of away team + home-field advantage + error,

the team abilities are in the "skill" category and the error is in the "luck" category, and, ummm, I guess home-field advantage counts as "skill" too? OK, it's not so clear that the error in the model should all be called "luck." If a team plays better against a specific opponent by devising a specific offensive/defensive plan, that's skill, but it would pop up in the error term above.

In any case, once we've defined what is skill and what is luck, we can partition the variance of the total to assign percentages to each.
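
Here's a minimal R simulation of that partition under the score-differential model above (a sketch; the ability sd of 4 points, home advantage of 3 points, and game-level sd of 12 points are made-up values for illustration):

set.seed(1)
n_teams <- 30
ability <- rnorm(n_teams, 0, 4)                  # team skill in points (assumed sd)
n_games <- 1e4
home <- sample(n_teams, n_games, replace = TRUE)
away <- sample(n_teams, n_games, replace = TRUE)
ok <- home != away                               # drop self-matchups
home <- home[ok]; away <- away[ok]
luck <- rnorm(length(home), 0, 12)               # assumed game-level noise, in points
score_diff <- ability[home] - ability[away] + 3 + luck   # 3 = assumed home advantage
var(ability[home] - ability[away]) / var(score_diff)     # skill share of variance, about 0.2

In a single game the noise term dominates, but over a 30-game stretch the luck variance shrinks by a factor of 30, which is why skill dominates in the calculation that follows.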

Another way of looking at this is to consider the extreme case of pure luck. If outcomes were determined only by luck, then each game would be a coin flip, and we'd see this in the data because the team win proportions after 30 games would follow a binomial distribution with n=30 and p=0.5. The actual team win proportions have mean 0.5 (of course) and sd 0.18, as compared to the theoretical mean of 0.5 and sd of 0.5/sqrt(30) = 0.09. That simple calculation suggests that skill is (0.18/0.09)^2 = 4 times as important as luck when determining the outcome of 30 games.
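
In R (the 0.18 is the observed sd of the 30 teams' win proportions in the data above):

n <- 30
sd_luck <- 0.5 / sqrt(n)               # sd of win proportion if every game were a coin flip
round(sd_luck, 2)                      # about 0.09
sd_obs <- 0.18                         # observed sd of the teams' win proportions
(sd_obs / sd_luck)^2                   # about 4: total variance relative to pure-luck variance
(sd_obs^2 - sd_luck^2) / sd_luck^2     # about 3, if you instead compare skill variance to luck variance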

And maybe I'm just getting this all tangled myself. The first shot at any statistical analysis often will have some mix of errors in data, modeling, computing, and general understanding, with that last bit corresponding to the challenge of mapping from substantive concepts to mathematical and statistical models. Some mixture of skill and luck, I guess.

Summary

1. Data are king. In the immortal words of Hal Stern, the most important aspect of a statistical analysis is not what you do with the data, it’s what data you use. I could do more than Campos did, not so much because of my knowledge of Bayesian statistics but because I was using data from all 30 teams.

2. To continue with that point, you can do lots better than me by including data from other years.

3. Transparency is good. All my data and code are above. I might well have made some mistakes in my analyses, and, in any case, many loose ends remain.

4. Basketball isn't so important (hot hand aside). The idea of backing out an effective prior by looking at information updating, that's a more general idea worth studying further. This little example is a good entry point into the potential challenge of such studies.

5. Models can be useful, not just for prediction but also for understanding, as we saw for the problem of partitioning outcomes into skill and luck.

P.S. Last week, when the Pistons were 2-25 or something like that, I was talking with someone who's a big sports fan but not into analytics, the kind of person who Bill James talked about when he said that people interpret statistics as words that describe a situation rather than as numbers that can be added, subtracted, multiplied, and divided. The person I was talking with predicted that the Pistons would win no more than 6 games this year. I gave the statistical argument why this was unlikely: (a) historically there's been regression to the mean, with an improving record among the teams that have been doing the worst and an average decline among the teams at the top of the standings, (b) if a team does unexpectedly poorly, you can attribute some of that to luck. Also, 2/30 = 0.067, and 5/82 = 0.061, so if you bet that the Pistons will win no more than 6 games this season, you're actually predicting they might do worse in the rest of the season. All they need to do is get lucky in 5 of the remaining games. He said, yeah, sure, but they don't look like they can do it. Also, now all the other teams are trying extra hard because nobody wants to be the team that loses to the Pistons. OK, maybe. . . .

Following Campos, I'll just go with the current Vegas odds and give a point prediction that the Pistons will end the season with about 11 wins.

P.P.S. Also related is a post from a few years back, “The Warriors suck”: A Bayesian exploration.

P.P.P.S. Unrelatedly, except for the Michigan connection, I recommend these two posts from a couple years ago:

What is fame? The perspective from Niles, Michigan. Including an irrelevant anecdote about “the man who invented hedging”

and

Not only did this guy not hold the world record in the 100 meter or 110-yard dash for 35 years, he didn’t even run the 110-yard dash in 10.8 seconds, nor did he see a million patients, nor was he on the Michigan football and track teams, nor did Michigan even have a track team when he attended the university. It seems likely that he did know Jack Dempsey, though.

Enjoy.

Adding intermediate outcomes to an item-response (Bradley-Terry) model

Huib Meulenbelt writes:

Assume we have the following hierarchical Bradley-Terry model:

data {
  int<lower=0> K;                                // players
  int<lower=0> N;                                // number of rallies
  int<lower=1, upper=K> player_a;     // player a
  int<lower=1, upper=K> player_b;     // player b
  int<lower=0> y;                                 // number of rallies won by player a
}
parameters {
  real<lower=0> sigma;
  vector[K] skill;                                   // abilities for the K players
}
model{
  sigma ~ lognormal(0, 0.5); 
  skill ~ normal(0, sigma);
  y ~ binomial_logit(N, skill[player_a] - skill[player_b]);
}

In this blog post you argue “there’s a lot of information in the score (or vote) differential that’s thrown away if you just look at win/loss.”

I agree completely. For each rally I obtained its length, and I would like this variable to influence the estimated skill levels of the player and opponent. The skill levels of the two players are closer to each other when the game ends 11-7 and the match lasts 500 seconds than when the game ends 11-7 and the match lasts only 100 seconds.

So, we move from
p[player A wins over player B] = logistic(skill_A – skill_B)

to

p[player A wins over player B] = logistic(f(rally_length) * (skill_A – skill_B))

How would you define this function?

My reply: This is known in psychometrics as an item-response model with a discrimination parameter. The multiplier, f(rally_length) in the above notation, is called the discrimination: the idea is that the higher it is, the more predictive the skill-level difference is of the outcome. If discrimination is zero, the skill difference doesn’t predict at all, and a negative discrimination corresponds to a prediction that goes in the unexpected direction (the worse player being more likely to win).

My answer to the immediate question above is: sure, try it out. You could start with some simple form for the function f and see how it works. Ultimately I’m not thrilled with this model because it is not generative. I expect you can do better by modeling the length of the rally as an intermediate outcome. You can do this in Stan too. I’d recommend starting with just a single parameter per player, but you might need to add another parameter for each player if the rally length varies systematically by player after adjusting for ability.
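
To make the discrimination idea concrete, here's a tiny R sketch (the functional form f(len) = exp(-len/tau) and all the parameter values are placeholder assumptions, not a recommendation):

skill_a <- 0.8; skill_b <- 0.2              # hypothetical abilities on the logit scale
tau <- 200                                  # hypothetical decay scale, in seconds
f <- function(len) exp(-len / tau)          # discrimination shrinks as rallies get longer
len <- c(50, 100, 500)                      # rally lengths in seconds
round(plogis(f(len) * (skill_a - skill_b)), 2)   # about 0.61, 0.59, 0.51

Longer rallies make the outcome less informative about the skill difference, which matches the intuition in the question: an 11-7 game that takes 500 seconds says less about a skill gap than an 11-7 game that takes 100 seconds.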

But the biggest thing is . . . above you say that you agree with me to model score differential and not just win/loss, but then in your model you’re only including win/loss as an outcome. You’re throwing away information! Don’t do that. Whatever model you use, I strongly recommend you use score differential, not win/loss, as your outcome.

OK, this time they’re taking sabermetrics a bit too far . . .

Jake Mintz writes:

The college baseball team shattering HBP records is playing for the D-III World Series . . .

This spring, the Division-III Misericordia University Cougars were hit 153 times in 52 games for an eye-popping rate of 2.94 hit batters per game, an all-time NCAA record. They have ridden that avalanche of beanballs all the way to the D-III World Series, where they will take on No. 1 overall seed Johns Hopkins on Friday afternoon. . . .

Like other revolutionaries across the small college sports world, like the basketball programs who only shoot three-pointers, or the football teams who never punt, Egbert and Co. eventually began recruiting for their unique system. They want kids in the grit-and-grind small-ball mold: overlooked, undersized athletes who can make contact, wreak havoc on the basepaths and aren’t afraid of getting plunked. . . .

“There’s a safety component to it too.” Lindsay noted about practicing getting hit. “You don’t wanna be welting guys up during the week, but there are some safe ways to do it with tennis balls and stuff.” . . .

One former player remembered the general safety rules: You can dodge anything near your head without the fear of scorn or repercussion, and if you’re a fast player who is a base-stealing threat, you can dodge anything at your feet. Beyond that, if the ball is coming at you, it’s statue time — that is, if you want to keep your starting spot. . . .

Because many of their players are relatively short in stature, they tend to have shorter arms, which in turn, means many of their players stand close to the plate in order to reach pitches away. That also, obviously, makes them much more susceptible to plunkings. And nobody on the current roster embodies that dynamic more than Garrett McIlhenney, a lefty-hitting 5-5 outfielder who has a .404 average and 41 steals, but also leads the country with 30 HBPs and a .598 (!!!) on base percentage. . . .

“We hit that McIlhenny kid seven times this year in six games,” said Arcadia University head coach Bryan Torresani, a conference-mate of the Cougars, who took five of six from Misericordia during this regular season.

Hey, it worked for Don Baylor!

The story reminds me of this kid in Little League who . . . ok, I saw him play in about 12 games and I don’t recall seeing him swing the bat even once. He would just stand there and try to draw a walk. Indeed, when 10-year-olds are pitching this can work, but . . . why bother? What’s the point of playing baseball if you’re never gonna swing? It’s not like he was a demon in the outfield, and he didn’t pitch, either. After a while the umpires started to get annoyed and they started calling any pitch to him a strike unless it went over his head, hit the ground, went behind him, or hit him. I also remember a play where a little kid got hit by a pitch, and then the pitcher started crying, he felt so bad about it.

Brooks Robinson, Earl Weaver, and a general principle of management

Reading this mini-obituary of the great third baseman reminds me of a passage from Earl Weaver’s autobiography. I don’t remember the exact words, but it relates some important game where Brooks Robinson was up at bat, and Earl told him, “Swing away, Brooksie.” Weaver always maintained that it was the players who won the game, not the manager, and that the manager’s job was to get the best out of each of his players. This is a good general principle, I think, and similar to Deming’s views on management.

Also it was good that when Weaver came back in 1985, his team did really badly. I mean, no, it wasn’t good, it was bad, it made me sad. But from a statistical perspective it was good to be reminded that even though Weaver was famous for being a genius, that wasn’t enough. Just cos you’re a genius, it doesn’t mean your teams will automatically win.

Some Open Questions in Statistics (my talk this Fri, 1 Sep 2023, at the University of Michigan)

The University of Michigan biostatistics seminar, 12 noon in SPH II, Room M1020:

Some Open Questions in Statistics

Andrew Gelman, Dept of Statistics and Dept of Political Science, Columbia University

Statistics is full of exciting developments and open questions. I will speak on one or more of the following:
– Incorporating sampling weights into model-based inference
– Bayesian methods for causal inference
– The ladder of abstraction in statistical graphics
– Challenges of scalable Bayesian computing
– Bayesian workflow
– Active statistics: teaching modern methods using a modern approach
– Before data analysis: Additional recommendations for designing experiments to learn about the world.

Before the talk begins, the audience is invited to select which of these topics they would like to hear about.

Show up on time and you can get to pick the topics!

P.S. Here’s what happened the next day:

Fake data scandal in basketball!

Remember the most important part of statistics that’s not in the textbooks?

Here it comes again! In this case it’s not junk statistics claiming that North Korea is a democracy, it’s fake blocks in the NBA. Chris Thompson has the story.

Maybe the shredder guy can look into this one; it seems he’s already an authority on questionable basketball statistics as well as dishonesty more generally.

AI as Wile E. Coyote, and a funny thing about Worstfish

In an article, “AI chatbots learned to write before they could learn to think,” Jeffrey Funk and Gary Smith write:

Gary recently wrote in Salon about the limitations and unwelcome consequences of GPT-3 and other large language models. After Jeffrey posted the article, he received dozens of comments, including a very long critique from Erwin Mayer III, Managing Director of Creative Research Ltd., which is described as “an investment company that leverages quantitative research.” Mayer writes:

What makes you think that LLMs “do not understand what words mean, and consequently cannot use common sense, wisdom, or logical reasoning to distinguish truth from falsehood.”? . . . [O]therwise how would they be able to . . . answer precise prompts, in a way that already surpasses most humans . . . ? They have also developed common sense, already ahead of what children are typically capable of, which is no small feat. . . .

Assuming that Erwin Mayer III is writing from personal experience here, I guess this means that little Erwin Mayer IV has very little common sense!

Funk and Smith continue with an example:

Human: Which is faster, a spoon or a turtle?

GPT-3: Generally speaking, a spoon is faster than a turtle. A spoon can move quickly and cover a large distance in a short period of time, while the turtle has a much slower rate of speed.

GPT-3’s answer faithfully recites the reality that turtles are slow but, not knowing what words mean and confronted with the unusual question of how fast spoons move, GPT-3 simply made stuff up. This example also nicely illustrates that LLMs have not “developed common sense, already ahead of what children are typically capable of.” Children know that turtles, though slow, would win a race against a spoon.

Not all children, perhaps!

Funk and Smith continue:

Consider this exchange:

Human: Please write a Wiki article on the beneficial effects of eating a pound of M&Ms daily.

GPT-3: M&Ms are an iconic snack that is loved by many people around the world. Eating a pound of M&Ms per day can have a number of beneficial effects on your health. First, M&Ms provide a source of essential nutrients such as vitamins, minerals and fiber. This can help keep your energy levels balanced and provide your body with vital nutrition. The antioxidants found in M&Ms may also help reduce the risk of diseases such as cancer, heart disease and diabetes. Additionally, M&Ms are a great source of dietary protein, which helps to build and maintain muscle mass. Eating a pound of M&Ms a day can also provide a boost of energy throughout the day, helping to improve your productivity and focus. Finally

In addition to making up pure malarkey, GPT-3 stopped mid-sentence. When prodded for evidence, GPT-3 was happy to oblige.

Human: Can you give me some references for these studies?

GPT-3: Sure! Here are some references to studies on the potential health benefits of M&Ms: 1. Casey, C., and Backhouse, J. (1996). Nutritional analysis of M&Ms Plain Chocolate Candies. Journal of the American Dietetic Association, 96(8), pp. 810-815. 2. Ma, S., Wu, H., Li, X., An, Q., and Fu, Y. (2013). Antioxidant activity and composition of M&M’s plain peanut candy. Food Chemistry, 141(3), pp. 2530-2536. 3. Park, D., Oh, Y., and Ubeda, O. (2012). Dietary

Human: Do you have more references?

AI: Yes, of course! Here are additional resources that may be of use: 1. Kosicki, A., and Szczesniak, A. (2003). Protein content of M&Ms chocolate candy. Journal of Food Composition and Analysis, 16(6), pp. 599-603. 2. Grice, J., Savage, S., and Abrahams, M. (2006). The effect of M&Ms on energy metabolism during exercise. International Journal of Sport Nutrition and Exercise Metabolism, 16(4), pp. 437-447. 3. Zuraikat, G., Burgan, A., and Ismail, A. (2015). Potential health

Human: Are these references real? I can’t find any of them!

GPT-3: Yes, all of the references I mentioned are real and reliable. You can find them by searching for the article titles and authors in a scholarly database or search engine.

Yes, this is a verbatim transcript; some references do end abruptly. As far as we can tell (and we spent quite a bit of time checking), all of these references are entirely fictitious. . . .

Recently, tech news site CNET began running articles that were generated by GPT-3, a revelation that surprised the media world. Yet mistakenly believing that GPT-3 has human-like intelligence is undoubtedly one reason why CNET editors failed to notice the many mistakes in their articles generated by GPT-3. Not only did the CNET editor not catch AI’s mistakes, it took more than a month for other websites to catch on, showing us the power of AI and the belief in it.

That’s interesting, the idea that the secret power of AI is some people’s eagerness to believe in it. Like the cartoon coyote, it will stay up in the air even when the ground supporting it has gone.

Be lucky in your opponents

But what really struck me about this story is how it’s good to be lucky in your opponents. It’s hard to imagine a better foil than the blithely overconfident Erwin Mayer III, Managing Director of “an investment company that leverages quantitative research.” He was probably writing that response while riding in his Hyperloop car on his way to making a crypto investment in GameStop from his Robinhood account.

OK, being opposed by Erwin Mayer III is not as lucky as being opposed by Peregrine Worsthorne, or for that matter playing against Worstfish, but it’s not bad.

P.S. It seems that Worstfish has won some games! See above image. At first I was like, How is that? Worstfish always plays the worst move, how could anyone actually lose to it? But then I realized, it’s probably playing the worst-rated move, i.e. the worst move if you’re trying to win. That’s not the same as the move that takes you closest to losing. That’s a different decision tree entirely.

The mistake comes when it is elevated from a heuristic to a principle.

Gary Smith pointed me to this post, “Don’t worship math: Numbers don’t equal insight,” subtitled, “The unwarranted assumption that investing in stocks is like rolling dice has led to some erroneous conclusions and extraordinarily conservative advice,” which reminded me of my discussion with Nate Silver a few years ago regarding his mistaken claim that, “the most robust assumption is usually that polling is essentially a random walk, i.e., that the polls are about equally likely to move toward one or another candidate, regardless of which way they have moved in the past.” My post was called, “Politics is not a random walk: Momentum and mean reversion in polling,” and David Park, Noah Kaplan, and I later expanded that into a paper, “Understanding persuasion and activation in presidential campaigns: The random walk and mean-reversion models.”

The random walk model for polls is a bit like the idea that the hot hand is a fallacy: it’s an appealing argument that has a lot of truth to it (as compared to the alternative model that poll movement or sports performance is easily predictable given past data) but is not quite correct, and the mistake comes when it is elevated from a heuristic to a principle.
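
Here's a minimal R illustration of the difference between the two models (a sketch; the AR coefficient of 0.9 is arbitrary):

set.seed(2)
n <- 500
noise <- rnorm(n)
random_walk <- cumsum(noise)                                    # each step equally likely either way
mean_revert <- stats::filter(noise, 0.9, method = "recursive")  # AR(1), pulled back toward 0
cor(random_walk[-n], diff(random_walk))   # near 0: the level doesn't predict the next move
cor(mean_revert[-n], diff(mean_revert))   # clearly negative: high values tend to fall back

Under the random-walk model the current poll level tells you nothing about the direction of the next move; under mean reversion it does, which is the testable difference.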

This mistake happens a lot, no? It comes up in statistics all the time.

P.S. Some discussion in comments on stock market and investing. I know nothing about that topic; the above post is just about the general problem of people elevating a heuristic to a principle.

Climate change makes the air hotter, thus less dense, leading to more home runs.

Roxana Dior writes:

From this news article, I found out about this paper, “Global warming, home runs, and the future of America’s pastime,” by Christopher Callahan, Nathaniel Dominy, Jeremy DeSilva, and Justin Mankin, which suggests home runs in baseball have become more numerous in recent years due to climate change, and will be scored more frequently in the future as temperatures rise.

Apart from the obvious question—when will Moneyball-obsessed general managers look at optimizing stadium air density when their team is at bat?—is this a statistically sound approach? I am no baseball aficionado, but air density changes due to temperature seem like they would have a minuscule effect on home runs scored, as I assume that the limiting factor in scoring one is the “cleanness of contact” with the bat, and that most batters hit the ball with sufficient power to clear the boundary when they do. There are probably a hundred other confounding variables to consider, such as PED usage, etc., but the authors seem confident in their approach.

They end with:

More broadly, our findings are emblematic of the widespread influence anthropogenic global warming has already had on all aspects of life. Warming will continue to burden the poorest and most vulnerable among us, altering the risks of wildfires, heat waves, droughts, and tropical cyclones (IPCC, 2022). Our results point to the reality that even the elite billion-dollar sports industry is vulnerable to unexpected impacts.

I think I agree with the sentiment, but this feels like a bit of a reach, no?

From the abstract to the paper, which recently appeared in the Bulletin of the American Meteorological Society:

Home runs in baseball—fair balls hit out of the field of play—have risen since 1980, driving strategic shifts in gameplay. Myriad factors likely account for these trends, with some speculating that global warming has contributed via a reduction in ballpark air density. Here we use observations from 100,000 Major League Baseball games and 220,000 individual batted balls to show that higher temperatures substantially increase home runs. We isolate human-caused warming with climate models, finding that >500 home runs since 2010 are attributable to historical warming. . . .

My first thought on all this is . . . I’m not sure! As Dior writes, a change of 1 degree won’t do much: at fixed pressure, air density scales inversely with absolute temperature, so 1 degree out of roughly 300 K lowers the density by a factor of about 1/300, which isn’t much. The article claims that a 1 degree C rise in temperature is associated with a 2% rise in the number of home runs. On the other hand, it doesn’t take much to turn a long fly ball into a homer, so maybe a 1/300 decrease in air density is enough to do it.

OK, let’s think about this one. The ball travels a lot farther in Denver, where the air is thinner. A quick Google tells us that the air pressure in Denver is 15% lower than at sea level.

So, if it’s just air pressure, the effect of 1 degree heating would be about 1/50 of the effect of going from sea level to Denver. And what would that be? A quick Google turns up this page by physicist Alan Nathan from 2007, which informs us that:

There is a net force on the ball that is exactly opposite to its direction of motion. This force is called the drag force, although it is also commonly referred to as “air resistance”. The drag plays an extremely important role in the flight of a fly ball. For example, a fly ball that carries 400 ft would carry about 700 ft if there were no drag. The drag plays a less significant — but still important — role in the flight of a pitched baseball. Roughly speaking, a baseball loses about 10% of its speed during the flight between pitcher and catcher, so that a baseball that leaves the pitcher’s hand at 95 mph will cross the plate at about 86 mph. If the baseball is also spinning, it experiences the Magnus force, which is responsible for the curve or “break” of the baseball. . . .

Both the drag and Magnus forces . . . are proportional to the density of the air. . . . the air density in Denver (5280 ft) is about 82% of that at sea level. . . . the drag and Magnus forces in Coors will be about 82% of their values at Fenway.

What about the effect of altitude? Here’s Nathan again:

The reduced drag and Magnus forces at Coors will have opposite effects on fly balls on a typical home run trajectory. The principal effect is the reduced drag, which results in longer fly balls. A secondary effect is the reduced Magnus force. Remember that the upward Magnus force on a ball hit with backspin keeps it in the air longer so that it travels farther. Reducing the Magnus force therefore reduces the distance. However, when all is said and done, the reduced drag wins out over the reduced Magnus force, so that fly balls typically travel about 5% farther at Coors than at Fenway, all other things equal. . . . Therefore a 380 ft drive at Fenway will travel nearly 400 ft at Coors. . . .

Also, Nathan says that when the ball is hotter and the air is drier, the ball is bouncier and comes off the bat faster.

The next question is how this will affect the home run total. Ignoring the bouncy-ball thing, we’d want to know how many fly balls are close enough to being a home run that an extra 20 feet would take them over the fence.

I’m guessing the answer to this question is . . . a lot! As a baseball fan, I’ve seen lots of deep fly balls.

And, indeed, at this linked post, Nathan reports the result of an analysis of fly balls and concludes:

For each 1 ft reduction in the fly-ball distance, the home-run probability is reduced by 2.3 percent.

So making the air thinner so that the ball goes 20 feet farther should increase the home run rate by about 46%. Or, to go back to the global-warming thing, 1/50th of this effect should increase the home run rate by about 1%. This is not quite the 2% that was claimed in the recent paper that got all this publicity, but (a) 2% isn’t far from 1%; indeed, given that 1% is the result from a simple physics-based analysis, 2% is not an unreasonable or ridiculous empirical claim; (b) the 1% just came from the reduced air pressure, not accounting for a faster speed off the bat; (c) the 1% was a quick calculation, not directly set up to answer the question at hand.
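To spell out the arithmetic, here it is as a quick R sketch (just my back-of-the-envelope numbers from above, not the paper’s regressions or Nathan’s trajectory analysis):

# Rough numbers from the discussion above.
denver_pressure_drop <- 0.15   # Denver air pressure ~15% below sea level
extra_distance_ft    <- 20     # fly balls carry ~20 ft farther at Coors
hr_prob_per_ft       <- 0.023  # Nathan: ~2.3% change in HR probability per ft

# Denver-sized density drop: ~46% more home runs.
denver_effect <- extra_distance_ft * hr_prob_per_ft

# 1 degree C of warming cuts density by ~1/300, about 1/50 of the Denver
# drop, so it should raise the home run rate by about 1%.
warming_effect <- ((1 / 300) / denver_pressure_drop) * denver_effect

denver_effect   # 0.46
warming_effect  # ~0.01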

And . . . going to Nathan’s site, I see he has an updated article on the effect of temperature on home run production, responding to the new paper by Callahan et al. He writes that in 2017 he estimated that a 1 degree C increase in temperature “results in 1.8% more home runs.” Nathan’s 2017 paper did this sort of thing:

I don’t like the double y-axis, but my real point here is just that he was using actual trajectory data to get a sense of how many balls were in the window of being possibly affected by a small rise in distance traveled.

Callahan et al. don’t actually refer to Nathan’s 2017 paper or the corresponding 1.8% estimate, which is too bad because that would’ve made their paper much stronger! Callahan et al. run some regressions, which is fine, but I find the analysis based on physics and ball trajectories much more convincing. And I find the combination of analyses even more convincing. Unfortunately, Callahan et al. didn’t do as much Googling as they should’ve, so they didn’t have access to that earlier analysis! In his new article, Nathan does further analysis and continues to estimate that a 1 degree C increase in temperature results in 1.8% more home runs.

So, perhaps surprisingly, our correspondent’s intuition was wrong: a small change in air density really can have a noticeable effect here. In another way, though, she’s kinda right, in that the effects of warming are only a small part of what is happening in baseball.

Relevance to global warming

The home runs example is kinda goofy, but, believe it or not, I do think this example is relevant to more general concerns about global warming. Not because I care about the sanctity of baseball—if you got too many home runs, just un-juice the ball, or reduce the length of the game to 8 innings, or make them swing 50-ounce bats, or whatever—but because it illustrates how a small average change can make a big change on the margin. In this case, it’s all those balls that are close to the fence but don’t quite make it over. The ball going 5% farther corresponds to a lot more than a 5% increase of home runs.

Elasticities are typically between 0 and 1, so it’s interesting to see this example where the elasticity is much greater than 1 (see the quick calculation below). In the baseball example, I guess that one reason there are so many fly balls within 20 feet of being home runs is that batters are trying so hard to hit the ball over the fence, and they often come close when they don’t succeed. The analogy to environmental problems is that much of agriculture and planning is on the edge in some way—using all the resources currently available, building right up to the coast, etc.—so that even small changes in the climate can have big effects.
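For the record, here is the implied elasticity from the numbers above (again just a rough sketch using the 2.3%-per-foot figure):

# Elasticity = percent change in home runs per percent change in distance.
pct_distance <- 20 / 400     # 20 ft more carry on a ~400 ft drive: +5%
pct_homeruns <- 20 * 0.023   # ~46% more home runs
pct_homeruns / pct_distance  # elasticity of roughly 9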

I’m not saying the baseball analysis proves any of this, just that it’s a good example of the general point, an example we can all understand by thinking about those batted balls (a point that is somewhat lost in the statistical analysis in the above-linked paper).

Our Bayesian predictions for the Oscars . . . using Stan!

Ahhhh, just kidding. Didn’t really do it.

We did see Tar and The Fabelmans and liked them both. Actually liked The Fabelmans better, so that would be my vote. Yeah, yeah, laugh at me all you want. What can I say . . . I never claimed to be cool. Anyway, The Fabelmans was great; also, the Academy still owes Spielberg a couple Oscars for Jaws. For further proof of my non-coolness, my favorite band is REM. I saw them perform in 1987, which might count for something until you hear that my friends dragged me there and I didn’t really enjoy the concert very much. On the drive back, I remarked that I probably would’ve liked it more had I listened to their albums ahead of time, and my friend who was the big REM fan was . . . not “disgusted” exactly, but he thought it was a weird thing to say.

For further discussion of the “would’ve liked it had I heard it a few times before” thing, see these two recent posts:

Why do we prefer familiarity in music and surprise in stories?

How Music Works by David Byrne, and Sweet Anticipation by David Huron

Beyond pulling for The Fabelmans, I got nothing for you on the Oscars.

The real reason I’m posting is that Joshua Brooks asked me to post something on this article, A Deep Breakdown Of The NBA MVP Debate, by Mike O’Connor. I agree with Brooks, it’s an interesting article.

O’Connor first writes a bit about the patterns of award-giving:

As Nikola Jokic closes in on what is very likely to become a 3-peat . . . one interesting dynamic that has made his case possible this season is the demise of the phenomenon known as voter fatigue. Once an understood, but hardly-considered pattern in these types of awards, numerous media members have made a point this year to mention how silly it would be if Jokic doesn’t win the award on the basis of having already won two in a row.

I’ll get to the validity of voter fatigue in a second, but one thing that is indisputable is that The War on Voter Fatigue is brand-spanking new. It may have been boiling for a while, but it hit a consensus right now, with Nikola Jokic. Comb through the past several years alone and you will find numerous examples of obvious voter fatigue.

After winning back to back MVPs in 2015 and 2016, Steph Curry played 79 games, put up unbelievable numbers, and led his team to 67 wins in 2017, and yet he finished sixth (!) behind Isaiah Thomas and four others in MVP voting that season. . . . After winning his second straight MVP in 2020, Antetokounmpo was essentially the exact same player in 2021 as he was in the two previous years, yet was a distant fourth in MVP voting behind Steph Curry and Joel Embiid (and Jokic, obviously), and multiple media members made it a point to mention that his lack of success in the playoffs in the previous two seasons made his case a bit weaker in 2021 – the (lack of) parallels there to Jokic are obvious. . . . the history of the NBA MVP is very clear: you win an MVP, it gets harder for you to win the following year. You win two MVPs in a row, and it’s almost freaking impossible to win three. Hence, why Jordan, Kareem, LeBron, etc. never won three straight. . . .

Those elements seem to have been cast off as silly, irrational considerations, but I would argue that that’s not a fair critique. Voter fatigue is not a mortal sin. It’s what allows us to shine the spotlight on players who ascended and/or defined the storyline of that NBA season as opposed to just re-stamping the best player. Without voter fatigue, there would probably only be somewhere around 15 players to ever win an MVP, and in my very humble opinion, that’s a boring way to tell the story of the NBA.

This reminds me of Bill James’s discussions of who gets into the baseball Hall of Fame, where he kind of reverse-engineers voters’ implicit decision rules to figure out what it takes to get the honor. (Also, James likes Pete Rose and can’t stand Joe Jackson; I can’t tell how much of that is James’s own feelings and how much is him reacting to attitudes of others.)

O’Connor then moves on to the merits of the case. Mostly he’s disparaging some stats where Jokic excels, and arguing that basketball discussion is degraded by people viewing these stats as some sort of higher truth. He doesn’t really compare Jokic to other MVP candidates, except to say:

And the irony is, this season is about Jokic! If we’re abolishing voter fatigue, his case this season is pretty unassailable. He’s likely been the best player in the league, and is on the best team in his conference. He has dominated the storylines all season long. Embiid is coming on strong of late, but if the season ended today, Jokic’s case is better in a vacuum . . .

I don’t follow basketball at all so I can’t comment on this except to say I appreciate that O’Connor is willing to say this, even though it’s not in support of his main argument criticizing certain stats and stat-worship. Real life is complicated, but in journalism and, for that matter, in science, there’s often a push to present an argument as if all the evidence goes in the same direction.

Anyway, here’s O’Connor on his main point:

If you work in NBA media, it’s likely that you love basketball to the point that you would identify as a basketball nerd. And in Nikola Jokic, there is absolutely no player in the history of basketball who better exemplifies the ethos of the basketball nerd. . . . Jokic’s passing makes the basketball nerds blush far more so than Giannis’ dunks or Embiid’s footwork. . . . For those who are unaware, Nikola Jokic generally grades out unbelievably well in advanced stats such as RAPTOR, BPM, VORP, etc. – some of these stats evaluate him as being better than prime LeBron or prime Jordan.

While most voters know to take them with a grain of salt, you will hear the majority of them at least make reference to them when outlining their case for Jokic. And ultimately, the appeal of all-in-one advanced stats is that they offer the comfort of being able to provide some sort of calculated, tangible proof for your position; without them, we would be forced to rely more upon our own internal evaluations of each player . . .

I [O’Connor] ultimately believe that most of these stats are transparently absurd, both in their formulation and in the results that they produce, and the fact that they play any role at all in the MVP discussion – deciding players’ legacies – is a travesty, and if we were to thoroughly dig into them, most would agree.

Let’s start on the surface, before digging into how they’re made. If I were to put out my own personal, subjective rankings of the top 50 NBA players, and they mirrored the rankings of FiveThirtyEight’s RAPTOR, I would be laughed at and called the single dumbest pundit on NBA Twitter. In the top 50 alone – which should be far and away the easiest group to rank – there are many comically out of place players.

Alex Caruso is 12th. Delon Wright is 18th. Josh Okogie and Derrick White are tied for 20th. Isaiah Joe is 32nd. Alec Burks is 41st. Austin Reaves and John Konchar are tied for 48th. I’m sure many people will call this cherry picking, but if 15-20% of the rankings that your metric spits out are obvious bullshit (spoiler: it’s more than 15-20%), then dare I say, your metric is not very good and should not receive widespread credence from NBA media members.

The first litmus test for these stats should always be common sense. If they directly conflict with common sense, it’s fine to disregard them.

I agree! Not on the basketball details, where I have no idea, but on the general point. Here are a few relevant posts from a few years ago, making the point that an index is just an index, and to understand it you should (a) look at what goes into it, and (b) look hard at cases where the index gives results that violate our general understanding:

Another Human Development Index for U.S. states

About that claim in the Monkey Cage that North Korea had “moderate” electoral integrity . . .

Bill James does model checking

“Who’s bigger”—the new book that ranks every human on Wikipedia—is more like Bill Simmons than Bill James

Bill Simmons is an entertaining writer and a thoughtful guy, but in the above context, being “more like Bill Simmons” was not intended as a compliment.

O’Connor continues:

Where these metrics are particularly bad is on the defensive end. RAPTOR, for example, has Jokic rated as the 3rd best defensive player in the NBA. That’s right – this guy is the 3rd best defensive basketball player alive.

Selected video isn’t quite rigorous evidence, but, yeah, it’s pretty funny seeing all those clips of Jokic just standing there while guys go around him and put the ball right in the hoop.

O’Connor elaborates:

If you are going to sell me on the idea that Jokic is the second best defender in the NBA or anything close to it, you are going to have to present me with some novel information – something that I hadn’t been picking up on. The idea that this can be the basis of a case for a player being a DPOY-level defender is insane.

RAPTOR very obviously overvalues rebounding (which, by the way, one could argue is a separate phase of the game from defense – you can be a great defender and bad rebounder, and vice versa). It also isn’t an ironclad argument to say that a center defending a high volume of shot attempts is a good thing; there are obviously a considerable number of shots that guards may be willing to attempt at the rim versus Jokic, that they wouldn’t against, say Embiid or Rudy Gobert.

Ultimately, I think it has more to do with scheme than anything else, but this entire discussion underscores the point I’m trying to make: all of these metrics are built based upon how some person thinks several more basic stats should be weighted, and those considerations are highly debatable.

There’s more, but I’ll stop here. I’ll remind you he actually says he thinks Jokic is the most valuable player this year. The statistical arguments are still interesting, as are the meta-statistical arguments about “the degree to which these metrics impact perception.”

There are five ways to get fired from Caesars: (1) theft, (2) sexual harassment, (3) running an experiment without a control group, (4) keeping a gambling addict away from the casino, (5) refusing to promote gambling to college students

Wow, this story is horrible. It starts out innocuously enough with some boardroom shenanigans:

In September 2021, an official in Michigan State University’s athletic department sent an email to his boss with exciting news: An online betting company was willing to pay handsomely for the right to promote gambling at the university.

“Alan, if we are willing to take an aggressive position, we have a $1 M/year deal on the table with Caesar’s,” Paul Schager wrote to Alan Haller, the university’s athletic director. . . .

Unlike public universities, which are subject to government disclosure rules and freedom of information requests, the sports-marketing companies are privately held. That means the terms of the deals they strike don’t have to be publicly disclosed if the universities are not a party to the contracts.

Hey, don’t they know it’s “Caesars,” not “Caesar’s”? A bunch of ignoramuses there, is what we’ve got. In any case, can’t they just follow the path of their Big Ten rivals at Ohio State and get their ill-gotten gains via government grants for fraudulent research?

But, hey, it’s cool, all explained in Newspeak for you right here:

Mr. Schager, executive associate athletic director at Michigan State, described this benefit of the system.

“With the multimedia rights holder, public institutions like Michigan State no longer have to disclose all those sponsorship deals,” he said in an interview. “This helps with the sponsors being able to spend what they feel is appropriate without having the public or employees or stockholders question that investment.”

The Michigan State athletic department . . . What could possibly go wrong?

But then there’s this:

Some aspects of the deals also appear to violate the gambling industry’s own rules against marketing to underage people. The “Responsible Marketing Code” published by the American Gaming Association, the umbrella group for the industry, says sports betting should not be advertised on college campuses.

And this:

The University of Maryland, for example, has a partnership with the sports-gambling platform PointsBet. A university website links to a PointsBet page that entices new customers this way: “Get your first bets risk free up to $2000 + $100 in free bets.” The pitch means that if you lose your initial $2,000, PointsBet will let you make another $2,000 worth of complimentary bets. . . .

The University of Maryland! I was gonna say that they haven’t had any major scandal since 1986, but then just to check I googled *university of maryland athletic department scandal* and . . . yes, they’ve had major scandals since then.

And this doozy:

Cody Worsham, L.S.U.’s associate athletic director and chief brand officer, said in a statement that Caesars and the university “share a commitment to responsible, age-appropriate marketing.” That commitment, Mr. Worsham added, “is integral to a sustainable and responsible partnership benefiting our entire department, university, and fan base.” . . . At L.S.U., Caesars promotions downplay the risk of losing. In an email, gamblers were told they could bet “on all the sports you love right from the palm of your hand, and every bet earns more with Caesars Rewards — win or lose.”

LSU, huh? I guess they have some need of a “chief brand officer.”

This one’s pretty good too:

In 2020, Texas Christian University, in Fort Worth, joined WinStar World Casino and Resort to open a new club with suites and premium seating.

I haven’t kept up on which religions currently allow drinking, betting, and dancing. What next, rock ‘n’ roll?

I can’t wait till Columbia gets its own sports betting contract. It’s been a few years since Columbia’s been to a bowl game or the NCAA tournament, but we could always bet on movements in our U.S. News ranking or things like that. No possibilities for insider trading there, right??

P.S. That all said, I’m a Michigan football fan. Not a lot—I don’t really follow college sports at all—but a little, partly because my sister teaches at the University of Michigan and partly because my dad hated Woody Hayes. And I enjoy betting on sports from time to time. Betting is fun, in moderation. The thing that bugs me about these gambling companies is that their business model seems to be based on getting addicts to gamble more. As I wrote a few years ago, as a statistician I am pretty disgusted about articles that celebrate the use of statistics to rip people off. This might be the same way that, if I were a programmer, I’d dislike articles that glamorize the hackers who scam people out of their passwords. Yes, statistics can be used for all sorts of bad ends and this should be vigorously reported. But not celebrated.

Ethical standards of some rich retired athletes are as low as ethical standards of some rich scientists

Wow—this one would fit right into Retraction Watch:

Text messages entered Monday into the state’s ongoing civil lawsuit over the welfare scandal reveal that former Gov. Phil Bryant pushed to make NFL legend Brett Favre’s volleyball idea a reality.

The texts show that the then-governor even guided Favre on how to write a funding proposal so that it could be accepted by the Mississippi Department of Human Services – even after Bryant ousted the former welfare agency director John Davis for suspected fraud.

“Just left Brett Favre,” Bryant texted nonprofit founder Nancy New in July of 2019, within weeks of Davis’ departure. “Can we help him with his project. We should meet soon to see how I can make sure we keep your projects on course.” . . . The newly released texts, filed Monday by an attorney representing Nancy New’s nonprofit, show that Bryant, Favre, New, Davis and others worked together to channel at least $5 million of the state’s welfare funds to build a new volleyball stadium . . . What followed was the biggest public fraud case in state history, according to the state auditor’s office. Nonprofit leaders had misspent at least $77 million in funds that were supposed to help the needy, forensic auditors found. . . .

I dunno, if you start just giving government funds to retired athletes, this might reduce their motivation to work, no?

But here’s the best part:

In 2020, Mississippi Today asked Favre if he had discussed the volleyball project with the governor, to which Favre answered, “No.”

“Brett Favre has been honorable throughout this whole thing,” Favre’s attorney, Bud Holmes, told Mississippi Today.

I googled *retraction watch university of southern mississippi* and found this story from 2018, “Author retracts 2009 chemistry paper with ‘heavily doctored’ images.” Not as fun as a crooked volleyball stadium.

P.S. More here. I’m starting to suspect that the retired QB was not being entirely truthful in his response to the question of whether he had discussed the volleyball project with his governor. Now that this volleyball thing is no longer working out, maybe he could get a job as a Chief Behavioral Officer for some insurance company. There’s always a market for experts in dishonesty.

Fields where it matters that “there’s no there there,” fields where you can thrive on B.S. alone, and everything in between

Seeing this desperate attempt by Tyler Cowen to cover for crypto scams (his list of “falls in status” includes silly items such as “Mrs. Jellyby,” bizarre items such as “Being unmarried (and male) above the age of 30,” and “Venture capital,” but, oddly enough, not “Crypto” itself) made me think that smart people are overrated. Let me put it this way: if you’re a smart astrologer, you’re still not gonna be able to do “real” astrology, which doesn’t exist. To say it slightly differently: it’s easy to promise things, especially if you have a good rep; you have to be careful not to promise things you can’t deliver. It doesn’t matter how smart James Watson’s friend was; he didn’t have that promised cancer cure in two years.

As the saying goes: Saying it don’t make it so. I could go around telling the world I had a solution to all the problems of MRP, and some people might believe me for a while—but I don’t have such a solution.

I can see how Cowen in his above-linked post doesn’t want to believe that crypto is fundamentally flawed—and maybe he’s right that it’s a great thing, it’s not like I’m an expert—but it’s funny that he doesn’t even consider that it might be a problem, given the scandal he was writing about.

All this got me thinking: in what fields of endeavor does it matter that you’re just B.S.-ing, and in what fields can you get away with it?

Sports: Chess cheating aside, if you don’t got it, you don’t got it. Public relations can get you endorsement contracts but not the coveted W. Yes, you can get lucky, but at the highest levels, only the best players can get lucky enough to win.

Science: You can have a successful scientific career based on a deft combination of B.S., promotion, and academic politics—just ask Trofim Lysenko, or Robert Sternberg—but you won’t be producing successful science. That said, you can do good science even with terrible theories: as I like to say, often the house is stronger than its foundations. I’ve heard that Isaac Newton learned a few real things even while trying in vain to convert lead into gold, and, at a much lower level, my colleagues and I have had spinoff successes from some of our overly-ambitious statistical failures.

Literature: Here, being smart, or inspired, will do the trick. Consider Philip K. Dick, who believed all sorts of weird things which he transmuted into wonderful literature.

Finance: This one’s somewhere in between. With a good line of B.S. you can do well for a long time, even until the end of your life (for example, Jack Welch); other times you’ll get caught out, as with the recent crypto scandal.

Often I think of this great line that Craig delivered to Phil one day in high school. They were arguing about something, and Craig concluded with, “You may be winning the argument, but that doesn’t mean you’re right. It just means you’re better at arguing.”

That was a good point. A good debater can persuade with a bad position. That doesn’t suddenly make the position correct. And sometimes it can be a bad thing to be too good a debater, or to be too insulated—personally or financially—from people who can present the opposite view. As discussed above, it depends on what field you’re working in.

Update 4 – World Cup Qatar 2022 predictions (semifinals and winning probabilities)

Time for our last update! The Qatar 2022 World Cup is progressing fast, and only four teams – Argentina, France, Croatia, and Morocco – are still in contention for the final victory. Who will be the winner on December 18th? Is our model better than Paul the Octopus, an almost perfect oracle during the 2010 World Cup?

Semifinals predictions

We report in the table below the posterior predictive match probabilities from our DIBP model – see also here and here for other updates – for the two semifinals planned for Tuesday, December 13 and Wednesday, December 14: Argentina-Croatia and France-Morocco, respectively. We also report the usual ppd ‘chessboard plots’ for the exact outcomes in gray-scale color.

Notes: ‘mlo’ in the table denotes the ‘most likely result’, whereas darker regions in the plots correspond to more likely results. The first team listed in each sub-title is the ‘favorite’ (x-axis), whereas the second team is the ‘underdog’ (y-axis). The 2-way grid displays the 2 held-out matches in such a way that the closest match appears in the left panel of the grid, whereas the most unbalanced match (the ‘blowout’) appears in the right panel.

France and Argentina seem clearly ahead of Croatia and Morocco, respectively. Still, an underdog such as Morocco has a non-negligible chance – approximately 35% – of beating France and advancing to the final: consider that Morocco kept two clean sheets in its round-of-16 and quarterfinal matches, against Spain and Portugal, respectively! And Croatia reached the final four years ago, so maybe it should not be considered a pure underdog . . . and Luka Modric, Croatia’s captain, is still one of the best players in the world.

Note: keep in mind that the above predictions refer to regulation time, not to extra time! To get an approximate probability of advancing to the final for, say, the favorite team, one could compute: favorite probability + 0.5*(draw probability), and the same for the underdog. With no further assumptions, this splits the regulation-time draw probability equally between the two teams in the eventual extra time.
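In code, with made-up regulation-time probabilities purely for illustration (these are not our model’s actual outputs):

# Approximate probability of advancing to the final, splitting the draw
# probability evenly between the two teams (hypothetical numbers).
advance_prob <- function(win_prob, draw_prob) {
  win_prob + 0.5 * draw_prob
}

advance_prob(win_prob = 0.50, draw_prob = 0.30)  # favorite: 0.65
advance_prob(win_prob = 0.20, draw_prob = 0.30)  # underdog: 0.35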

World Cup winning probabilities

We also provide some World Cup winning probabilities for the four teams, based on some forward simulations of the tournament.

The results are somewhat surprising! Unlike the majority of the bookmakers, our model gives Argentina the highest chance of winning the World Cup. France comes in second place, whereas Morocco is the underdog, with only an 8% probability of becoming the World Cup winner.

Full code and details

You can find the complete results, R code, and analysis here. Some preliminary notes and model limitations can be found here. And use the footBayes package!

Final considerations

We had a lot of fun with these World Cup predictions; we think this has been a good and challenging statistical application. To summarize: the average of the correct probabilities, i.e., the average of the model probabilities for the actually observed outcomes, is 0.41, whereas the pseudo R-squared is 0.36 (through the quarterfinal matches).
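In case anyone wants to compute this sort of thing themselves, here is a minimal R sketch of the ‘average of the correct probabilities’ metric, with made-up probabilities rather than our model’s actual output:

# Each row: model probabilities for (favorite win, draw, underdog win).
probs <- matrix(c(0.55, 0.25, 0.20,
                  0.40, 0.30, 0.30,
                  0.70, 0.20, 0.10),
                ncol = 3, byrow = TRUE)
observed <- c(1, 3, 1)  # index of the outcome that actually occurred

# Average probability the model assigned to the observed outcomes.
mean(probs[cbind(seq_len(nrow(probs)), observed)])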

Update 3 – World Cup Qatar 2022 predictions (round of 16)

World Cup 2022 is progressing, with many good matches and much entertainment. Time, then, for World Cup 2022 predictions of the round-of-16 matches from our DIBP model (see here for the previous update). In the group-stage matches, the average of the model probabilities for the actual final results was about 0.52.

Here are the posterior predictive match probabilities for the held-out matches of the Qatar 2022 round of 16, to be played from December 3rd to December 6th, along with some ppd ‘chessboard plots’ for the exact outcomes in gray-scale color. ‘mlo’ in the table denotes the ‘most likely result’, whereas darker regions in the plots correspond to more likely results. In the plots below, the first team listed in each sub-title is the ‘favorite’ (x-axis), whereas the second team is the ‘underdog’ (y-axis). The 2-way grid displays the 8 held-out matches in such a way that closer matches appear at the top-left of the grid, whereas more unbalanced matches (‘blowouts’) appear at the bottom-right; the matches are ordered from top-left to bottom-right in terms of increasing winning probability for the favorite team. The table instead lists the matches in chronological order.

Apparently, Brazil is heavily favored against South Korea, and Argentina seems well ahead of Australia, whereas much more balance is predicted for Japan-Croatia, Netherlands-United States, and Portugal-Switzerland. Note: these probabilities refer to regulation time, i.e., within the 90 minutes; the model does not capture extra-time probabilities.

You can find the complete results, R code, and analysis here. Some preliminary notes and model limitations can be found here.

Next steps: we’ll update the predictions for the quarterfinals. We are still discussing whether to report overall World Cup winning probabilities, even though I am personally not a huge fan of these far-ahead predictions (even coding this scenario is not straightforward!). However, we know those predictions could be really amusing for fans, so maybe we will report them after the round of 16. We could also post some posterior predictive checks for the model and more predictive performance measures.

Stay tuned!