Big news out of Europe, everyone’s talking about soccer.

Leo Egidi updated his model and now has predictions for the Round of 16:

Here’s Leo’s report, and here’s his zipfile with data and Stan code.

The report contains some ugly histograms showing the predictive distributions of goals to be scored in each game. The R histogram function FAILS with discrete data because it puts the bin boundaries at 0, 1, 2, etc. Or, in this case, 0, .5, 1, 1.5, etc., which is even worse because now the y-axis is hard to interpret as the frequencies all got multiplied by 2. When data are integers, you want the boundaries at -.5, .5, 1.5, 2.5, etc. Or use barplot(). Really, though, you want scatterplots because the teams are playing against each other. You’ll want heatmaps, actually: scatterplots don’t work so well with discrete data.
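As a rough sketch of the two suggestions above (bin boundaries at the half-integers, and a heatmap of the joint score), here is some R. The goal counts are simulated from Poisson distributions with made-up rates, standing in for the model's predictive simulations; the variable names are hypothetical and not taken from Leo's code.

```r
# Simulated goals for two teams in one game. Poisson draws with made-up rates
# stand in for the posterior predictive simulations (an assumption).
set.seed(1)
goals1 <- rpois(1000, 1.3)
goals2 <- rpois(1000, 0.9)

# Histogram with bin boundaries at -.5, .5, 1.5, ... so each bar covers one integer
max_goals <- max(goals1, goals2)
breaks <- seq(-0.5, max_goals + 0.5, by = 1)
h <- hist(goals1, breaks = breaks, plot = FALSE)

# The bin counts now agree with a plain tabulation of the integers
counts <- table(factor(goals1, levels = 0:max_goals))

# Joint distribution of the two scores, as proportions of simulations
joint <- table(factor(goals1, levels = 0:max_goals),
               factor(goals2, levels = 0:max_goals)) / length(goals1)

# Heatmap: darker cells are more probable scorelines
image(0:max_goals, 0:max_goals, unclass(joint),
      col = gray(seq(1, 0, length.out = 20)),
      xlab = "Goals, team 1", ylab = "Goals, team 2")
```

Using factor() with explicit levels keeps scores that never occur in the simulations as zero cells, so the axes stay comparable from game to game.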

Yeah, you are right, I should have used another kind of graph; I rushed to finish.

Here you may find the graphs, now made with the barplot() function of R.

To produce these better graphs, it is sufficient to insert these lines of code at the end of euro2016.R:

# Two panels per game (team 1 in red, team 2 in green), games 1 to 4 on one page
par(mfrow=c(2,4))
for (i in 1:4){
  barplot(table(sims_Round16$score1_prev[,i]), xlab=teams[team1_prev[i]], main=paste("Game", i), col="red")
  barplot(table(sims_Round16$score2_prev[,i]), xlab=teams[team2_prev[i]], main=paste("Game", i), col="green")
}

# Games 5 to 8 on a second page
par(mfrow=c(2,4))
for (i in 5:8){
  barplot(table(sims_Round16$score1_prev[,i]), xlab=teams[team1_prev[i]], main=paste("Game", i), col="red")
  barplot(table(sims_Round16$score2_prev[,i]), xlab=teams[team2_prev[i]], main=paste("Game", i), col="green")
}

We were talking about this yesterday as well and I was wondering about the following:

Results in our office pool are taken as the scores after 120 minutes, if extra time is necessary.

I know the model tries to predict goals scored given available team information.

But suppose a win for each team has p=1/3 and a draw after 90 minutes has p=1/3 as well, and that extra time follows the same probabilities. Then a draw after 120 minutes has p=1/3*1/3=1/9, whereas a win by either Team A or Team B has p=2*(1/3+1/3*1/3)=8/9.

Maybe I am getting something wrong, but predicting draws for 6 out of 8 games is probably OK within the model if it predicts the score after 90 minutes. But 6 out of 8 draws after 120 minutes raises some concern about the validity of these predictions. Note that so far, 1 out of 6 games has gone to penalties, which seems reasonable given that it has p of only 1/9.
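The arithmetic in this comment can be checked with a few lines of R. This is only a sketch under the commenter's own simplifying assumption that win, loss, and draw each have probability 1/3, both in regulation and again in extra time; the variable names are made up for illustration.

```r
# Assumed: team A win, team B win, and draw each have probability 1/3,
# both in the 90 regular minutes and again in extra time.
p_win  <- 1/3
p_draw <- 1/3

# Still level after 120 minutes: draw in regulation, then draw in extra time
p_draw_120 <- p_draw * p_draw

# A given team ahead after 120 minutes: wins in regulation,
# or draws in regulation and then wins extra time
p_team_120 <- p_win + p_draw * p_win

# Either team ahead after 120 minutes
p_decided_120 <- 2 * p_team_120

# Under the same assumption and treating games as independent:
# chance that exactly 1 of 6 games is level after 120 minutes,
# and chance that 6 or more of 8 games are
p_one_of_six  <- dbinom(1, size = 6, prob = p_draw_120)
p_six_of_eight <- pbinom(5, size = 8, prob = p_draw_120, lower.tail = FALSE)

print(c(draw_120 = p_draw_120, decided_120 = p_decided_120))
```

The two binomial quantities put the "1 out of 6" and "6 out of 8" remarks on the same scale, though independence across games is an extra assumption not stated in the comment.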

Hi Robert, your comment makes sense. But as my report says, the ties are predicted within the 90 regular minutes. Anyway, I agree with you that 6 ties out of 8 is an overestimate, but this is due to the supposed equilibrium between teams in this Euro Cup. Apparently, my model shrinks the teams too much towards an equilibrium state (a tie, actually), and I am going to improve this aspect. In fact, Germany is much stronger than Slovakia (3-0 was the actual result), and even some naive supporters could have predicted the German victory. However, this is a challenging issue (change the priors? insert some other predictors?) and I am glad to receive comments and advice like this.

leonardo egidi,

send me an email. dmaxashman@gmail.com

you are verifying how well your model does in the wrong fashion. you can only compare your predictions to results if you have hundreds of results. otherwise you want to compare your predictions to the CLOSING line in the sports betting market here http://www.sportsbookreview.com/betting-odds/soccer/?date=20160626 (adjust date in the URL for different games). here are the stages of getting to a good model: 1) your predictions vary widely from the vegas line, 2) your predictions match the vegas line, 3) you improve your model to beat the vegas line by a small margin. at a glance, comparing your predictions to SBR, it looks like you are still in stage 1
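One way to set up the comparison suggested in this comment is to turn a closing line into implied probabilities and put them next to the model's. The R sketch below assumes decimal odds and removes the bookmaker's margin by simple normalization, which is only one of several possible conventions. The odds and model probabilities are invented for illustration and are not real market data or Leo's predictions.

```r
# Hypothetical decimal odds for team 1 win, draw, team 2 win (not real market data)
odds <- c(win1 = 2.10, draw = 3.20, win2 = 3.90)

# Raw implied probabilities sum to more than 1 because of the bookmaker's margin
raw <- 1 / odds

# Simple normalization so the implied probabilities sum to 1 (an assumed convention)
implied <- raw / sum(raw)

# Hypothetical model probabilities for the same game
model <- c(win1 = 0.30, draw = 0.45, win2 = 0.25)

# Where the model disagrees with the market, outcome by outcome
print(round(model - implied, 3))
```

Large differences on the draw outcome across many games would be a sign that the model is pulling teams toward a tie more than the market does.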