The other day I fit a simple model to estimate team abilities from World Cup outcomes. I fit the model to the signed square roots of the score differentials, using the square root on the theory that when the game is less close, it becomes more variable.

**0. Background**

As you might recall, the estimated team abilities were reasonable, but the model did not fit the data, in that when I re-simulated the game outcomes using the retrospectively fitted parameters, my simulations were much close than the actual games. To put it another way, many more than 1/20 of the games fell outside their 95% predictive intervals.

**1. Re-fitting on the original scale**

This was buggin me. In some way, the original post, which concluded with “my model sucks,” made an excellent teaching point. Still and all, it was a bummer.

So, last night as I was falling asleep, I had the idea of re-fitting the model on the original scale. Maybe the square-root transformation was compressing the data so much that the model couldn’t fit. I wasn’t sure how this could be happening but it seemed worth trying out.

So, the new Stan program, worldcup_raw_matt.stan:

data { int nteams; int ngames; vector[nteams] prior_score; int team1[ngames]; int team2[ngames]; vector[ngames] score1; vector[ngames] score2; real df; } transformed data { vector[ngames] dif; dif <- score1 - score2; } parameters { real b; realsigma_a; real sigma_y; vector[nteams] eta_a; } transformed parameters { vector[nteams] a; a <- b*prior_score + sigma_a*eta_a; } model { eta_a ~ normal(0,1); for (i in 1:ngames) dif[i] ~ student_t(df, a[team1[i]]-a[team2[i]], sigma_y); }

Just the same as the old model but without the square root stuff.

And then I appended to my R script some code to fit the model and display the estimates and residuals:

# New model 15 Jul 2014: Linear model on origional (not square root) scale fit <- stan_run("worldcup_raw_matt.stan", data=data, chains=4, iter=5000) print(fit) sims <- extract(fit) a_sims <- sims$a a_hat <- colMeans(a_sims) a_se <- sqrt(colVars(a_sims)) library ("arm") png ("worldcup7.png", height=500, width=500) coefplot (rev(a_hat), rev(a_se), CI=1, varnames=rev(teams), main="Team quality (estimate +/- 1 s.e.)\n", cex.var=.9, mar=c(0,4,5.1,2)) dev.off() a_sims <- sims$a sigma_y_sims <- sims$sigma_y nsims <- length(sigma_y_sims) random_outcome <- array(NA, c(nsims,ngames)) for (s in 1:nsims){ random_outcome[s,] <- (a_sims[s,team1] - a_sims[s,team2]) + rt(ngames,df)*sigma_y_sims[s] } sim_quantiles <- array(NA,c(ngames,2)) for (i in 1:ngames){ sim_quantiles[i,] <- quantile(random_outcome[,i], c(.025,.975)) } png ("worldcup8.png", height=1000, width=500) coefplot ((score1 - score2)[new_order]*flip, sds=rep(0, ngames), lower.conf.bounds=sim_quantiles[new_order,1]*flip, upper.conf.bounds=sim_quantiles[new_order,2]*flip, varnames=ifelse(flip==1, paste(teams[team1[new_order]], "vs.", teams[team2[new_order]]), paste(teams[team2[new_order]], "vs.", teams[team1[new_order]])), main="Game score differentials\ncompared to 95% predictive interval from model\n", mar=c(0,7,6,2), xlim=c(-6,6)) dev.off ()

And here's what I got:

And this looks just fine, indeed in many ways better than before, not just the bit about the model fit but also the team ability parameters can now be directly interpretable. According to the model fit, Brazil, Argentina, and Germany are estimated to be 1 goal better than the average team (in expectation), with Australia, Honduras, and Cameroon being 1 goal worse than the average.

**2. Debugging**

But this bothered me in another way. Could those square-root-scale predictions have been that bad? I can't believe it. Back to the code. I look carefully at the transformation in the Stan model:

transformed data { vector[ngames] dif; vector[ngames] sqrt_dif; dif <- score1 - score2; for (i in 1:ngames) sqrt_dif[i] <- (step(dif[i]) - .5)*sqrt(fabs(dif[i])); }

D'oh! That last line is wrong, it's missing a factor of 2. Stan doesn't have a sign() function so I hacked something together using "step(dif[i]) - .5". But this difference takes on the value +.5 if dif is positive or -.5 if dif is negative (zero doesn't really matter because it all gets multiplied by abs(dif) anyway). Nonononononononono.

Nononononononononononononononono.

Nonononononononononononononononononononononono.

Damn.

OK, I fix the code:

transformed data { vector[ngames] dif; vector[ngames] sqrt_dif; dif <- score1 - score2; for (i in 1:ngames) sqrt_dif[i] <- 2*(step(dif[i]) - .5)*sqrt(fabs(dif[i])); }

I rerun my R script from scratch. Stan crashes R. Interesting---I'll have to track this down. But not right now.

I restart R and run. Here are the results from the fitted model on the square root scale:

And here are the predictions and the game outcomes:

I'm not quite sure about this last graph but I gotta go now so I'll post, maybe will look at the code later if I have time.

**3. Conclusions**

My original intuition, that I could estimate team abilities by modeling score differentials on the square root scale, seems to have been correct. In my previous post I'd reported big problems with predictions, but that's because I'd dropped a factor of 2 in my code. These things happen. Modeling the score differences on the original scale seems reasonable too. It's easy to make mistakes, and it's good to check one's model in various ways. I'm happy that in my previous post I was able to note that the model was wrong, even if at the time I hadn't yet found the bug. It's much better to be wrong and know you're wrong than to be wrong and not realize it.

Finally, now that the model fits ok, one could investigate all sorts of thing by expanding it in various ways. But that's a topic for another day (not soon, I think).