
NFL players keep getting bigger and bigger

[Screenshot of Noah Veltman’s dynamic graph of NFL player heights and weights]

Aleks points us to this beautiful dynamic graph by Noah Veltman showing the heights and weights of NFL players over time. The color is pretty but I think I’d prefer something simpler, just one dot per player (with some jittering to handle the discrete reporting of heights and weights). In any case, it’s a great graph. Click on the link to see it in action.
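Jittering here just means adding a little random noise so that players with identical reported height and weight don’t stack on a single dot. A minimal sketch of the idea, with made-up numbers rather than Veltman’s data, and in Python rather than the d3 of the original:

```python
import numpy as np

rng = np.random.default_rng(0)

# Heights are reported in whole inches, weights in 5-lb increments,
# so many players share exactly the same (height, weight) pair.
heights = np.array([74, 74, 76, 71, 74])      # hypothetical players
weights = np.array([240, 240, 255, 215, 240])

def jitter(x, resolution, rng):
    # Spread each point uniformly within half its reporting resolution,
    # so jittered values never cross into a neighboring reported value.
    return x + rng.uniform(-resolution / 2, resolution / 2, size=len(x))

h_j = jitter(heights, 1, rng)   # inches
w_j = jitter(weights, 5, rng)   # pounds
# h_j, w_j can now be scatterplotted without identical players overplotting.
```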

P.S. Even better, once we move to a dynamic scatterplot, would be to use different colors for different positions, and to allow the reader of the graph to highlight different positions. On the linked page, Veltman writes, “the blob separating into multiple groups in the 1990s . . . likely reflects increased specialization of body type by position.” But we should be able to see this directly, no need for speculation, right?

A world without statistics

A reporter asked me for a quote regarding the importance of statistics. But, after thinking about it for a moment, I decided that statistics isn’t so important at all. A world without statistics wouldn’t be much different from the world we have now.

What would be missing, in a world without statistics?

Science would be pretty much ok. Newton didn’t need statistics for his theories of gravity, motion, and light, nor did Einstein need statistics for the theory of relativity. Thermodynamics and quantum mechanics are fundamentally statistical, but lots of progress could’ve been made in these areas without statistics. The second law of thermodynamics is an observable fact, ditto the two-slit experiment and various experimental results revealing the nature of the atom. The A-bomb and, almost certainly, the H-bomb, maybe these would never have been invented without statistics, but on balance I think most people would feel that the world would be a better place without these particular scientific developments. Without statistics, we could forget about discovering the Higgs boson, etc., but that doesn’t seem like such a loss for humanity.

At a more applied level, statistics helped to win World War 2, most notably in cracking the Enigma code but also in various operations-research efforts. And it’s my impression that “our” statistics were better than “their” statistics. So that’s something.

Where would civilian technology be without statistics? I’m not sure. I don’t have a sense of how necessary statistics was for quantum theory. In a world without statistics, would the study of quantum physics have progressed far enough so that transistors were invented? This one, I don’t know. And without statistics we wouldn’t have modern quality control, so maybe we’d still be driving around in AMC Gremlins and the like. Scary thought, but not a huge deal, I’d think. No transistors, though, that would make a difference in my life. No transistors, no blogging! And I guess we could also forget about various unequivocally beneficial technological innovations such as modern pacemakers, hearing aids, cochlear implants, and Clippy.

Modern biomedicine uses lots and lots of statistics, but would medicine be so much worse without it? I don’t think so, at least not yet. You don’t need statistics to see that penicillin works, nor to see that mosquitos transmit disease and that nets keep the mosquitos out. Without statistics, I assume that various mistakes would get into the system, various ineffective treatments that people think are effective, etc. But on balance I doubt these would be huge mistakes, and the big ones would eventually get caught, with careful record-keeping even without statistical inference and adjustments. Without statistics, biologists would not be able to sequence the genome, and I assume they’d be much slower at developing tools such as tests that allow you to check for chromosomal abnormalities in amnio. I doubt all these things add up to much yet, but I guess there’s promise for the future. Statistics is also necessary for a lot of drug development—right now my colleagues and I are working on a pharmacodynamic model of dosing—but, again, without any of this, it’s not clear the world would be so much different.

The Poverty Lab team use statistics and randomized experiments to see what works to help the lives of poor people around the world. That’s cool but I’m not ultimately convinced this all makes a difference in the big picture. Or, to put it another way, I suspect that the statistical validation serves mostly as a way to build political consensus for economic policies that will be effective in sharing the wealth. By demonstrating in a scientific way that Treatment X is effective, this supports the idea that there is a way to help the sort of people who live in what Nicholas Wade would describe as “tribal” societies. So, sure, fine, but in this case the benefits of the statistical methods are somewhat indirect.

Without statistics, we wouldn’t have most of the papers in “Psychological Science,” but I could handle that. Piaget didn’t need any statistics, and I think the modern successors of Piaget could’ve done pretty much what they’ve done without statistics, just by careful observation of major transitions.

Careful observation and precise measurement can be done, with or without statistical methods. Indeed, researchers often use statistics as a substitute for careful observation and precise measurement. That is a horrible thing to do, and if you have a clear understanding of statistical theory, you can see why. But statistics is hard, and lots of researchers (and journal editors, news reporters, etc.) don’t have that understanding. When statistics is used as a substitute for, rather than an adjunct to, scientific measurement, we get problems.

OK, here’s another one: no statistics, no psychometrics. That’s too bad but one could make the argument that, on the whole, psychometrics has done more harm than good (value-added assessment, anyone?). Don’t get me wrong—I like psychometrics, and a strong argument could be made that it’s done more good than harm—but my point here is that the net benefit is not clear; a case would have to be made.

Polling. Can’t do it well without statistics. But, would a world without polling be so horrible? Much as I hate to admit it, I don’t think so. Don’t get me wrong, I think polling is on balance a good thing—I agree with George Gallup that measurement of public opinion is an important part of the modern democratic process—but I wouldn’t want to hang too much of the benefits of statistics on this one use, given that I expect lots of people would argue that opinion polls do more harm than good in politics.

The alternative to good statistics is . . .

Perhaps the most important benefits of statistics come not from the direct use of statistical methods in science and technology, but rather in helping us learn about the world. Statisticians from Francis Galton and Ronald Fisher onward have used statistics to give us a much deeper understanding of human and biological variation. I can’t see how any non-statistical, mechanistic model of the world could reproduce that level of understanding. Forget about p-values, Bayesian inference, and the rest: here I’m simply talking about the nature of correlation and variation.

For a more humble example, consider Bill James. Baseball is a silly example, sure, but the point is to see how much understanding has been gained in this area through statistical measurement and comparison. As James so memorably wrote, the alternative to good statistics is not “no statistics,” it’s “bad statistics.” James wrote about baseball commentators who would make asinine arguments which they would back up by picking out numbers without context. In politics, the equivalent might be a proudly humanistic pundit such as New York Times columnist David Brooks supporting his views by just making up numbers or featuring various “too good to be true” statistics and not checking them.

So here’s one benefit to the formal study of statistics: Without any statistics, there still would be numbers, along with people trying to interpret them.

Could governments and large businesses be managed well without statistics? I’m not sure. Given that half the U.S. Congress seems willing to shut down the government from time to time, it’s not clear that any agreement on the numbers will have much to do with political action. Similarly, all the statistics in the world don’t seem to be stopping the euro-zone from drifting. But maybe things would be much worse without a common core of statistical agreement. I don’t know; unfortunately this seems like the sort of causal question that is too difficult for statistics to answer.

Finally, one way that statistics is potentially having a huge impact in our lives is through the measurement of global warming and all the rest. But I’m guessing that a lot of this could be done with a pre-statistical understanding. The basic physics is already there, as would be the careful measurements. Statistical modeling is certainly relevant to the study of climate change—if you’re trying to reconstruct historical climate conditions from tree-ring data, it’s tough enough to do it with statistical modeling, I can’t imagine how it could be done otherwise—but the basic patterns of carbon dioxide, temperature, melting ice, etc., are apparent in any case. And, even with statistics, much uncertainty remains.


When I started writing this post, I was thinking that statistics doesn’t really matter, but I think that’s because I was focusing on some of the more highly-publicized but less beneficial applications of statistics: the use of statistical experimentation and inference to get p-values for tabloid-bait scientific papers, or for Google, Amazon, etc., to perfect their techniques for squeezing money out of their customers or, even at best, to test a medical treatment that increases survival rate for some rare disease by 2 percentage points. But statistics is central to how we think about the world. I still think that statistics is much less central to our lives than, say, chemistry. But it ain’t nothing.

Battle of the cozy comedians: What’s Alan Bennett’s problem with Stewart Lee?


When in London a while ago I picked up the book, “How I Escaped My Certain Fate: The Life and Deaths of a Stand-Up Comedian,” by Stewart Lee. I’d never heard of the guy but the book was sitting there, it had good blurbs, and from a quick flip-through it looked interesting. Now that I’ve read the whole thing, I can confirm that it really is interesting. I recommend it. Along with transcripts of some of his comedy routines—which aren’t particularly funny most of the time, at least not on the page—he has lots of discussion of what works and what doesn’t work on stage and how he wants to communicate with his audience. It all reminds me a lot of the things I think about when giving statistics talks. I mean, sure, Lee is much much more of a pro than I’ll ever be, but a lot of his issues resonate with me too. In particular there’s the idea of wanting a laugh but not a cheap laugh (which in a technical talk corresponds to the goal of transmitting the excitement and importance of one’s work without lapsing into TED-talk hype) and various tactics of engaging the audience.

Also the idea that there is no single optimal style, that your approach to presentation, like a diaper, needs to be regularly changed to stay fresh.

Lee’s book was also interesting because he gives off a regular-guy vibe, sort of like the essayist David Owen, who gives the impression of being an earnest person, not a deep or particularly quick thinker, more like a gentle guide who can plod along with the reader at his or her own pace. He’s not a true original like George Carlin or brilliant like Chris Rock, more of a guy who’s doing his best every day, and with a pleasant self-awareness that elevates his work.

So that was that. But then one day I read this offhand remark from Alan Bennett:

Peter [Cook] . . . was already in 1960 established as a successful sketch writer for revues in the West End. This meant that at that time he had no wish to offend an audience and shied away from sketches that did. It was only later in his career that, as his humour became more anarchic and audiences in their turn more fawning and in on the joke, he ceased to care. Showbiz dies hard and in these toothless stand-up days I think Peter might just have liked Jeremy Hardy but would have drawn the line at Stewart Lee.

I can’t be sure, but it sounds like Bennett considers Lee to be a bit tacky. Just as Greg Mankiw used his late grandmother as a mouthpiece for his distaste for Sonia Sotomayor, Bennett seems to be using his late colleague Cook to diss Lee.

I honestly have no idea what’s going on here. To my American eyes, Lee and Bennett seem very similar: two cozy left-wing English comedy writer/performers, successful but self-deprecating . . . really it’s hard for me to see much difference. OK, Bennett is gay while Lee is a sensitive heterosexual, but that can’t be the whole story. There must be something else going on: maybe Lee is too “middlebrow” for Bennett? Or maybe it’s the opposite, that Bennett sees Lee as one of those kids who doesn’t know how it’s really done?

Could any of our English readers inform me on this one? It’s no big deal but I hate being baffled like this.

Skepticism about a published claim regarding income inequality and happiness

Frank de Libero writes:

I read your Chance article (disproving that no one reads Chance!) re communicating about flawed psychological research. And I know from your other writings of your continuing good fight against misleading quantitative work. I think you and your students might be interested on my recent critique of a 2011 paper published in Psychological Science, “Income Inequality and Happiness” by Shigehiro Oishi, et al. The critique is here.

The blog post demonstrates that treating ordinal numbers with respect along with an eye to robustness leads to contrary conclusions – and a more interesting conjecture. If nothing else for your students, the post is an example of how an applied statistician thinks.

I have emailed the three authors and editor. I don’t expect to see a retraction. But maybe someone will pick up on the recommendations. We’ll see.

De Libero’s critique is worth reading. Lots of interesting points; it could be a good example for a statistics class, if the instructor is looking for something that, unlike typical textbook analyses, does not have a simple clean story. Also there seem to be problems with this paper published in Psychological Science, but that’s hardly news. . . .

P.S. In the old days I would’ve crossposted this on the sister blog. But now they don’t like running duplicate material, and so I thought it better to post in this space, since here we get good discussions in the comments.

On deck this week

Mon: Skepticism about a published claim regarding income inequality and happiness

Tues: Battle of the cozy comedians: What’s Alan Bennett’s problem with Stewart Lee?

Wed: A world without statistics

Thurs: NFL players keep getting bigger and bigger

Fri: “An Experience with a Registered Replication Project”

Sat, Sun: As Chris Hedges would say: Don’t worry, baby

On deck for the rest of the summer

  • Skepticism about a published claim regarding income inequality and happiness
  • Battle of the cozy comedians: What’s Alan Bennett’s problem with Stewart Lee?
  • A world without statistics
  • NFL players keep getting bigger and bigger
  • “An Experience with a Registered Replication Project”
  • A linguist has a question about sampling when the goal is causal inference from observational data
  • What do you do to visualize uncertainty?
  • Statistics and data science, again
  • The health policy innovation center: how best to move from pilot studies to large-scale practice?
  • The “scientific surprise” two-step
  • Correlation does not even imply correlation
  • When doing scientific replication or criticism, collaboration with the original authors is fine but I don’t think it should be a requirement or even an expectation
  • Scientific communication by press release
  • Nate Silver’s website
  • Estimated effect of early childhood intervention downgraded from 42% to 25%
  • Understanding the hot hand, and the myth of the hot hand, and the myth of the myth of the hot hand, and the myth of the myth of the myth of the hot hand, all at the same time
  • People used to send me ugly graphs, now I get these things
  • Updike and O’Hara
  • Luck vs. skill in poker
  • If you do an experiment with 700,000 participants, you’ll (a) have no problem with statistical significance, (b) get to call it “massive-scale,” (c) get a chance to publish it in a tabloid top journal. Cool!
  • Science tells us that fast food lovers are more likely to marry other fast food lovers
  • Stroopy names
  • “A hard case for Mister P”
  • The field is a fractal
  • Replication Wiki for economics
  • Discussion of “Maximum entropy and the nearly black object”
  • Review of “Forecasting Elections”
  • Discussion of “A probabilistic model for the spatial distribution of party support in multiparty elections”
  • Pre-election survey methodology: details from nine polling organizations, 1988 and 1992
  • Avoiding model selection in Bayesian social research

And, scheduled for Labor Day:

  • Bad Statistics: Ignore or Call Out?

Enjoy your summer! Unless you live in the southern hemisphere, in which case, Merry Christmas.

Differences between econometrics and statistics: From varying treatment effects to utilities, economists seem to like models that are fixed in stone, while statisticians tend to be more comfortable with variation


I had an interesting discussion with Peter Dorman (whose work on assessing the value of a life we discussed in this space a few years ago).

The conversation started when Peter wrote me about his recent success using hierarchical modeling for risk analysis. He wrote, “Where have they [hierarchical models] been all my life? In decades of reading and periodically doing econometrics, I’ve never come across this method.”

I replied that it’s my impression that economists are trained to focus on estimating a single quantity of interest, whereas multilevel modeling is appropriate for estimating many parameters. Economists should care about variation, of course; indeed, variation could well be said to be at the core of economics, as without variation of some sort there would be no economic exchanges. There are good reasons for focusing on point estimation of single parameters—in particular, if it’s hard to estimate a main effect, it is typically even more difficult to estimate interactions—but if variations are important, I think it’s important to model and estimate them.

Awhile later, Peter sent me this note:

I’ve been mulling the question about economists’ obsession with average effects and posted this on EconoSpeak. I could have said much more but decided to save it for another day. In particular, while the issue of representative agents has come up in the context of macroeconomic models, I wonder how many noneconomists — and even how many economists — are aware that the same approach is used more or less universally in applied micro. The “model” portion of a typical micro paper has an optimization model for a single agent or perhaps a very small number of interacting agents, and the properties of the model are used to justify the empirical specification. This predisposes economists to look for a single effect that variations in one factor have on variations in another. But the deeper question is why these models are so appealing to economists but less attractive (yes?) to researchers in other disciplines.

I responded:

There is the so-called folk theorem which I think is typically used as a justification for modeling variation using a common model. But more generally economists seem to like their models and then give after-the-fact justification. My favorite example is modeling uncertainty aversion using a nonlinear utility function for money, in fact in many places risk aversion is _defined_ as a nonlinear utility function for money. This makes no sense on any reasonable scale (see, for example, section 5 of this little paper from 1998, but the general principle has been well-known forever, I’m sure), indeed the very concept of a utility function for money becomes, like a rainbow, impossible to see if you try to get too close to it—but economists continue to use it as their default model. This bothers me. I don’t think it’s like physicists starting by teaching mechanics with a no-friction model and then adding friction. I think it’s more like, ummm, I dunno, doing astronomy with Ptolemy’s model and epicycles. The fundamentals of the model are not approximations to something real, they’re just fictions.
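The scale problem with a utility function for money can be seen in a two-line calculation. This is a generic illustration (not the 1998 paper’s example): with log utility and realistic wealth, the risk premium for a small 50/50 gamble is tiny, so curvature of a utility-of-money function cannot account for the substantial small-stakes risk aversion people actually display.

```python
import math

def certainty_equivalent(wealth, stake):
    # 50/50 gamble of +/- stake under log utility:
    # CE solves log(CE) = 0.5*log(w + s) + 0.5*log(w - s)
    eu = 0.5 * math.log(wealth + stake) + 0.5 * math.log(wealth - stake)
    return math.exp(eu)

wealth = 100_000.0
for stake in [10.0, 100.0, 1000.0]:
    # premium = how much you'd pay to avoid the gamble; roughly s^2 / (2w)
    premium = wealth - certainty_equivalent(wealth, stake)
    print(stake, premium)
```

On a $100,000 base, the premium for a $100 coin flip comes out around a nickel, while people routinely refuse such gambles unless offered far better than even odds.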

Peter answered:

So my deep theory goes like this: the vision behind all of neoclassical economics post 1870 is a unified normative-positive theory. The theory of choice (positive) is at the same time a theory of social optimality. This is extremely convenient, of course. The problem, which has only grown over time, is that the assumptions needed for this convergence, the central role assigned to utility (which is where positive and normative meet) and its maximization, either devolve into tautology or are vulnerable to disconfirmation. I suspect that this is unavoidable in a theory that attempts to be logically deductive, but isn’t blessed, as physics is, by the highly ordered nature of the object of study. (Physics really does seem to obey the laws of physics, mostly.)

I’ve come to feel that utility is the original sin, so to speak. I really had to do some soul-searching when I wrote my econ textbooks, since if I said hostile things about utility no one would use them. I decided to self-censor: it’s simply not a battle that can be won on the textbook front. Rather, I’ve come to think that the way to go at it is to demonstrate that it is still possible to do normatively meaningful work without utility — to show there’s an alternative. I’m convinced that economists will not be willing to give this up as long as they think that doing so means they can’t use economics to argue for what other people should or shouldn’t do. (This also has connections to the way economists see their work in relation to other approaches to policy, but that’s still another topic.)

And I’ve been thinking more about your risk/uncertainty example. Your approach is to look for regularity in the data (observed choices) which best explains and predicts. I’m with you. But economists want a model of choice behavior based on subjective judgments of whether one is “better off”, since without this they lose the normative dimension. This is a costly constraint.

There is an interesting study to be written — maybe someone has already written it — on the response by economists to the flood of evidence for hyperbolic discounting. This has not affected the use of observed interest rates for present value calculation in applied work, and choice-theoretic (positive) arguments are still enlisted to justify the practice. Yet, to a reasonable observer, the normative model has diverged dramatically from its positive twin. This looks like an interesting case of anomaly management.

Lots to think about here (also related to this earlier discussion).
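The hyperbolic-discounting anomaly Peter mentions can be sketched numerically. Using the textbook functional forms (not anything from the exchange above), a hyperbolic discounter reverses a choice as both options recede into the future, while an exponential discounter never does:

```python
def hyperbolic(t, k=1.0):
    # textbook hyperbolic discount factor: 1 / (1 + k*t)
    return 1.0 / (1.0 + k * t)

def exponential(t, delta=0.9):
    # standard exponential discount factor: delta**t
    return delta ** t

def prefers_later(discount, t):
    # choose between 100 at time t and 110 at time t+1
    return 110 * discount(t + 1) > 100 * discount(t)

# Hyperbolic discounters flip as the pair recedes:
near = prefers_later(hyperbolic, 0)    # grab the 100 now
far = prefers_later(hyperbolic, 10)    # willing to wait when both are distant
# An exponential discounter's choice is the same at any horizon:
consistent = prefers_later(exponential, 0) == prefers_later(exponential, 10)
```

The divergence between this positive model of choice and the exponential model used for normative present-value calculations is exactly the tension described above.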

Ethics and statistics

I spoke (remotely) recently at the University of Wisconsin, on the topic of ethics and statistics. Afterward, I received the following question from Fabrizzio Sanchez:

As hard as it is to do, I thought it was good to try and define what exactly makes for an ethical violation. Your third point noted that it needed to break some sort of rule. Could you elaborate on this idea in the context of statistical rules? From my understanding, most statistical rules are not 0 or 1, but somewhere in between. (Removing an outlier comes to mind as an example).

He was responding to my statement that “An ethics problem arises when you are considering an action that (a) benefits you or some cause you support, (b) hurts or reduces benefits to others, and (c) violates some rule.”

I thought the bit about violating a rule was necessary because it’s generally considered acceptable to try to get more for yourself, if you’re doing so within the context of an accepted set of rules. Here I wasn’t thinking so much of statistical rules (for example, the idea that for statistical significance you need p=0.05 not p=0.06) but rather social rules. But maybe there’s more to be said on this.

The big new idea in my talk (which, unfortunately, I didn’t get to during the 20 minutes that were allocated to me) is near the end of the presentation, when I suggest that mainstream statistical methods (Bayes included) can themselves be unethical. Maybe this will be the subject of a future Chance column.

P.S. One difficulty in posting slides is that they can be misleading without the accompanying speech. In particular, near the end of the slides I show the notorious third-degree polynomial regression discontinuity fit, under the headline, “Find the ethical problem!” Just to be clear, let me explain that I think the ethical problem here is not with the people who did the analysis and made the graph; rather, I think the ethical problem arises in our scientific publication system itself, which rewards dramatic claims based on statistical significance and dis-incentivizes more realistic, sober assessment of evidence. Also contributing to the ethical problem has been the publication of papers recommending something as goofy as this sort of high-degree polynomial fit.

“The Europeans and Australians were too eager to believe in renal denervation”

As you can see, I’m having a competition with myself for the most boring title ever.

The story, though, is not boring. Paul Alper writes:

I just came across this in the NYT.

Here is the NEJM article itself:

And here is the editorial in the NEJM:

The gist is that on the basis of previous studies without a control arm, renal denervation was thought to be a blockbuster treatment for those suffering from very high blood pressure. The randomized clinical trial with a sham procedure as the control (placebo) found that the effect seems to be mainly psychological. I suppose the moral of the story is that unless there is a control arm, enthusiasm must be tempered. Note too that the U.S. FDA comes out appearing properly slow and skeptical while the Europeans and Australians were too eager to believe in renal denervation.

Paul followed up with this:

Our local newspaper today had an AP article about drugs which lower cholesterol but speculated that despite the lower cholesterol, the lowering of mortality by the drugs is yet to be proved.

To see whether you have already blogged about “surrogate criteria,” I googled *surrogate Gelman* and came up with this and this, which contains this sentence:

The objective of NRL’s research is to develop tissue surrogate materials that simulate the mechanical and acoustical properties of biological tissues. These are then assembled into an experimental test system of the human thorax, called “GelMan,” for assessing blunt forces and blast dynamics.

Hey, I used to work at NRL!

Paul continued:

While I still don’t know what a “GelMan” is, I also found this by a Gelman which has to do with vomiting.

I assume he was referring to my encounter with the Barfblog.

Paul continued:

Note the reference to a “Fisher,” but not R.A.

Just for giggles, I googled *surrogate Alper* and sure enough found many web sites for conceiving a new life and dealing with the end of life by people who have Alper as a last name. One wonders if *surrogate any-name-what-so-ever* would also produce a bunch of google hits.

Hmmm, let’s see what comes up:

[Screenshot of the search results]

Cool—it works!

P.S. I would’ve given this post the title, “GelMan Thoracic Surrogate for Underwater Threat Neutralization,” but that wouldn’t have been boring enough.

Stan World Cup update

The other day I fit a simple model to estimate team abilities from World Cup outcomes. I fit the model to the signed square roots of the score differentials, using the square root on the theory that when the game is less close, it becomes more variable.
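The signed square root described here can be written in a couple of lines (Python for illustration; the post’s actual code is R and Stan):

```python
import math

def signed_sqrt(x):
    # sign-preserving square root: compresses blowouts while
    # keeping close games distinct
    return math.copysign(math.sqrt(abs(x)), x)

# A 1-0 game and a 4-0 game differ by 3 goals raw,
# but only 1 unit on the transformed scale:
signed_sqrt(1)   # 1.0
signed_sqrt(4)   # 2.0
signed_sqrt(-4)  # -2.0
```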

0. Background

As you might recall, the estimated team abilities were reasonable, but the model did not fit the data, in that when I re-simulated the game outcomes using the retrospectively fitted parameters, my simulations were much closer than the actual games. To put it another way, many more than 1/20 of the games fell outside their 95% predictive intervals.
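This kind of predictive-interval check can be sketched as follows, with stand-in simulated numbers rather than the World Cup fits (Python/numpy for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: observed outcomes, plus replicated outcomes drawn from a
# fitted model (rows = posterior simulations, columns = games).
observed = rng.normal(0, 2, size=64)
simulated = rng.normal(0, 2, size=(4000, 64))

# 95% predictive interval for each game, from the simulations:
lo, hi = np.percentile(simulated, [2.5, 97.5], axis=0)
outside = np.mean((observed < lo) | (observed > hi))
# If the model fits, roughly 5% of outcomes land outside their intervals;
# a much larger fraction is the misfit signal described in the post.
```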

1. Re-fitting on the original scale

This was bugging me. In some way, the original post, which concluded with “my model sucks,” made an excellent teaching point. Still and all, it was a bummer.

So, last night as I was falling asleep, I had the idea of re-fitting the model on the original scale. Maybe the square-root transformation was compressing the data so much that the model couldn’t fit. I wasn’t sure how this could be happening but it seemed worth trying out.

So, the new Stan program, worldcup_raw_matt.stan:

data {
  int nteams;
  int ngames;
  vector[nteams] prior_score;
  int team1[ngames];
  int team2[ngames];
  vector[ngames] score1;
  vector[ngames] score2;
  real df;
}
transformed data {
  vector[ngames] dif;
  dif <- score1 - score2;
}
parameters {
  real b;
  real<lower=0> sigma_a;
  real<lower=0> sigma_y;
  vector[nteams] eta_a;
}
transformed parameters {
  vector[nteams] a;
  a <- b*prior_score + sigma_a*eta_a;
}
model {
  eta_a ~ normal(0,1);
  for (i in 1:ngames)
    dif[i] ~ student_t(df, a[team1[i]]-a[team2[i]], sigma_y);
}

Just the same as the old model but without the square root stuff.

And then I appended to my R script some code to fit the model and display the estimates and residuals:

# New model 15 Jul 2014:  Linear model on original (not square root) scale

fit <- stan_run("worldcup_raw_matt.stan", data=data, chains=4, iter=5000)

sims <- extract(fit)
a_sims <- sims$a
a_hat <- colMeans(a_sims)
a_se <- sqrt(colVars(a_sims))
library ("arm")
png ("worldcup7.png", height=500, width=500)
coefplot (rev(a_hat), rev(a_se), CI=1, varnames=rev(teams), main="Team quality (estimate +/- 1 s.e.)\n", cex.var=.9, mar=c(0,4,5.1,2))
dev.off ()

a_sims <- sims$a
sigma_y_sims <- sims$sigma_y
nsims <- length(sigma_y_sims)
random_outcome <- array(NA, c(nsims,ngames))
for (s in 1:nsims){
  random_outcome[s,] <- (a_sims[s,team1] - a_sims[s,team2]) + rt(ngames,df)*sigma_y_sims[s]
}
sim_quantiles <- array(NA,c(ngames,2))
for (i in 1:ngames){
  sim_quantiles[i,] <- quantile(random_outcome[,i], c(.025,.975))
}

png ("worldcup8.png", height=1000, width=500)
coefplot ((score1 - score2)[new_order]*flip, sds=rep(0, ngames),
          lower.conf.bounds=sim_quantiles[new_order,1]*flip, upper.conf.bounds=sim_quantiles[new_order,2]*flip, 
          varnames=ifelse(flip==1, paste(teams[team1[new_order]], "vs.", teams[team2[new_order]]),
                          paste(teams[team2[new_order]], "vs.", teams[team1[new_order]])),
          main="Game score differentials\ncompared to 95% predictive interval from model\n",
          mar=c(0,7,6,2), xlim=c(-6,6))
dev.off ()

And here's what I got:



And this looks just fine, indeed in many ways better than before: not just the model fit, but also the team ability parameters are now directly interpretable. According to the model fit, Brazil, Argentina, and Germany are estimated to be 1 goal better than the average team (in expectation), with Australia, Honduras, and Cameroon being 1 goal worse than the average.

2. Debugging

But this bothered me in another way. Could those square-root-scale predictions have been that bad? I can't believe it. Back to the code. I look carefully at the transformation in the Stan model:

transformed data {
  vector[ngames] dif;
  vector[ngames] sqrt_dif;
  dif <- score1 - score2;
  for (i in 1:ngames)
    sqrt_dif[i] <- (step(dif[i]) - .5)*sqrt(fabs(dif[i]));
}

D'oh! That last line is wrong, it's missing a factor of 2. Stan doesn't have a sign() function so I hacked something together using "step(dif[i]) - .5". But this difference takes on the value +.5 if dif is positive or -.5 if dif is negative (zero doesn't really matter because it all gets multiplied by sqrt(fabs(dif)) anyway). Nonononononononono.
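The bug can be seen in one line (a Python rendering of the Stan expression, with step() written out by hand):

```python
import math

def buggy_signed_sqrt(d):
    step = 1.0 if d >= 0 else 0.0            # Stan's step()
    return (step - 0.5) * math.sqrt(abs(d))  # yields +/-0.5, not +/-1

def fixed_signed_sqrt(d):
    step = 1.0 if d >= 0 else 0.0
    return 2 * (step - 0.5) * math.sqrt(abs(d))

buggy_signed_sqrt(4)   # 1.0 -- half the intended value
fixed_signed_sqrt(4)   # 2.0
fixed_signed_sqrt(-4)  # -2.0
```

So every transformed score differential was half its intended size, which is exactly why the simulated games looked so much closer than the real ones.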




OK, I fix the code:

transformed data {
  vector[ngames] dif;
  vector[ngames] sqrt_dif;
  dif <- score1 - score2;
  for (i in 1:ngames)
    sqrt_dif[i] <- 2*(step(dif[i]) - .5)*sqrt(fabs(dif[i]));
}

I rerun my R script from scratch. Stan crashes R. Interesting---I'll have to track this down. But not right now.

I restart R and run. Here are the results from the fitted model on the square root scale:


And here are the predictions and the game outcomes:


I'm not quite sure about this last graph but I gotta go now so I'll post, maybe will look at the code later if I have time.

3. Conclusions

My original intuition, that I could estimate team abilities by modeling score differentials on the square root scale, seems to have been correct. In my previous post I'd reported big problems with predictions, but that's because I'd dropped a factor of 2 in my code. These things happen. Modeling the score differences on the original scale seems reasonable too. It's easy to make mistakes, and it's good to check one's model in various ways. I'm happy that in my previous post I was able to note that the model was wrong, even if at the time I hadn't yet found the bug. It's much better to be wrong and know you're wrong than to be wrong and not realize it.

Finally, now that the model fits ok, one could investigate all sorts of things by expanding it in various ways. But that's a topic for another day (not soon, I think).