“Best Linear Unbiased Prediction” is exactly like the Holy Roman Empire

Dan Gianola pointed me to this article, “One Hundred Years of Statistical Developments in Animal Breeding,” coauthored with Guilherme Rosa, which begins:

Statistical methodology has played a key role in scientific animal breeding. Approximately one hundred years of statistical developments in animal breeding are reviewed. Some of the scientific foundations of the field are discussed, and many milestones are examined from historical and critical perspectives. The review concludes with a discussion of some future challenges and opportunities arising from the massive amount of data generated by livestock, plant, and human genome projects.

Here were my comments:

It’s a fascinating story.

Just one thing: I continue to think the concept of “best linear unbiased prediction” has been oversold, for several reasons:

1. It’s not really “best” unless the model is true. But one of the key virtues of these procedures is that they are robust and in practice work well even if the model is not true (unlike, for example, classical p-value-based approaches which can rely very strongly on particular model assumptions).

2. The “unbiased” thing again kinda misses the point, in that this is a property that only holds conditional on the hyperparameters, which in practice typically need to be estimated from the data.

3. Same problem with “linear”; it’s also misleading in that the same idea (hierarchical modeling) can be applied to nonlinear models.

4. Finally “prediction” is misleading in that these methods can be applied to inferential problems in which no prediction is involved.

I like the methods, and I respect the important contributions of Henderson, so it’s fair enough to use his terminology. But at this point I think we’d be better off just talking about hierarchical (or multilevel) models and then saying that, for historical reasons, these happen to be called BLUPs, while noting that the terminology is restrictive.
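To make the connection concrete, here is a minimal numerical sketch, with the data, group structure, and moment estimators all invented for illustration: in a balanced one-way normal model, the empirical BLUP of each group effect is just the shrinkage estimate from the corresponding hierarchical model, computed with plug-in variance components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Balanced one-way model: y_ij = mu + u_j + e_ij, with u_j ~ N(0, sigma_u^2)
# and e_ij ~ N(0, sigma_e^2).  All numbers here are illustrative.
J, n = 8, 5
mu, sigma_u, sigma_e = 10.0, 2.0, 3.0
u = rng.normal(0.0, sigma_u, J)
y = mu + u[:, None] + rng.normal(0.0, sigma_e, (J, n))

# Plug-in ("empirical") variance components from the usual ANOVA estimators.
group_means = y.mean(axis=1)
grand_mean = y.mean()
mse = ((y - group_means[:, None]) ** 2).sum() / (J * (n - 1))   # within-group
msb = n * ((group_means - grand_mean) ** 2).sum() / (J - 1)     # between-group
sigma_e2_hat = mse
sigma_u2_hat = max(0.0, (msb - mse) / n)

# The empirical BLUP of u_j is the hierarchical-model shrinkage estimate:
# pull the group mean toward the grand mean by a data-estimated factor.
shrinkage = sigma_u2_hat / (sigma_u2_hat + sigma_e2_hat / n)
u_hat = shrinkage * (group_means - grand_mean)
print(shrinkage)
print(np.round(u_hat, 2))
```

Nothing in this computation is tied to prediction of future data; the same shrinkage estimates are what the hierarchical model reports for the group effects themselves.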

Gianola replied:

On your comments, keep in mind that much of the terminology (e.g., BLUP) dates back to a time when many of us had not yet been born, when the term hierarchical model applied only to nested ANOVA situations (e.g., PROC NESTED in SAS), etc.

1. Agree. The same is true for BLUE and most things in statistics, even non-linear hierarchical models. If the hierarchy is a bad representation of the state of nature, everything may go KAPUT.

2. Sure. That is why users sometimes employ the term “empirical BLUP,” or why econometricians refer to “estimated GLS.”

3. Agree. Many non-linear (hierarchical) models are described in the literature review, but the realization that the same principles could be applied came in the late 80s.

4. Agree. The titles of the sections try to honor the names of the techniques proposed at a particular time.

All in all, we feel that there was rapid evolution, and that animal breeders contributed much to applied statistics and continue to do so. For example, the first suggestion that RKHS (reproducing kernel Hilbert space) regression could be used in whole-genome prediction of complex traits was made within animal breeding. It is good that other fields, e.g., genetic medicine (Nancy Cox, Chicago), have recognized that.

15 thoughts on ““Best Linear Unbiased Prediction” is exactly like the Holy Roman Empire”

  1. Hi there. I’m an aspiring statistician and just looking for some advice. I’m currently a junior (year 3 of 5) at Cleveland State University, studying Computer Science with a minor in Statistics. So far I have only taken two statistics courses, learning up to ANOVA and multiple comparisons. When I graduate college, my plan is to work for a couple years to help pay down my loans, then go to graduate school somewhere to get my PhD in Statistics and go into industry (and a master’s in something, haven’t decided yet; any suggestions would be marvelous). I’ve taken a crack at learning R, but it’s slow going since I’m not a big fan of programming (and I’m comp sci, I know).

    So I’ll cut to the chase now. I’ve been lurking on this blog for a couple months now and have noticed a recurring theme of Bayesian vs. frequentist statistics. If anyone could recommend some reading or online (free) lectures that would give me an overview of the topic, that would be fantastic. Andrew, you run a phenomenal blog and I’ve definitely learned a lot from it. Sorry to hijack this post; I felt like this would be a great venue to ask for some advice. Thanks all :)

    P.S. I’ve read that there are three different major disciplines in graduate statistics, and I also haven’t decided which of those to pursue. Also, if you need any more information about my interests and/or background, I’d be glad to provide it.

    • For some practical computational examples of Bayesian methods, check out the online book Probabilistic Programming and Bayesian Methods for Hackers to get going with pymc. Also go to Kaggle and read any forum post or winning solution from Tim Salimans. He doesn’t always use Bayesian approaches, but he knows when he should, and he consistently wins competitions with them (“Dark Worlds” and “Don’t Overfit!” are good examples).

  2. I always thought Voltaire was being a bit harsh here, a tad teutonophobic. They were sort of holy, big into Rome, and acted with an air of imperial superiority and divine right. Not too different from modern geopolitical name-dropping, as per the ‘Democratic People’s Republic of Korea’.

  3. “1. Agree. The same is true for BLUE and most things in statistics, even non-linear hierarchical models. If the hierarchy is a bad representation of the state of nature, everything may go KAPUT.”

    Surprised you let this slide; it’s completely wrong. You hear this from economists a lot – “you can’t use a random effects model if there’s no randomization mechanism”.

      • The only non-tautological reading I have of that is that the relationships in the hierarchical model have to represent something with a correspondence in physical reality.

        This thinking prevents people from simply using things like prior distributions over interaction coefficients, because they worry that there’s no physical relationship between the interaction coefficients; the model doesn’t ‘represent a state of nature and everything may go KAPUT’.

        If he meant that ‘bad representation of the state of nature’ just means a non-working model, it would be equivalent to saying a bad model is a bad model, which is a nonsensical tautology.

        • Anon:

          “State of nature” can refer to the distribution of parameters of interest. For example, suppose we are studying the effects of an educational program applied in several schools (I’ll skip animal breeding as I don’t know anything about that topic), and the model is that the effects follow a linear model given school-level predictors, plus independent normally-distributed errors. This can be a good representation of the state of nature (in the sense that there are treatment effects, defined as the average difference between treatment and control, were these applied to every student who could be in each school), and these effects can be reasonably approximated by the statistical model. This is not a “tautological reading” in that it is possible for the model to be bad (for example, if an assumed linear effect is actually strongly non-monotonic), but it does not require a correspondence to “physical reality,” nor does it require a randomization mechanism.
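          In generative form, the kind of model being described might look like the following minimal sketch, where the number of schools, the predictor, and all the hyperparameter values are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

J = 20                                   # number of schools (illustrative)
x = rng.normal(size=J)                   # a school-level predictor

# School-level treatment effects: a linear function of the school-level
# predictor plus independent normal errors (the "state of nature" here).
a, b, tau = 0.5, 1.2, 0.8                # illustrative hyperparameters
theta = a + b * x + rng.normal(0.0, tau, J)

# Each school's experiment yields a noisy estimate of its effect.
se = 0.6                                 # within-school standard error (illustrative)
y = theta + rng.normal(0.0, se, J)
```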

        • > it does not require a correspondence to “physical reality,” nor does it require a randomization mechanism.
          Agree – requiring models/representations to be physically connected/indexical is naive (joke below).

          But I think authors like David Cox think it’s important to have “physical variation reality” to avoid being seen as using prior assumptions in hierarchical modeling (not that I agree).

          Joke:
          Husband: Picasso, that painting you did of my wife does not look anything like her!
          Picasso: Really, what does she look like?
          Husband: Here, I have a picture in my wallet, look.
          Picasso: My, she is awfully tiny!!
          (Physics variation on this – My, she is awfully flat!!)

          “But I think authors like David Cox think it’s important to have “physical variation reality” to avoid being seen as using prior assumptions in hierarchical modeling (not that I agree).”

          Isn’t this equivalent to calibration, or as LW might call it, “frequentist pursuit”? I’m not sure what I think about this.

          On one hand, it makes the interpretation of prior and posterior probabilities, if not “objective”, at least directly observable between researchers. On the other hand, in many cases there may not be sufficient data for calibration in which case I don’t think modeling attempts should be disallowed on that basis alone.

        • > I don’t think modeling attempts should be disallowed on that basis alone.
          Agree. Only those that keep you from getting less wrong should be disallowed.

  4. With regard to point 2, there is a considerable literature on the effect of the estimation of the hyperparameters. A while back, I wrote a paper in which much of this literature is reviewed: “Accounting for the estimation of variances and covariances in prediction under a general linear model: an overview,” Tatra Mountains Mathematical Publications, 39 (2008), 1-14. While the effects of the estimation of the hyperparameters on such things as the mean square error of the predictors are relatively complicated, the unbiasedness of the predictors is not affected (subject to some relatively unrestrictive conditions that are generally satisfied in practice). And of course in some of the more longstanding applications, there is enough information available about the hyperparameters that their estimation does not have much of an effect on the predictors.
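    As a quick numerical check of the unbiasedness point, in the spirit of a simulation rather than a proof: continuing the balanced one-way sketch from the post above (all settings illustrative), the empirical BLUPs computed with plug-in ANOVA variance components show no systematic bias for the realized group effects when averaged over replications.

```python
import numpy as np

rng = np.random.default_rng(2)
J, n, sigma_u, sigma_e = 8, 5, 2.0, 3.0   # illustrative settings
reps = 2000
bias = np.zeros(J)

for _ in range(reps):
    u = rng.normal(0.0, sigma_u, J)
    y = u[:, None] + rng.normal(0.0, sigma_e, (J, n))
    gm, grand = y.mean(axis=1), y.mean()
    mse = ((y - gm[:, None]) ** 2).sum() / (J * (n - 1))
    msb = n * ((gm - grand) ** 2).sum() / (J - 1)
    su2 = max(0.0, (msb - mse) / n)        # plug-in between-group variance
    shrink = su2 / (su2 + mse / n)
    bias += shrink * (gm - grand) - u      # prediction error for each group

print(np.round(bias / reps, 3))            # entries close to zero (within MC error)
```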

    With regard to a point made by Gianola, it is true that historically the term hierarchical was often used synonymously with nested. However, as I pointed out in the rejoinder to the discussion of my recent TAS paper, it has long been recognized within the “linear-model community” that mixed- and random-effects models (nested or not) can be viewed as hierarchical (in the “other” sense).

  5. The naming issues with BLUPs are part of the reason why Doug Bates prefers the term “conditional modes”, which both better captures what’s going on and makes more sense in the generalized and non-linear contexts. Incidentally, he has mentioned on several occasions that Alan James told him roughly the same thing in response to a paper about nonlinear hierarchical models — “[he] liked the idea of finding the random effects values that would be the BLUP’s – except that they are not linear and not unbiased and there is no clear sense in which they are ‘best’”.[1][2]
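    For a concrete sense of what this looks like in software: lme4’s ranef() returns exactly these conditional modes, and a rough Python analogue is the estimated random effects from a statsmodels mixed-model fit (for a Gaussian linear mixed model the conditional modes and conditional means coincide). Everything in the sketch below except the statsmodels calls themselves is invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Simulated grouped data with a random intercept per group (all values invented).
J, n = 10, 20
g = np.repeat(np.arange(J), n)
u = rng.normal(0.0, 1.0, J)                   # true group effects
x = rng.normal(size=J * n)
y = 2.0 + 0.5 * x + u[g] + rng.normal(0.0, 1.5, J * n)
df = pd.DataFrame({"y": y, "x": x, "g": g})

# Random-intercept linear mixed model; the estimated random effects returned
# below are the conditional means of the group effects given the data and the
# plug-in variance components, i.e., the empirical BLUPs / conditional modes.
fit = smf.mixedlm("y ~ x", df, groups=df["g"]).fit()
print(fit.fe_params)          # fixed effects
print(fit.random_effects)     # dict: group label -> estimated random intercept
```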
