Bayesian parameter estimation is not quite the same thing as Bayesian prediction

Chris Paulse pointed me to this magazine article entitled “Bayes rules: a once-neglected statistical technique may help to explain how the mind works,” featuring a paper by Thomas Griffiths and Joshua Tenenbaum (see quick review here by Michael Stastny). The paper examines the ability of people to use partial information to “make predictions about the duration or extent of everyday phenomena such as human life spans and the box-office take of movies.” Griffiths and Tenenbaum find that many people’s predictions can be modeled as Bayesian inferences.

This is all fine and interesting. Based on my knowledge of other experiments of this sort, I’m a bit skeptical (I suspect that different people use different sorts of reasoning to make these predictive estimates), but of course this is why psychologists such as Griffiths and Tenenbaum do research in this area.

Historical background

But I do have a problem with the article about this study that appeared in the Economist magazine. The article makes a big deal about the differences between the “Bayesian” and “frequentist” schools of statistics. First off, I’m surprised that they think that frequentist methods “dominate the field and are used to predict things as diverse as the outcomes of elections and preferences for chocolate bars.” They should take a look at some recent issues of JASA or at the new book by Peter Rossi, Greg Allenby, and Rob McCulloch on Bayesian statistics and marketing. More generally, they don’t seem to realize that psychologists have been modeling decision makers as Bayesians for decades; that’s the basis of the “expected utility model” of von Neumann etc.

The statistical distinction between “estimation” and “prediction”

But I have a more important point to make, or at least a more statistical one. Classical statistics distinguishes between estimation and prediction. Basically, you estimate parameters and you predict observables. This is not just a semantic distinction. Parameters are those things that generalize to future studies; observables are ends in themselves. If the joint distribution of all the knowns and unknowns is written as a directed acyclic graph, the arrows go from parameters to observables and not the other way around. Or, to put it another way, one instance of a parameter can correspond to many observables.
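To make the parameter/observable distinction concrete, here is a minimal sketch assuming a normal model that is not in the original post: a single parameter theta with prior N(mu0, tau0^2) generates many observables y_i ~ N(theta, sigma^2), with sigma known (so the graph is theta → y_1, ..., y_n). The function names and default values are hypothetical. As n grows, the posterior uncertainty about the parameter shrinks toward zero, but the predictive uncertainty about one new observable never falls below sigma, because that observable carries its own variation.

```python
import math


def posterior_sd(n, tau0=1.0, sigma=1.0):
    """Posterior sd of theta after n draws y_i ~ N(theta, sigma^2), prior sd tau0.

    The prior mean mu0 does not affect the posterior sd in this conjugate model.
    """
    precision = 1.0 / tau0**2 + n / sigma**2  # prior precision + data precision
    return math.sqrt(1.0 / precision)


def predictive_sd(n, tau0=1.0, sigma=1.0):
    """Posterior predictive sd of one new observable y_new ~ N(theta, sigma^2)."""
    # Uncertainty about the parameter plus the observable's own noise.
    return math.sqrt(posterior_sd(n, tau0, sigma) ** 2 + sigma**2)


# One parameter, ever more observables: compare the two kinds of uncertainty.
for n in (0, 10, 100, 10_000):
    print(n, round(posterior_sd(n), 3), round(predictive_sd(n), 3))
```

With the defaults tau0 = sigma = 1, the posterior sd is 1/sqrt(1 + n) while the predictive sd is sqrt(1 + 1/(1 + n)), which approaches 1.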

Anyway, frequentist statistics treats estimation and prediction differently. For example, in frequentist statistics, theta.hat is an unbiased estimate if E(theta.hat|theta) = theta, for any theta. But y.hat is an unbiased prediction if E(y.hat|theta) = E(y|theta) for any theta. Note the difference: the frequentist averages over y, but not over theta. (See pages 258-249, and the footnote on page 411, of Bayesian Data Analysis (second edition) for more on this.) Frequentist inference thus makes this technical distinction between estimation and prediction (for example, consider the terms “best linear unbiased estimation” and “best linear unbiased prediction”).
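The two definitions of unbiasedness can be checked by simulation. The sketch below uses a hypothetical normal model not taken from the post, with known hyperparameters MU, TAU, SIGMA playing the role of theta: an unknown quantity y ~ N(MU, TAU^2) is predicted from a noisy measurement x ~ N(y, SIGMA^2) using the shrinkage predictor y.hat = MU + B(x - MU), with B = TAU^2/(TAU^2 + SIGMA^2), the kind of predictor BLUP produces in this simple known-variance setting. Holding y fixed and averaging only over x, y.hat is biased toward MU; averaging over both y and x with theta fixed, E(y.hat|theta) = MU = E(y|theta), so it is an unbiased prediction.

```python
import random
import statistics

# Hypothetical hyperparameters, playing the role of theta in the text; assumed known.
MU, TAU, SIGMA = 0.0, 1.0, 1.0
# Shrinkage factor: how much weight the noisy measurement x gets relative to MU.
B = TAU**2 / (TAU**2 + SIGMA**2)


def predict(x):
    """Shrinkage prediction of the unknown y from one measurement x ~ N(y, SIGMA^2)."""
    return MU + B * (x - MU)


def mean_prediction_given_y(y, n_sims=100_000, seed=1):
    """Estimation-style check: hold y fixed and average y.hat over measurements x only."""
    rng = random.Random(seed)
    return statistics.fmean(predict(rng.gauss(y, SIGMA)) for _ in range(n_sims))


def mean_prediction_and_target(n_sims=100_000, seed=2):
    """Prediction-style check: average over y ~ N(MU, TAU^2) and x | y, theta fixed."""
    rng = random.Random(seed)
    preds, targets = [], []
    for _ in range(n_sims):
        y = rng.gauss(MU, TAU)
        x = rng.gauss(y, SIGMA)
        preds.append(predict(x))
        targets.append(y)
    return statistics.fmean(preds), statistics.fmean(targets)


# With y held at 2.0, the average prediction is near MU + B * (2.0 - MU) = 1.0, not 2.0.
print(mean_prediction_given_y(2.0))
# Averaged over y as well, both means are near MU = 0.0.
print(mean_prediction_and_target())
```

So the same predictor fails the estimation criterion when y is treated as a fixed parameter, yet satisfies the prediction criterion. The raw measurement x would satisfy both criteria, at the cost of a higher mean squared error (SIGMA^2 = 1 versus 0.5 for the shrinkage predictor under these settings).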

Unfair to frequentists

Another way of putting this is: everybody, frequentists and Bayesians alike, agrees that it’s appropriate to be Bayesian for predictions. (This is particularly clear in time series analysis, for example.) The debate arises over what to do with estimation: whether or not to average over a distribution for those unknown thetas. So the Economist is misleading in describing the Griffiths/Tenenbaum result as a poke in the eye to frequentists. A good frequentist will treat these problems as predictions and apply Bayesian inference.
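Being Bayesian about a prediction means averaging p(y_new|theta) over the posterior distribution of theta rather than plugging in a single estimate. A minimal sketch, assuming a binomial model with a Beta(a, b) prior (an illustrative choice not taken from the post; all function names are hypothetical), compares a plug-in probability that the next trial succeeds with the posterior predictive probability, computed both in closed form and by Monte Carlo averaging over posterior draws of theta.

```python
import random


def plug_in_prob(successes, n):
    """Predict the next trial by plugging in the point estimate theta.hat = s/n."""
    return successes / n


def predictive_prob_exact(successes, n, a=1.0, b=1.0):
    """P(next trial succeeds | data) under a Beta(a, b) prior, i.e. E(theta | data)."""
    return (a + successes) / (a + b + n)


def predictive_prob_mc(successes, n, a=1.0, b=1.0, draws=100_000, seed=0):
    """Same quantity, obtained by averaging p(y_new = 1 | theta) = theta over
    draws of theta from its Beta(a + s, b + n - s) posterior."""
    rng = random.Random(seed)
    post_a, post_b = a + successes, b + n - successes
    return sum(rng.betavariate(post_a, post_b) for _ in range(draws)) / draws


# Three successes in three trials.
print(plug_in_prob(3, 3), predictive_prob_exact(3, 3), predictive_prob_mc(3, 3))
```

After 3 successes in 3 trials, the plug-in rule predicts that the next trial succeeds with probability 1, while the predictive probability under a uniform Beta(1, 1) prior is (1 + 3)/(2 + 3) = 0.8; the Monte Carlo average over theta should land close to that value.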

(Although, I have to admit, there is some really silly frequentist reasoning out there, such as the “doomsday argument”; see here, here, and here.)