I sent a copy of my paper (coauthored with Cosma Shalizi), “Philosophy and the practice of Bayesian statistics in the social sciences,” to Richard Berk, who wrote:
I read your paper this morning. I think we are pretty much on the same page about all models being wrong. I like very much the way you handle this in the paper. Yes, Newton’s work is wrong, but surely useful. I also like your twist on Bayesian methods. Makes good sense to me. Perhaps most important, your paper raises some difficult issues I have been trying to think more carefully about.
1. If the goal of a model is to be useful, surely we need to explore what “useful” means. At the very least, usefulness will depend on use. So a model that is useful for forecasting may or may not be useful for causal inference.
2. Usefulness will be a matter of degree. So for each use we will need one or more metrics to represent how useful the model is. In what looks at first to be a simple example, if the use is forecasting, forecasting accuracy by something like MSE may be a place to start. But that will depend on one’s forecasting loss function, which might not be quadratic or even symmetric. This is a problem I have actually been working on and have some applications appearing. Other kinds of use imply a very different set of metrics: what is a good usefulness metric for causal inference, for instance?
3. It seems to me that your Bayesian approach is one of several good ways (and not mutually exclusive ways) of doing data analysis. Taking a little liberty with what you say, you try a form of description and if it does not capture well what is in the data, you alter the description. But like use, it will be multidimensional and a matter of degree. There are these days so many interesting ways that statisticians have been thinking about description that I suspect it will be a while (if ever) before we have a compelling and systematic way to think about the process. And it goes to the heart of doing science.
4. I guess I am uneasy with your approach when it uses the same data to build and evaluate a model. I think we would agree that out-of-sample evaluation is required.
5. There are also some issues about statistical inference after models are revised and re-estimated using the same data. I have attached a recent paper written for criminologists, co-authored with Larry Brown and Linda Zhao, that appeared in Quantitative Criminology. It is frequentist in perspective. Larry and Ed George are working on a Bayesian version. Along with Andreas Buja and Larry Shepp, we are working on appropriate methods for post-model-selection inference, given that current practice is just plain wrong and often very misleading. Bottom line: what does one make of Bayesian output when the model involved has been tuned to the data?
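Before getting to my reply: Berk’s point #2, that forecasting loss might not be quadratic or even symmetric, can be made concrete with a toy sketch of my own (this is just an illustration, not anything from his paper). Under a linear asymmetric loss, the optimal point forecast is a quantile of the predictive distribution rather than its mean, so “forecast accuracy” really does depend on the loss function:

```python
import numpy as np

def asymmetric_loss(error, under=2.0, over=1.0):
    """Linear ('linlin') loss: each unit of underprediction costs `under`,
    each unit of overprediction costs `over`."""
    return np.where(error > 0, under * error, -over * error)

rng = np.random.default_rng(0)
draws = rng.normal(loc=10.0, scale=2.0, size=100_000)  # predictive draws

# Search over candidate point forecasts. The minimizer of expected linlin
# loss is the under/(under+over) quantile of the predictive distribution,
# here the 2/3 quantile, which sits above the mean.
candidates = np.linspace(8, 14, 601)
risk = [asymmetric_loss(draws - c).mean() for c in candidates]
best = candidates[int(np.argmin(risk))]

print(f"mean forecast: {draws.mean():.2f}")
print(f"best forecast under asymmetric loss: {best:.2f}")
```

With symmetric quadratic loss the mean would have been optimal; making underprediction twice as costly pushes the optimal forecast up to the 2/3 quantile.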
I agree with your points #1 and #2. We always talk about a model being “useful” but the concept is hard to quantify.
I also agree with #3. Bayes has worked well for me but I’m sure that other methods could work fine also.
Regarding point #4, the use of the same data to build and evaluate the model is not particularly Bayesian. I see what we do as an extension of non-Bayesian ideas such as chi^2 tests, residual plots, and exploratory data analysis, all of which, in different ways, are methods for assessing model fit using the data that were used to fit the model. In any case, I agree that out-of-sample checks are vital to true statistical understanding.
To put it another way: I think you’re imagining that I’m proposing within-sample checks as an alternative to out-of-sample checking. But that’s not what I’m saying. What I’m proposing is to do within-sample checks as an alternative to doing no checking at all, which unfortunately is the standard in much of the Bayesian world (abetted by the subjective-Bayes theory/ideology). When a model passes a within-sample check, it doesn’t mean the model is correct. But in many, many cases, I’ve learned a lot from seeing a model fail a within-sample check.
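To show the kind of within-sample check I have in mind, here is a minimal sketch (a plug-in stand-in for a full posterior predictive check, with made-up data): fit a normal model to skewed data, simulate replicated datasets from the fitted model, and compare a discrepancy statistic between the real and replicated data. The model “fails” the check, and that failure is informative:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=100)  # skewed data; we (mis)fit a normal

# "Fit" the model by plug-in estimates, using the same data we will check.
mu_hat, sigma_hat = y.mean(), y.std(ddof=1)

def skewness(x):
    """Sample skewness, used as the discrepancy statistic."""
    return ((x - x.mean()) ** 3).mean() / x.std() ** 3

# Simulate replicated datasets under the fitted normal model and ask how
# often they look as skewed as the observed data.
T_obs = skewness(y)
T_rep = np.array([skewness(rng.normal(mu_hat, sigma_hat, size=y.size))
                  for _ in range(2000)])
p_value = (T_rep >= T_obs).mean()

print(f"observed skewness {T_obs:.2f}, check p-value {p_value:.3f}")
```

The tiny p-value doesn’t tell us the “true” model; it tells us the normal model can’t reproduce the skewness in the data, which is exactly the sort of thing a within-sample check can reveal even before any out-of-sample evaluation.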
Regarding your very last point, there is some classic work on Bayesian inference accounting for estimation of the prior from data. This is the work of various people in the 1960s and 1970s on hierarchical Bayes, when it was realized that “empirical Bayes” or “estimating the prior from data” could be subsumed into a larger hierarchical framework. My guess is that such ideas could be generalized to a higher level of the modeling hierarchy.
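As a toy version of that 1960s–70s idea (my own simulated illustration, not from that literature directly): in the normal-normal model, “estimating the prior from data” can be done by method of moments, and the resulting shrinkage estimates beat the raw ones on average:

```python
import numpy as np

rng = np.random.default_rng(2)
J, tau_true, sigma = 50, 3.0, 2.0
theta = rng.normal(0.0, tau_true, size=J)     # true group effects
y = theta + rng.normal(0.0, sigma, size=J)    # one noisy estimate per group

# Empirical Bayes: estimate the prior variance tau^2 from the data itself,
# using Var(y) = tau^2 + sigma^2 (sigma assumed known here).
tau2_hat = max(y.var(ddof=1) - sigma**2, 0.0)
shrink = tau2_hat / (tau2_hat + sigma**2)
theta_eb = shrink * y                          # shrink toward prior mean 0

err_raw = np.mean((y - theta) ** 2)
err_eb = np.mean((theta_eb - theta) ** 2)
print(f"MSE raw {err_raw:.2f} vs empirical Bayes {err_eb:.2f}")
```

The hierarchical-Bayes move is to replace the point estimate `tau2_hat` with a prior and posterior on tau, which propagates the uncertainty from estimating the prior; the generalization I’m imagining would do something analogous one level up, for models tuned to the data.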