In response to this article by Cosma Shalizi and me on the philosophy of Bayesian statistics, David Hogg writes:
I [Hogg] agree, even in physics and astronomy, that the models are not “True” in the God-like sense of being absolute reality (that is, I am not a realist); and I have argued (in a philosophically very naive paper, but hey, I was new to all this) that for pretty fundamental reasons we could never arrive at the True (with a capital “T”) model of the Universe. The goal of inference is to find the “best” model, where “best” might have something to do with prediction, or explanation, or message length, or (horror!) our utility. Needless to say, most of my physics friends *are* realists, even in the face of “effective theories”: Newtonian mechanics is an effective theory of GR, and GR is an effective theory of “quantum gravity” (this plays to your point, because if you think any theory is possibly an effective theory, how could you ever find Truth?). I also liked the ideas that the prior is really a testable regularization, and part of the model, and that model checking is our main work as scientists.
My only issue with the paper is around Section 4.3, where you say that you can’t even use Bayes to average or compare the probabilities of models. I agree that you don’t think any of your models are True, but if you decide that what the scientist is trying to do is explain or encode (as in the translation between inference and signal compression), then model averaging using Bayes *will* give the best possible result. That is, it seems to me that there *is* an interpretation of what a scientist *does* that makes Bayesian averaging a good idea. I guess you can say that you don’t think that is what a scientist does, but that gets into technical assumptions about epistemology that I don’t understand. I guess what I am asking is: Don’t you use Bayesian averaging, as the rules of measure theory require, and isn’t it useful? Same with updating. They are useful and correct. It is just that you are not *done* when you have done all that; you still have to do model checking and expanding and generalizing afterwards (but even this can still be understood in terms of finding the best possible effective theory or encoding for the data).
Yet another way of trying to explain my confusion is this: When you describe the convergence process in a model space that *doesn’t* contain the truth, you say that all it tries to do is match the distribution of the data. But isn’t that what science *is*? Matching the distribution of the data with a simpler model? So then Bayes is doing exactly what we want!
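(To fix notation for what follows: the model averaging Hogg describes combines predictions as p(y_new | y) = sum_k p(M_k | y) p(y_new | y, M_k), with weights p(M_k | y) proportional to p(M_k) p(y | M_k), so everything hinges on the marginal likelihoods p(y | M_k).)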
Bayesian model averaging could work, and in some situations it does work, but it won’t necessarily work. The problem arises with the models being averaged: the posterior probabilities of the individual models depend crucially on untestable aspects of their prior distributions. In the limit, flat priors drive the marginal likelihoods to zero, and approximations to flat priors cause extreme sensitivity. For example, if you use a N(0,A^2) prior with very large A, then the marginal posterior probability of your model will be proportional to 1/A, hence it matters a lot whether A is 100 or 1000 or 1 million, even though the choice won’t matter at all for inference within the model.
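Here is a minimal numerical sketch of that 1/A behavior, with a toy setup of my own (one observation, unit noise) rather than anything from the paper:

```python
import numpy as np

# Toy setup (my illustration): one observation y ~ N(theta, 1),
# prior theta ~ N(0, A^2).  The marginal likelihood is then
# p(y) = N(y | 0, 1 + A^2), which for large A is approximately
# 1 / (A * sqrt(2*pi)): directly proportional to 1/A.

def marginal_likelihood(y, A):
    var = 1.0 + A**2
    return np.exp(-0.5 * y**2 / var) / np.sqrt(2 * np.pi * var)

y = 2.0
for A in [1e2, 1e3, 1e6]:
    print(f"A = {A:>9.0f}:  p(y) = {marginal_likelihood(y, A):.3e}")

# Each factor of 10 in A divides p(y) by roughly 10, even though the
# posterior for theta is essentially unchanged across all three choices.
```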
This is not to say that posterior model averaging is necessarily useless, merely that if you want to do it, I think you need to think seriously about the different pieces of the super-model that you’re estimating. At this point I’d prefer continuous model expansion rather than discrete model averaging. We discuss this point in chapter 6 of BDA.
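As a schematic illustration of the contrast, here is one way continuous model expansion can look in practice; the setup below (a Student-t family bridging normal and Cauchy error models) is my own toy example, not the one in BDA:

```python
import numpy as np
from scipy import stats

# Sketch of continuous model expansion (my example): instead of
# discretely averaging a "normal" model and a "Cauchy" model, embed
# both in the Student-t family with unknown degrees of freedom nu;
# nu = 1 is the Cauchy and nu -> infinity recovers the normal.

rng = np.random.default_rng(0)
y = rng.standard_t(df=4, size=200)  # fake data with moderately heavy tails

# Profile the log likelihood over a grid of nu values
# (location 0 and scale 1 held fixed to keep the sketch short).
nus = np.logspace(-0.5, 2, 200)
log_lik = np.array([stats.t.logpdf(y, df=nu).sum() for nu in nus])
print("best-fitting nu on the grid:", round(nus[np.argmax(log_lik)], 2))

# The data pull nu toward a moderate value rather than forcing an
# all-or-nothing choice between the two sub-models.
```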
I agree very much with this point (the 1/A point, which is a huge issue in Bayes factors), and in part for this reason I have been using leave-one-out cross-validation to do model comparison (no free parameters, justifiable, makes sense to scientists and engineers, even skeptical ones, etc.). I would also be interested in your opinion about leave-one-out cross-validation; my engineer/CS friends love it.
Cross-validation is great and I’ve used it on occasion, but I don’t feel I really understand it. This is not a criticism; I just want to think harder about it at some point. To me, cross-validation is tied into predictive model checking in that ideas such as “leave one out” are fundamentally related to data collection. Xval is like model checking in that the data come in through the sampling distribution, not just the likelihood.
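For concreteness, here is a minimal leave-one-out sketch along the lines Hogg describes; the plug-in predictive densities below are my own simplification, standing in for full posterior predictives:

```python
import numpy as np
from scipy import stats

# Minimal leave-one-out cross-validation sketch (my toy comparison):
# score a model by the sum over i of log p(y_i | y_{-i}).

def loo_log_score(y, log_pred):
    """Sum of held-out log predictive densities over all leave-one-out folds."""
    return sum(log_pred(y[i], np.delete(y, i)) for i in range(len(y)))

def normal_pred(yi, rest):
    # normal predictive with mean and sd fit to the held-in data
    return stats.norm.logpdf(yi, loc=rest.mean(), scale=rest.std(ddof=1))

def t_pred(yi, rest):
    # Student-t predictive with df, location, and scale fit by ML
    df, loc, scale = stats.t.fit(rest)
    return stats.t.logpdf(yi, df=df, loc=loc, scale=scale)

rng = np.random.default_rng(1)
y = rng.standard_t(df=4, size=100)  # heavy-tailed fake data
print("normal model LOO score:", round(loo_log_score(y, normal_pred), 1))
print("t model LOO score:     ", round(loo_log_score(y, t_pred), 1))

# Higher is better; note that no arbitrary prior scale A enters anywhere,
# which is the appeal relative to Bayes factors.
```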