A couple days ago I responded to comments by Mayo, Stephen Senn, and Larry Wasserman. I will respond to Hennig by pulling out paragraphs from his discussion and then replying to each. Hennig writes:
for me the terms “frequentist” and “subjective Bayes” point to interpretations of probability, and not to specific methods of inference. The frequentist one refers to the idea that there is an underlying data generating process that repeatedly throws out data and would approximate the assumed distribution if one could only repeat it infinitely often.
Hennig makes the good point that, if this is the way you would define “frequentist” (it’s not how I’d define the term myself, but I’ll use Hennig’s definition here), then it makes sense to be a frequentist in some settings but not others. Dice really can be rolled over and over again; a sample survey of 1500 Americans really does have essentially infinitely many possible outcomes; but there will never be anything like infinitely many presidential elections or infinitely many worldwide flu epidemics.
The subjective Bayesian one is about quantifying belief in a rational way; following de Finetti, it would in fact be about belief in observable future outcomes of experiments, and not in the truth of models. Priors over model parameters, according to de Finetti, are only technical devices to deal with belief distributions for future outcomes, and should not be interpreted in their own right.
I understand the appeal of the pure predictive approach, but what I think is missing here is that what we call “parameters” are often conduits to generalizability of inference.
Consider my work with Frederic Bois in toxicology. When studying the concentrations of a toxin in blood and exhaled air, you can model the data directly with some convenient and flexible functional form—a “phenomenological” model—or you can use a more fundamental model based on latent parameters with direct physical and biological interpretations: the volume of the liver, the equilibrium concentration of the toxin in fatty tissues compared to the blood, and so forth. Modeling using latent parameters is more difficult—you have to throw in lots of prior information to get it to work, as we discuss in our article—but, on the plus side, there is biological reason to suspect that these parameters generalize from person to person. Which, in turn, gives our hierarchical prior distributions a chance to do the partial pooling that gives reasonably precise individual-level inferences.
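To illustrate what partial pooling buys you, here is a minimal sketch in a deliberately simplified setting: a hierarchical normal model with known variances, not the actual toxicokinetic model from my work with Bois. Each person has a parameter theta_j (think of a log-transformed physiological quantity), we observe a noisy per-person estimate, and the hierarchical prior shrinks each estimate toward the population mean by an amount determined by the relative variances.

```python
# Minimal sketch of partial pooling in a hierarchical normal model with known
# variances. This is not the Bois-Gelman toxicokinetic model; it only shows the
# shrinkage mechanism that hierarchical priors provide.

import numpy as np

rng = np.random.default_rng(0)

J = 8                   # number of people
mu_pop, tau = 0.0, 0.5  # population mean and sd of the person-level parameters
sigma = 1.0             # sd of each person's raw (unpooled) estimate

theta_true = rng.normal(mu_pop, tau, size=J)  # true person-level parameters
theta_hat = rng.normal(theta_true, sigma)     # noisy per-person estimates

# Treating mu_pop, tau, and sigma as known, the posterior mean for each person
# is a precision-weighted average of their own estimate and the population mean.
w = (1 / sigma**2) / (1 / sigma**2 + 1 / tau**2)
theta_pooled = w * theta_hat + (1 - w) * mu_pop

print("no pooling:     ", np.round(theta_hat, 2))
print("partial pooling:", np.round(theta_pooled, 2))
print("rmse, no pooling:     ", np.sqrt(np.mean((theta_hat - theta_true) ** 2)))
print("rmse, partial pooling:", np.sqrt(np.mean((theta_pooled - theta_true) ** 2)))
```

In the real model the pooling happens jointly across many physiologically interpretable parameters, and the population mean and variances are themselves estimated rather than fixed, but the shrinkage mechanism is the same.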
There’s a saying: a chicken is nothing but an egg’s way of creating another egg. Similarly, the de Finetti philosophy (as described by Hennig) might say that parameters are nothing but data’s way of predicting new data. But this misses the point. Parameterization encodes knowledge, and parameters with external validity encode knowledge particularly effectively.
However, I think that any single analysis that uses and interprets probabilities can only make sense if it is clear what is meant by “probability” in that particular situation. So I think that it’s a quite serious omission that Gelman doesn’t tell us his interpretation (he may do that elsewhere, though).
Indeed, I do give my interpretation of probabilities elsewhere. I thought a lot about this when writing Bayesian Data Analysis, and my interpretation is stated at length in chapter 1 of that book. These were my ideas 20 years ago but I still pretty much hold on to them (except that, as I’ve discussed often on this blog and elsewhere, I’ve moved away from noninformative priors and now I think that weakly informative priors are the way to go).
Hennig concludes with a statement of concern about posterior predictive checking. I will respond in three ways:
1. Posterior predictive checks reduce to classical goodness-of-fit tests when the test statistic is pivotal (that is, when its distribution under the model does not depend on the unknown parameters); when this is not the case, there truly is uncertainty about the fit, and I prefer to go the Bayesian route and average over that uncertainty.
2. Whatever you may think about them theoretically, posterior predictive checks really can work. See chapter 6 of Bayesian Data Analysis and my published papers for many examples. It might well be that something better is out there, but the alternative I always see is people simply not checking their models. I’ll see exploratory graphs of raw data, pages and pages of density plots of posterior simulations, trace plots and correlation plots of iterative simulations—but no plots comparing model to data.
The basic idea of posterior predictive checking is, as they say, breathtakingly simple: (a) graph your data, (b) fit your model to the data, (c) simulate replicated data (a Bayesian can always do this, because Bayesian models are always “generative”), (d) graph the replicated data, and (e) compare the graphs in (a) and (d). (A minimal sketch of these steps appears below, after this list.) It makes me want to scream scream scream scream scream when statisticians’ philosophical scruples stop them from performing these five simple steps (or, to be precise, performing the simple steps (a), (c), (d), and (e), given that they’ve already done the hard part, which is step (b)).
3. In some settings, a posterior predictive check will essentially never “reject”; that is, there are models that have a very high probability of replicating certain aspects of the data. For example, a normal model with a flat prior will reproduce the sample mean (but not necessarily the median) of any dataset. In some of these situations I think it’s a good thing that the posterior predictive check does not “reject”; other times I am unhappy with this property. See this long blog post from a couple years ago for a discussion.
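To make the above points concrete, here is a minimal sketch of steps (b) through (e), with made-up data and my own choice of test statistics; it is not one of the examples from the book. A normal model with a flat prior is fit to data that are actually skewed, replicated datasets are drawn from the posterior predictive distribution, and observed and replicated test statistics are compared. The p-value printed at the end is a Monte Carlo estimate of Pr(T(y_rep) >= T(y) | y), that is, the classical tail probability averaged over posterior uncertainty in the parameters, which is the averaging I referred to in point 1. The sample mean never signals trouble here (point 3), while the sample minimum does.

```python
# Minimal sketch of a posterior predictive check: fit a normal model with a
# flat prior on (mu, log sigma) to skewed data, simulate replicated datasets,
# and compare observed and replicated values of two test statistics.

import numpy as np

rng = np.random.default_rng(1)

# (a) the observed data: exponential, so the normal model is wrong in a known way
y = rng.exponential(scale=1.0, size=100)
n = len(y)

# (b) fit the model: with the flat prior, the posterior is
#     sigma^2 | y ~ (n - 1) s^2 / chi^2_{n-1},   mu | sigma^2, y ~ N(ybar, sigma^2 / n)
n_sims = 2000
ybar, s2 = y.mean(), y.var(ddof=1)
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_sims)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# (c) simulate one replicated dataset per posterior draw
y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_sims, n))

# (d)/(e) compare data to replications; here via test statistics rather than graphs
for name, T in [("mean", np.mean), ("min", np.min)]:
    T_obs = T(y)
    T_rep = T(y_rep, axis=1)
    p_value = np.mean(T_rep >= T_obs)
    print(f"{name}: T(y) = {T_obs:.2f}, posterior predictive p-value = {p_value:.2f}")
```

In practice I would do steps (d) and (e) graphically (for example, histograms of y next to histograms of a few of the y_rep datasets); the numerical comparison is just to keep the sketch short.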