Ryan Giordano, Tamara Broderick, and Michael Jordan write:

In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. One hopes that the posterior is robust to reasonable variation in the choice of prior, since this choice is made by the modeler and is often somewhat subjective. A different, equally subjectively plausible choice of prior may result in a substantially different posterior, and so different conclusions drawn from the data. . . .

To which I say:

,s/choice of prior/choice of prior and data model/g

Yes, the choice of data model (from which comes the likelihood) is made by the modeler and is often somewhat subjective. In those cases where the data model is *not* chosen subjectively by the modeler, it is typically chosen implicitly by convention, and there is even more reason to be concerned about robustness.

I was surprised twice by their summary, which seems to be talking about two almost independent stories, and by their ingenious idea of integrating an influence function with variational inference!

“In those cases where the data model is not chosen subjectively by the modeler, it is typically chosen implicitly by convention, and there is even more reason to be concerned about robustness.”

Yes!

Surely by writing that a posterior is formed by “a choice of a prior and a likelihood” they are acknowledging this exact point? Checking for robustness with respect to choice of prior seems like a pretty reasonable thing to do, even if you’d ideally want to also check for robustness with respect to the data model.

Aren’t they saying the same: “posterior follows from the data and a choice of (a prior and a likelihood)”?

The alternative interpretation “posterior follows from (the data) and (a choice of a prior) and (a likelihood)” doesn’t sound right to me (but I’m not a native English speaker, so I might be wrong).

I think the correct thing is “a choice of a prior and a choice of a likelihood”, but that’s not what’s written, and I don’t even think that’s what the author meant. They meant something like “a choice of a prior and the application of the correct likelihood”… (as if the likelihood is god-given but the prior is a choice)

The reason is that the next sentence says, “One hopes that the posterior is robust to reasonable variation in the choice of prior, since this choice is made by the modeler and is often somewhat subjective.”

The phrase “this choice is made by the modeler,” referring to the choice of prior, implies pretty strongly that the author understands that the choice of the likelihood *isn’t* made by the modeler.

That is a seriously incorrect understanding.

Every part of the Bayesian model is chosen by the modeler to express the modeler’s understanding of what’s going on. The likelihood is a choice, whether you leave that choice to a default or you actually consciously think about it. There *is no* objectively “correct” likelihood (unless you’re talking about a simulated data situation with a random number generator).

The very first sentence in the paper “Bayesian robustness studies how changes to the model (i.e., the prior and likelihood) and to the data affect the posterior.” suggests that they share your concern about the sensitivity to the likelihood choice.

“In this paper, we focus on quantifying the sensitivity of posterior means to perturbations of the prior”. I don’t think this indicates seriously incorrect understanding on their part.

Maybe so; this might be a case of misreading by those of us, like Andrew and myself, who see that sentence as damning the prior while giving the likelihood a pass. Even if these authors understand the issue with the likelihood, the discussion always seems to be about the prior when people question the validity of Bayesian inference, even though it’s usually the likelihood that needs more careful study. Consider, for example, the issues with that regression on gun control with 50 data points and 25 predictors, or the Himmicanes analysis, or the broken high-order polynomial regression on Chinese coal burning. Those analyses have the same likelihood problems whether you add on a prior or use the implicitly flat one that the “classical” analysis did. A Bayesian analysis using a carefully thought-out prior wouldn’t get you any closer to answering those questions, but a Bayesian analysis using physical, structural, causally motivated likelihoods would get you a lot closer to reality.

Rasmus, Carlos:

The authors write: “One hopes that the posterior is robust to reasonable variation in the choice of prior, since this choice is made by the modeler and is often somewhat subjective.” My point is that the choice of data model is also important and also subjective. I do not think it makes sense to single out the “prior” part of the model for robustness checking. This is a mistake I’ve seen a lot in the Bayesian literature, to see the prior as something to be concerned about and to accept the data model without question.

So the thing is, for a given choice of data model one can easily check whether there exists some parameter value such that the observed data fall into the typical set for that model. This is pretty much what Aris Spanos-style alternative-free model-misspecification significance tests aim to do (or at least, that’s how I understand them). Heck, just plotting the empirical data distribution together with the model fit and comparing them visually qualifies as this sort of check; good texts on statistical model fitting always recommend doing something of this kind.
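As a concrete illustration of this kind of alternative-free check (a made-up Poisson example of my own, not from Spanos or the paper under discussion): fit the model, replicate data sets from the fit, and ask whether a statistic of the observed data is typical of the replicates.

```python
import numpy as np

rng = np.random.default_rng(0)

def misspecification_check(y, n_rep=2000):
    """Alternative-free check of a Poisson data model: fit lambda by
    maximum likelihood, simulate replicate data sets at the fitted
    value, and ask whether the observed variance is typical of what
    the fitted model produces.  A two-sided tail probability near 0
    signals misspecification (here, over- or underdispersion)."""
    lam_hat = y.mean()                       # Poisson MLE
    reps = rng.poisson(lam_hat, size=(n_rep, len(y)))
    t_rep = reps.var(axis=1)                 # replicated test statistic
    tail = np.mean(t_rep >= y.var())
    return 2 * min(tail, 1 - tail)

# Data actually drawn from the assumed model: should look typical.
good = rng.poisson(4.0, size=500)
p_good = misspecification_check(good)

# Overdispersed data (a gamma-mixed Poisson with the same mean, 4,
# but variance 12): the check should flag it.
bad = rng.negative_binomial(2, 1 / 3, size=500)
p_bad = misspecification_check(bad)

print(p_good, p_bad)   # p_good moderate, p_bad essentially zero
```

Note that nothing here required a prior over lambda: the check only asks whether some parameter value makes the data look typical.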

If there’s any equivalent way to check the prior, I’m not aware of it. My understanding of the situation is that people think that we already have good tools for checking the data model but not so the prior, and thus they treat model-checking as a given and focus on the unsolved problem.

Corey:

> think that we already have good tools for checking the data model but not so the prior

That was my view but Mike Evans raised enough doubt about that for me to no longer be so sure.

But I have fallen behind on reading his work: “Checking for prior-data conflict using prior to posterior divergences,” https://arxiv.org/pdf/1611.00113v3.pdf

I think prior-data conflict is related to but not identical to robustness of inferences to the choice of prior.

Agree – and to me prior-data conflict is actually more important.

That’s because you’re a latter-day Peirce.

Corey:

There are lots of checks for models; I’ve written several papers on the topic too! But for the models that I’ve worked on, “plotting the empirical data distribution” doesn’t do much: my data are typically binary!
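For what it’s worth, here’s a minimal sketch (my own toy example, not from any of the papers mentioned) of a check that does work for binary data, where plotting the empirical distribution tells you almost nothing: compare a discrete test statistic, such as the number of switches in the sequence, to its distribution under replicated data.

```python
import numpy as np

rng = np.random.default_rng(1)

def switches(y):
    """Number of 0->1 and 1->0 transitions in a binary sequence."""
    return int(np.sum(y[1:] != y[:-1]))

def check_binary_model(y, n_rep=2000):
    """Predictive check for binary data: fit an iid Bernoulli model,
    replicate data sets from the fit, and compare the observed number
    of switches to the replicated ones.  (A full posterior predictive
    check would draw the parameter from its posterior rather than
    plugging in the MLE.)"""
    p_hat = y.mean()
    reps = rng.binomial(1, p_hat, size=(n_rep, len(y)))
    t_rep = np.array([switches(r) for r in reps])
    tail = np.mean(t_rep >= switches(y))
    return 2 * min(tail, 1 - tail)

# iid Bernoulli data: the check should not flag it.
iid = rng.binomial(1, 0.5, size=200)
p_iid = check_binary_model(iid)

# A "sticky" Markov chain: far fewer switches than an iid model
# predicts, so the check should flag it.
sticky = np.zeros(200, dtype=int)
for t in range(1, 200):
    sticky[t] = sticky[t - 1] if rng.random() < 0.9 else 1 - sticky[t - 1]
p_sticky = check_binary_model(sticky)

print(p_iid, p_sticky)   # p_iid moderate, p_sticky essentially zero
```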

One word: normal approximation.

Weak attempt at humor.

only a linguist calls “normal approximation” one word ;-)

The core insight isn’t that we should plot empirical distributions; it’s that we can calculate p(Data | Param) for various values of Param and see whether any of them make the Data nontrivially probable, but we can’t calculate p(Param | Data) without choosing a prior for Param. The situations aren’t symmetric with respect to the use of a prior.
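A toy numerical version of this asymmetry (the data and grid here are made up for illustration): a likelihood sweep over a parameter grid needs no prior, but converting it to a distribution over the parameter does.

```python
import numpy as np

# Hypothetical binary data and a grid of Bernoulli parameters.
y = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
theta = np.linspace(0.01, 0.99, 99)

# log p(y | theta) at each grid point: a likelihood sweep, no prior needed.
k, n = y.sum(), len(y)
loglik = k * np.log(theta) + (n - k) * np.log(1 - theta)

best = theta[np.argmax(loglik)]
print(best)                  # the grid point nearest the sample mean, 0.8
print(np.exp(loglik).max())  # best-case probability of the data

# By contrast, p(theta | y) requires a prior:
# posterior ∝ exp(loglik) * prior(theta), and different priors give
# different posteriors over the same grid.
```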

Thank you for taking a look at our paper! We really couldn’t agree more with your comment, and will gladly accept your friendly edit into future work. For the moment, we’ve focused on the prior for simplicity of exposition and to tie more clearly with prior work (pun intended), though the same local sensitivity ideas extend very readily to alternative likelihood specifications or to the detection of influential data points. (In fact, in our very first workshop paper on this subject, “Covariance Matrices for Mean Field Variational Bayes”, we briefly discuss data sensitivity in Gaussian mixture models.) We look forward to expanding on these ideas in future work.
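For readers wondering what “sensitivity of posterior means to perturbations of the prior” means concretely, here is a toy conjugate sketch of my own (not the paper’s variational/influence-function machinery): in a Beta-Bernoulli model the posterior mean has a closed form, so its derivative with respect to a prior hyperparameter can be computed and checked directly.

```python
import numpy as np

# Beta(a, b) prior + Bernoulli likelihood gives posterior mean
# (k + a) / (n + a + b).  Local prior sensitivity is just the
# derivative of that mean with respect to a hyperparameter.
k, n = 8, 10        # hypothetical data: 8 successes in 10 trials
a, b = 1.0, 1.0     # Beta prior hyperparameters

post_mean = (k + a) / (n + a + b)
sens_analytic = (n + b - k) / (n + a + b) ** 2   # d post_mean / d a

# Finite-difference check of the same sensitivity.
eps = 1e-6
sens_numeric = ((k + a + eps) / (n + a + eps + b) - post_mean) / eps

print(post_mean, sens_analytic, sens_numeric)
```

In non-conjugate models no such closed form exists, which is where approximation machinery like theirs comes in.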

This quote from Leamer’s “Let’s Take the Con Out of Econometrics” is spot on:

“The difference between a fact and an opinion for purposes of decision making and inference is that when I use opinions, I get uncomfortable. I am not too uncomfortable with the opinion that error terms are normally distributed because most econometricians make use of that assumption. This observation has deluded me into thinking that the opinion that error terms are normal may be a fact, when I know deep inside that normal distributions are actually used only for convenience. In contrast, I am quite uncomfortable using a prior distribution, mostly I suspect because hardly anyone uses them. If convenient prior distributions were used as often as convenient sampling distributions, I suspect that I could be as easily deluded into thinking that prior distributions are facts as I have been into thinking that sampling distributions are facts.”