Nathan Lemoine writes:

I’m an ecologist, and I typically work with small sample sizes from field experiments, which have highly variable data. I analyze almost all of my data now using hierarchical models, but I’ve been wondering about my interpretation of the posterior distributions. I’ve read your blog, several of your papers (Gelman and Weakliem, Gelman and Carlin), and your excellent BDA book, and I was wondering if I could ask your advice/opinion on my interpretation of posterior probabilities.

I’ve thought of 95% posterior credible intervals as a good way to estimate effect size, but I still see many researchers use them in something akin to null hypothesis testing: “The 95% interval included zero, and therefore the pattern was not significant”. I tend not to do that. Since I work with small sample sizes and variable data, it seems as though I’m unlikely to find a “significant effect” unless I’m vastly overestimating the true effect size (Type M error) or unless the true effect size is enormous (a rarity). More often than not, I find ‘suggestive’, but not ‘significant’ effects.

In such cases, I calculate one-tailed posterior probabilities that the effect is positive (or negative) and report that along with estimates of the effect size. For example, I might say something like

“Foliar damage tended to be slightly higher in ‘Ambient’ treatments, although the difference between treatments was small and variable (Pr(Ambient>Warmed) = 0.86, CI95 = 2.3% less – 6.9% more damage).”

Giving the probability of an effect along with an estimate of the effect size seems to me more informative than simply saying ‘not significant’. This allows researchers to make their own judgements on importance, rather than defining importance for them by p < 0.05. I know that such one-tailed probabilities can be inaccurate when using flat priors, but I place weakly informative priors (N(0,1) or N(0,2)) on all parameters in an attempt to avoid such overestimates unless strongly supported by my small sample sizes. I was wondering if you agree with this philosophy of data reporting and interpretation, or if I’m misusing the posterior probabilities. I’ve done some research on this, but I can’t find anyone who has offered a solid opinion on it. Based on my reading and the few interactions I’ve had with others, it seems that the strength of posterior probabilities compared to p-values is that they allow for such fluid interpretation (what’s the probability the effect is positive? what’s the probability the effect > 5? etc.), whereas p-values simply tell you “if the null hypothesis is true, there’s a 70 or 80% chance I could observe an effect as strong as mine by chance alone”. I prefer to give the probability of an effect bounded by the CI of the effect to give the most transparent interpretation possible.
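
For readers who want to see the mechanics, here is a minimal sketch of the kind of summary described above: a one-tailed posterior probability plus a central 95% interval, both computed from posterior draws of a treatment difference. The draws are simulated from a normal distribution purely as a stand-in for real MCMC output; the function name `summarize_effect` and the numbers are assumptions for illustration, not the analysis behind the foliar damage example.

```python
import random
import statistics

def summarize_effect(draws):
    """Return Pr(effect > 0) and a central 95% interval from posterior draws."""
    pr_positive = sum(d > 0 for d in draws) / len(draws)
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; keep the first and last.
    cuts = statistics.quantiles(draws, n=40)
    return pr_positive, (cuts[0], cuts[-1])

# Hypothetical stand-in for MCMC draws of (Ambient - Warmed), in percent damage.
rng = random.Random(1)
draws = [rng.gauss(2.3, 2.35) for _ in range(20000)]

pr_positive, (lo, hi) = summarize_effect(draws)
print(f"Pr(Ambient > Warmed) = {pr_positive:.2f}, CI95 = {lo:.1f} to {hi:.1f}")
```

The same draws answer other questions directly, for example `sum(d > 5 for d in draws) / len(draws)` for the probability that the effect exceeds 5.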

My reply:

My short answer is that this is addressed in this post:

*If* you believe your prior, then yes, it makes sense to report posterior probabilities as you do. Typically, though, we use flat priors even though we have pretty strong knowledge that parameters are close to 0 (this is consistent with the fact that we see lots of estimates that are 1 or 2 se’s from 0, but very few that are 4 or 6 se’s from 0). So, really, if you want to make such a statement I think you’d want a more informative prior that shrinks to 0. If, for whatever reason, you *don’t* want to assign such a prior, then you have to be a bit more careful about interpreting those posterior probabilities.

In your case, since you’re using weakly informative priors such as N(0,1), this is less of a concern. Ultimately I guess the way to go is to embed any problem in a hierarchical meta-analysis so that the prior makes sense in the context of the problem. But, yeah, I’ve been using N(0,1) a lot myself lately.
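
To make the point about shrinkage concrete, here is a small sketch that compares the posterior probability of a positive effect under a flat prior and under a N(0,1) prior. It assumes a normal likelihood summarized by an estimate and its standard error, with the prior on the same scale as the estimate; the function name `pr_positive` and the example numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def pr_positive(estimate, se, prior_sd=None):
    """Pr(theta > 0 | data) for a normal likelihood and a N(0, prior_sd^2) prior.

    prior_sd=None means a flat prior, so the posterior is N(estimate, se^2).
    """
    if prior_sd is None:
        post_mean, post_sd = estimate, se
    else:
        precision = 1 / se**2 + 1 / prior_sd**2   # posterior precision
        post_mean = (estimate / se**2) / precision
        post_sd = sqrt(1 / precision)
    return 1 - NormalDist(post_mean, post_sd).cdf(0)

# Hypothetical estimate that is 1 standard error from zero.
flat = pr_positive(estimate=1.0, se=1.0)
shrunk = pr_positive(estimate=1.0, se=1.0, prior_sd=1.0)
print(f"flat prior: {flat:.2f}, N(0,1) prior: {shrunk:.2f}")
```

With these inputs the flat prior gives roughly 0.84 and the N(0,1) prior roughly 0.76: the prior pulls the estimate toward zero and the one-tailed probability becomes less extreme.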

Regarding the comment: “If you believe your prior, then yes, it makes sense to report posterior probabilities as you do.”

I agree but think it is good to stress that there is an additional assumption, namely the likelihood function (as has been pointed out on this blog and elsewhere). More generally I would say, “if you believe your joint distribution of (y_obs,theta), it makes sense to report posterior probabilities as you do.” since the joint distribution of (y_obs,theta) is defined once both the likelihood and prior functions are defined.

But of course the real question is, “Under what criteria am I to base whether or not I believe my joint distribution of (y_obs, theta)?” to which I don’t think there is a satisfactory answer. The first step might be to deal with only the potentially observable portion of this joint distribution, namely the integral of p(y_obs, theta) over theta, which leaves us with a distribution over potentially observable data. From there, we need some measure of consistency of the distribution p(y_obs) with y_obs, which might lead us back to a “hypothesis test” of some sort. There is also the issue that any joint distribution p(y_obs, theta) that maps to the same p(y_obs) would be treated equally under such an approach, because the marginalization operator is a many-to-one mapping; it is not invertible. Fundamentally there cannot be information in y_obs to distinguish between multiple likelihood/prior pairs that map to the same marginal distribution over y_obs.
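
A toy example may help with the many-to-one point. The sketch below assumes the simplest conjugate setup, y | theta ~ N(theta, sigma^2) with theta ~ N(0, tau^2), so that the marginal is y ~ N(0, sigma^2 + tau^2). The two (sigma, tau) pairs are hypothetical and chosen only so that their marginals coincide.

```python
from math import sqrt

def marginal_sd(sigma, tau):
    """SD of y after integrating theta out of the joint distribution."""
    return sqrt(sigma**2 + tau**2)

def posterior(y, sigma, tau):
    """Posterior mean and SD of theta given a single observation y."""
    shrink = tau**2 / (sigma**2 + tau**2)
    return shrink * y, sqrt(shrink) * sigma

# Two hypothetical likelihood/prior pairs with the same marginal variance (2).
pair_a = (1.0, 1.0)               # sigma^2 = 1.0, tau^2 = 1.0
pair_b = (sqrt(1.5), sqrt(0.5))   # sigma^2 = 1.5, tau^2 = 0.5

y_obs = 2.0
for name, (sigma, tau) in [("A", pair_a), ("B", pair_b)]:
    mean, sd = posterior(y_obs, sigma, tau)
    print(f"pair {name}: marginal sd = {marginal_sd(sigma, tau):.3f}, "
          f"posterior mean = {mean:.2f}, posterior sd = {sd:.3f}")
```

Both pairs imply the same distribution for y_obs, so no check against the data alone can prefer one, yet they give different posteriors for theta at the same observation.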

It seems misguided to begin with the supposition that there are two main camps of statistical theory. At a (very) high level, all of statistics uses probabilistic models to make sense of data, which may include deriving inferences (under certain probabilistic assumptions) or checking the quality of such inferences (under certain probabilistic assumptions). E.g.: You might think that it is ok to put a probability model on a parameter, or you might not. You might think that observable randomness comes from the specified likelihood, or you might think that observable randomness comes from the marginal distribution of the observed data. It would seem more productive to me to fixate on these core issues as opposed to picking sides for Bayesian or frequentist; to some extent the field might be impeded by too much attention paid to the “debate” between the two camps.

Whether you are a Bayesian or frequentist or whatever else, “How do we know this probability model is any good?” is a question to which I am not satisfied that a great answer exists.

This doesn’t exactly address the question, but I’d probably recommend doing some sensitivity analyses with priors that are different but could equally well be interpreted as “about matching a (probably rather weak) description of what is known a priori”. This would give more of a feeling for how much of the variation in these posterior probabilities is due to the specification of the prior. If there is much variation, one may wonder whether one wants to be Bayesian in this situation…
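
One way such a sensitivity analysis could look, as a rough sketch: recompute Pr(theta > 0) on a grid under a few priors that all count as "weakly informative". The sketch assumes a normal likelihood summarized by an estimate and standard error; the three priors, the reading of N(0,2) as standard deviation 2, the function name `pr_positive_grid`, and the example numbers are all assumptions for illustration.

```python
from math import pi
from statistics import NormalDist

def pr_positive_grid(estimate, se, prior_pdf, lo=-20.0, hi=20.0, n=4000):
    """Approximate Pr(theta > 0 | data) with a midpoint grid over [lo, hi]."""
    step = (hi - lo) / n
    grid = [lo + (i + 0.5) * step for i in range(n)]  # cell midpoints, never 0
    # Normal likelihood of the estimate given theta (symmetric in its arguments).
    lik = NormalDist(estimate, se)
    weights = [lik.pdf(t) * prior_pdf(t) for t in grid]
    return sum(w for t, w in zip(grid, weights) if t > 0) / sum(weights)

# Candidate priors; each is a hypothetical choice, not a recommendation.
priors = {
    "normal, sd 1": NormalDist(0, 1).pdf,
    "normal, sd 2": NormalDist(0, 2).pdf,
    "Cauchy, scale 1": lambda t: 1 / (pi * (1 + t * t)),
}

# Hypothetical estimate that is 1 standard error from zero.
results = {name: pr_positive_grid(1.0, 1.0, pdf) for name, pdf in priors.items()}
for name, p in results.items():
    print(f"{name}: Pr(theta > 0) = {p:.3f}")
```

If the reported probability moves noticeably across rows like these, that spread is worth reporting alongside the headline number.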