Prior distributions for regression coefficients

Posted on September 7, 2012 9:11 AM by Andrew

Eric Brown writes:

I have come across a number of recommendations over the years about best practices for multilevel regression modeling. For example, the use of t-distributed priors for coefficients in logistic regression and standardizing input variables from one of your 2008 Annals of Applied Statistics papers; or recommendations for priors on variance parameters from your 2006 Bayesian Analysis paper. I understand that opinions on these often vary among people in the field, but I was wondering if you have a reference that you point people to as a place to get started? I’ve tried looking through your blog posts but couldn’t find any summaries.

For example, what are some examples of when I should use more than a two-level hierarchical model? Can I use a spike-slab coefficient model with a t-distributed prior for the slab rather than a normal? If I assume that my model is a priori wrong (but still useful), what are some recommended ways to choose how many interactions to use in the model? Finally, how would you recommend handling correlated / collinear variables (such as daily average temperature and daily average morning temperature)?

As you can probably tell, this isn’t my area of expertise (I’m an ophthalmologist with a PhD in biochemistry) but I like to be able to justify my decisions on how to analyze data; “because it’s easy” or “because that’s how others do it” isn’t enough. I’m quite comfortable with programming, R, and BUGS/JAGS so that shouldn’t get in the way.

My reply:

Unfortunately, I don’t really have any convenient set of recommendations. The biggest issue with regression is not the priors (once you do the basic step of assigning some weakly informative prior to deal with separation and near-separation) but what variables to include in the model in the first place. That said, the priors you use can influence what variables you feel comfortable including in the model. I’ve long thought that a lot of the fuss about variable selection arose because of the traditional insistence on flat priors within a model. Thus, rather than spike-and-slab, I prefer a prior that is not so restrictive at 0 (that is, spreading the “spike”) and also does partial pooling away from zero (that is, replacing the “slab” with a proper density). In that sense the t prior is a replacement for the entire spike-and-slab, not just for the slab.
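To make the shape of such a prior concrete, here is a minimal sketch in Python that compares a zero-centered t density with a normal density of the same scale. The specific choices (1 degree of freedom, scale 2.5, the evaluation points) are illustrative assumptions, not recommendations taken from this post. The point is only that a single t density concentrates mass near zero while still leaving room for large coefficients.

```python
import math

def student_t_pdf(x, df, scale=1.0):
    """Density of a zero-centered Student-t with the given df and scale."""
    z = x / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi) - math.log(scale))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df))

def normal_pdf(x, sd):
    """Density of a zero-centered normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Illustrative values only: df=1 (Cauchy) and scale 2.5 are assumptions.
for beta in (0.0, 2.5, 10.0, 25.0):
    t_density = student_t_pdf(beta, df=1, scale=2.5)
    n_density = normal_pdf(beta, sd=2.5)
    print(f"beta={beta:5.1f}  t prior: {t_density:.6f}  normal prior: {n_density:.6f}")
```

Far from zero the t density is orders of magnitude larger than the normal one, which is what lets it play the role of the "slab" without a separate mixture component.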

Then we should think about interactions. It seems to make sense to have a prior that partially pools interactions more strongly toward zero if the main effects are small. We’ve played around with such models for a while but with no particularly clean results. One of our motivations for building Stan was in fact to enable us to experiment more systematically with such models.
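As a purely hypothetical illustration of the idea (not a model from this post, and not one with established properties), one could let the prior scale of an interaction depend on the sizes of the corresponding main effects. The function name, the geometric-mean rule, and the floor value below are all assumptions made up for this sketch. In a real model the main effects are themselves unknown, so this dependence would have to be written into the joint prior instead of being computed from fixed numbers.

```python
import math

def interaction_prior_sd(main_a, main_b, base_sd=1.0, floor=0.05):
    """Hypothetical rule: prior sd for the a:b interaction shrinks with the main effects."""
    size = math.sqrt(abs(main_a) * abs(main_b))  # geometric mean of main-effect sizes
    return base_sd * max(size, floor)            # floor keeps the sd strictly positive

# Large main effects leave the interaction loosely constrained;
# small main effects pull it strongly toward zero.
print(interaction_prior_sd(1.2, 0.8))
print(interaction_prior_sd(0.1, 0.05))
```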

Finally, parameterization is key. If we are going to assign independent or nearly independent prior distributions on the coefficients, we should put some effort into transforming so that independence makes some sense.
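For the temperature example in the question, one possible transformation is to keep the daily average and replace the morning temperature with its gap from the daily average. The sketch below uses simulated data, and every number in it (means, spreads, sample size) is an assumption chosen only to produce two strongly correlated inputs. It is a sketch of the reparameterization idea, not a general recipe.

```python
import math
import random
import statistics

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_temperatures(n=1000, seed=0):
    """Simulated (made-up) daily average and morning temperatures."""
    rng = random.Random(seed)
    daily_avg = [15 + 8 * rng.gauss(0, 1) for _ in range(n)]
    morning = [t - 4 + 1.5 * rng.gauss(0, 1) for t in daily_avg]
    return daily_avg, morning

daily_avg, morning = simulate_temperatures()
# Reparameterize: morning temperature relative to the daily average.
morning_gap = [m - d for m, d in zip(morning, daily_avg)]

print("corr(daily_avg, morning):    ", round(pearson(daily_avg, morning), 3))
print("corr(daily_avg, morning_gap):", round(pearson(daily_avg, morning_gap), 3))
```

With the original pair, independent priors on the two coefficients are hard to interpret because the inputs carry nearly the same information. After the transformation, one coefficient describes the overall temperature and the other describes how unusual the morning was, and independent priors on those are easier to defend.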

1 thought on “Prior distributions for regression coefficients”

Peter Carbonetto on September 7, 2012 at 5:47 pm said:

People in statistical genetics, myself included, have been thinking about these issues quite a bit because the choice of prior can influence which genes (or, more generally, genetic loci) you decide are associated with a disease or complex trait.

For example, some people argue that most of the effects on a complex trait will be very small, and a few will be large. In this case, the most appropriate prior would be a spike-and-slab in which both the spike and slab are normal densities, and the slab is much more spread out, thereby allowing for larger coefficients (this was an approach we recently advocated; see http://arxiv.org/abs/1209.1341).
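A minimal sketch of such a two-normal prior is below. The mixture weight and the two standard deviations are assumed values for illustration only, not the settings used in the linked paper. The second function applies Bayes' rule to a known coefficient value, which shows how sharply the two components separate small from large effects. It is not a posterior inclusion probability computed from data.

```python
import math

def normal_pdf(x, sd):
    """Density of a zero-centered normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def spike_slab_pdf(beta, p_slab=0.05, spike_sd=0.01, slab_sd=0.5):
    """Prior density: narrow normal 'spike' mixed with a wide normal 'slab'."""
    return (1 - p_slab) * normal_pdf(beta, spike_sd) + p_slab * normal_pdf(beta, slab_sd)

def slab_probability(beta, p_slab=0.05, spike_sd=0.01, slab_sd=0.5):
    """Probability that a coefficient equal to beta came from the slab component."""
    slab = p_slab * normal_pdf(beta, slab_sd)
    spike = (1 - p_slab) * normal_pdf(beta, spike_sd)
    return slab / (slab + spike)

for beta in (0.0, 0.02, 0.05, 0.1):
    print(f"beta={beta:4.2f}  prior density: {spike_slab_pdf(beta):8.4f}  "
          f"P(slab | beta): {slab_probability(beta):.4f}")
```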

There are, of course, many alternatives, and there is no single best approach. One alternative is to have a “scale mixture” (e.g. http://dx.doi.org/10.1371/journal.pgen.1000130) that is very spread out so that it allows for the possibility of large regression coefficients, but most of the mass is concentrated near zero. Based on my own experience working on problems in genomics, I find that it is more natural to specify priors using the spike-and-slab.
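One familiar scale mixture of normals is the Student-t: draw a precision from a gamma distribution, then draw the coefficient from a normal with that precision. The sketch below simulates this. The degrees of freedom, scale, and sample size are assumptions for illustration, and the t is used here only as a convenient example of a scale mixture, not necessarily the prior used in the linked paper.

```python
import math
import random

def draw_scale_mixture(rng, df=3.0, scale=1.0):
    """One draw from a normal whose precision is gamma-distributed (a Student-t draw)."""
    # precision ~ Gamma(shape=df/2, scale=2/(df*scale^2)); gammavariate takes shape, scale
    precision = rng.gammavariate(df / 2, 2 / (df * scale ** 2))
    return rng.gauss(0.0, 1 / math.sqrt(precision))

def tail_fraction(draws, threshold):
    """Fraction of draws whose absolute value exceeds the threshold."""
    return sum(abs(d) > threshold for d in draws) / len(draws)

rng = random.Random(1)
mixture_draws = [draw_scale_mixture(rng) for _ in range(20000)]
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]

print("fraction beyond 3 (scale mixture):", tail_fraction(mixture_draws, 3.0))
print("fraction beyond 3 (plain normal): ", tail_fraction(normal_draws, 3.0))
```

Most draws from the mixture stay close to zero, but occasional small precisions produce very large coefficients, which a single normal with the same scale almost never does.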

Peter
