Eric Brown writes:
I have come across a number of recommendations over the years about best practices for multilevel regression modeling. For example, the use of t-distributed priors for coefficients in logistic regression and standardizing input variables from one of your 2008 Annals of Applied Statistics papers; or recommendations for priors on variance parameters from your 2006 Bayesian Analysis paper. I understand that opinions in the field vary on these points, but I was wondering if you have a reference that you point people to as a place to get started? I’ve tried looking through your blog posts but couldn’t find any summaries.
For example, what are some examples of when I should use more than a two-level hierarchical model? Can I use a spike-and-slab coefficient model with a t-distributed prior for the slab rather than a normal? If I assume that my model is a priori wrong (but still useful), what are some recommended ways to choose how many interactions to include in the model? Finally, how would you recommend handling correlated / collinear variables (such as daily average temperature and daily average morning temperature)?
As you can probably tell, this isn’t my area of expertise (I’m an ophthalmologist with a PhD in biochemistry), but I like to be able to justify my decisions on how to analyze data — “because it’s easy” or “because that’s how others do it” isn’t enough. I’m quite comfortable with programming, R, and BUGS/JAGS, so that shouldn’t get in the way.
Unfortunately, I don’t really have any convenient set of recommendations. The biggest issue with regression is not the priors (once you do the basic step of assigning some weakly informative prior to deal with separation and near-separation) but what variables to include in the model in the first place. That said, the priors you use can influence what variables you feel comfortable including in the model. I’ve long thought that a lot of the fuss about variable selection arose because of the traditional insistence on flat priors within a model. Thus, rather than spike-and-slab, I prefer a prior that is not so restrictive at 0 (that is, spreading the “spike”) and also does partial pooling away from zero (that is, replacing the “slab” with a proper density). In that sense the t prior is a replacement for the entire spike-and-slab, not just for the slab.
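To see what “spreading the spike and replacing the slab with a proper density” means in practice, here is a minimal sketch comparing a spike-and-slab mixture to a single heavy-tailed t prior. The Cauchy scale of 2.5 follows the 2008 paper mentioned above; the spike width, slab width, and mixing weight are illustrative assumptions, not recommended values.

```python
import numpy as np
from scipy import stats

# Spike-and-slab prior: a point mass at 0 (approximated here by a very
# narrow normal) mixed with a wide normal "slab".
# The widths and mixing weight below are assumptions for illustration only.
spike = stats.norm(0, 0.01)
slab = stats.norm(0, 5)
w = 0.5  # prior inclusion probability (assumed)

def spike_slab_pdf(beta):
    return (1 - w) * spike.pdf(beta) + w * slab.pdf(beta)

# A single t prior with 1 degree of freedom (Cauchy), scale 2.5,
# as in Gelman et al. (2008): less peaked at zero than the spike,
# heavier-tailed than the normal slab.
t_prior = stats.cauchy(0, 2.5)

for beta in [0.0, 1.0, 5.0, 20.0]:
    print(f"beta={beta:5.1f}  spike-and-slab={spike_slab_pdf(beta):.5f}"
          f"  t prior={t_prior.pdf(beta):.5f}")
```

The point of the comparison: at zero the mixture is far more peaked than the t density (the spike is spread out), while far from zero the Cauchy tail dominates the normal slab, so large coefficients are shrunk less aggressively.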
Then we should think about interactions. It seems to make sense to have a prior that partially pools interactions more strongly toward zero when the main effects are small. We’ve played around with such models for a while but with no particularly clean results. One of our motivations for building Stan was in fact to enable us to experiment more systematically with such models.
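Since no clean version of this has been settled on, here is one purely hypothetical way to encode “pool interactions harder when main effects are small”: let the interaction coefficient’s prior scale be proportional to the product of the main-effect magnitudes. The function and the constant `kappa` below are illustrative assumptions, not a model from any paper.

```python
def interaction_scale(beta1, beta2, kappa=0.5):
    """Hypothetical prior sd for an interaction term, proportional to the
    magnitudes of the two main effects: if either main effect is near zero,
    the interaction is pulled strongly toward zero as well.
    kappa is an assumed proportionality constant."""
    return kappa * abs(beta1) * abs(beta2)

# Small main effects imply a tight interaction prior ...
print(interaction_scale(0.1, 0.1))  # tight: strong pooling toward zero
# ... large main effects leave the interaction relatively free.
print(interaction_scale(2.0, 2.0))  # wide: weak pooling
```

In a full model one would place this structure on the scales inside a hierarchical prior (e.g., in Stan) rather than plugging in point estimates, but the sketch conveys the intended direction of the pooling.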
Finally, parameterization is key. If we are going to assign independent or nearly independent prior distributions on the coefficients, we should put some effort into transforming so that independence makes some sense.
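One concrete transformation along these lines is the input standardization from the 2008 paper mentioned in the question: center each numeric predictor and divide by two standard deviations, which puts its coefficient on roughly the same scale as that of a centered binary predictor. A minimal sketch (in Python with NumPy, standing in for R purely for illustration):

```python
import numpy as np

def standardize(x):
    """Center a numeric input and divide by 2 standard deviations,
    as proposed in Gelman (2008), so the rescaled variable has sd 0.5
    and its coefficient is comparable to that of a centered binary input."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (2 * x.std())

temp = np.array([12.0, 15.0, 19.0, 22.0, 17.0])
z = standardize(temp)  # mean 0, sd 0.5
```

With inputs on a common scale, assigning independent priors of a common scale to the coefficients becomes much more defensible than it would be on the raw units.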