For awhile I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this–sure, point estimation understates the uncertainty in model fitting, but that’s typically the least of our worries.
Sometimes, though, lmer/glmer estimates group-level variances at 0 or estimates group-level correlation parameters at +/- 1. Typically, when this happens, it’s not that we’re so sure the variance is close to zero or that the correlation is close to 1 or -1; rather, the marginal likelihood does not provide a lot of information about these parameters of the group-level error distribution.
I don’t want point estimates on the boundary. I don’t want to say that the unexplained variance in some dimension is exactly zero.
One way to handle this problem is full Bayes: slap a prior on sigma, do your Gibbs and Metropolis, and summarize posterior inferences by simulation. That’s fine but it’s another bit of programming and computation. We wanted to be able to do it using the existing lmer/glmer (or, in Stata, gllamm) framework.
Yeojin Chung, Sophia Rabe-Hesketh, Jingchen Liu, Vince Dorie, and I came up with a solution. We compute a maximum penalized marginal likelihood estimate, the marginal posterior mode under a gamma (not inverse gamma; you know how I hate that) prior for the scale parameter. Our default choice is gamma (2, 1/A), where A is set to some large value such as 10 (rescaled relative to the regression predictors, as usual). The shape parameter of 2 gives a prior that is linear at 0: thus, the posterior mode can’t be 0 under any circumstances–but it can be as close to 0 as is required based on the likelihood. Some basic algebra shows that if the marginal maximum likelihood estimate is at 0, our posterior mode estimate will be about 1 standard error away from 0, which seems about right. The rate parameter, 1/A, is set to be small so that the prior will only pull the estimate down if it is unexpectedly large, as might occur if the number of groups is small.
Our prior distribution is weakly informative in that it does something to the estimate but is generic and doesn’t have much information as we would have in just about any particular example. I think that in many applied examples it would make sense to use stronger prior information. But I think that, as a default choice, the weakly informative gamma (2, 1/A) prior makes more sense than the uniform distribution (corresponding to the nonregularized estimate).
In the multivariate case, we use a Wishart (not inverse-Wishart), which when we choose the right parameters keeps the variances away from 0 and the correlation matrix positive definite.
One interesting feature of these models is that the weakly-informative regularizing prior we use for a point estimate is not the same as the weakly-informative prior that we would use for full Bayes (where there is no need to keep the posterior mode away from the boundary).
Here’s the abstract to our paper:
Variance parameters in mixed or multilevel models can be difficult to estimate, espe- cially when the number of groups is small. We propose a maximum penalized likelihood approach which is equivalent to estimating variance parameters by their marginal poste- rior mode, given a weakly informative prior distribution. By choosing the prior from the gamma family with at least 1 degree of freedom, we ensure that the prior density is zero at the boundary and thus the marginal posterior mode of the group-level variance will be positive. The use of a weakly informative prior allows us to stabilize our estimates while remaining faithful to the data.
Let me again thank my collaborators, Yeojin Chung, Sophia Rabe-Hesketh, Jingchen Liu, Vince Dorie. We’re now working on various generalizations, including multivariate distributions, generalized linear models, and hierarchical models for the variance parameters (going beyond the simple ideas in section 6 of my article from a few years ago).