Getting the right uncertainties when fitting multilevel models

Posted on September 23, 2017 9:14 AM by Andrew

Cesare Aloisi writes:

I am writing you regarding something I recently stumbled upon in your book Data Analysis Using Regression and Multilevel/Hierarchical Models which confused me, in hopes you could help me understand it. This book has been my reference guide for many years now, and I am extremely grateful for everything I learnt from you.

On page 261, a 95% confidence interval for the intercept in a certain group (County 26) is calculated using only the standard error of the “random effect” (the county-level error). The string is as follows:

coef(M1)$county[26,1] + c(-2,2)*se.ranef(M1)$county[26]

My understanding is that, since the group-level prediction (call it y.hat_j = coef(M1)$county[26,1]) is a linear combination of a global average and a group-level deviation from the average (y.hat_j = beta_0 + eta_j), then the variance of y.hat_j should be the sum of the covariances of beta_0 and eta_j, not just the variance of eta_j, as the code on page 261 seems to imply. In other words:

Var(y.hat_j) = Var(beta_0) + Var(eta_j) + 2Cov(beta_0, eta_j)

Admittedly, lme4 does not provide an estimate for the last term, the covariance between “fixed” and “random” effects. Was the code used in the book to simplify the calculations, or was there some deeper reason to it that I failed to grasp?

My reply: The short answer is that it’s difficult to get this correct in lmer but very easy when using stan_lmer() in the rstanarm package. That’s what I recommend, and that’s what we’ll be doing in the 2nd edition of our book.

13 thoughts on “Getting the right uncertainties when fitting multilevel models”

Tim Mastny on September 23, 2017 10:19 AM at 10:19 am said:

What are your plans for the second edition? Can we expect it next year?

Reply ↓
JR on September 23, 2017 10:56 AM at 10:56 am said:

I stumbled upon the exact same problem a while ago, scratched my head for a while, and decided on using a complicated double bootstrap method (described in Chatterjee S, Lahiri P. A simple computational method for estimating mean squared prediction error in general small-area model. In: Proceedings of the section on survey research methods. 2007: 3486–3493.)

Nowadays I would use stan without a moment of reflexion and thank the lord for its existence.

Reply ↓
- Alex on September 25, 2017 8:16 AM at 8:16 am said:
  
  I find it mildly amusing but not at all surprising that the article calls the procedure “simple” while the person implementing it calls it “complicated”.
  
  Reply ↓
  - JR on September 25, 2017 9:26 AM at 9:26 am said:
    
    Article names can be insulting!
    
    Reply ↓
Paul on September 23, 2017 2:35 PM at 2:35 pm said:

Same here. However, I used INLA back then, it has an easy way to compute the posterior marginals of linear combinations. It is sad that one is not able to do this with lme4.

Reply ↓
bluepole on September 23, 2017 4:56 PM at 4:56 pm said:

I’ve been trying to following the same route for simple multilevel models with lme4 as laid out in the book BDA3 and the paper Why We (Usually) Don’t Have to Worry About Multiple Comparisons. The main reason I would like to adopt this approach is due to the heavy (sometimes even prohibitive) computational burden for large datasets. So, the formulations for the posterior mean and variance with the approximate Bayesian approach are not so accurate?

Reply ↓
Shravan on September 24, 2017 6:29 AM at 6:29 am said:

I didn’t understand this question. How can there be a non-zero covariance between the estimator \hat\beta0 and etaj? Surely the code shown in the next line on that page of the book is what Cesare wanted:

as.matrix(ranef(M1)$county)[26] + c(-2,2)*se.ranef(M1)$county[26]

Those two random variables are independent, surely?

Obviously, it’s 2017 so Stan is the way to go, but still.

Reply ↓
Jens on September 25, 2017 2:53 PM at 2:53 pm said:

Sounds pretty interesting. Do you know when we can expect 2nd edition?

Reply ↓
Titus von der Malsburg on September 26, 2017 4:27 AM at 4:27 am said:

One problem with using Stan in a future version of Gelman & Hill (I suppose that’s what you are referring to?) is that fitting Stan models analogous to those in the current book takes considerably more time. This would slow down learning and discourage students from conducting their own experiments. Fitting models in class would be pretty much impossible. I would prefer an lmer-based book with chapters on Stan in the second half. I think this would work okay because a lot of the material in the early chapters is largely orthogonal to the Bayesian vs. frequentist issue anyway (coding of predictors, transforms of the DVs, etc.).

Reply ↓
Titus von der Malsburg on September 26, 2017 4:27 AM at 4:27 am said:

(This was supposed to be a reply to Tim. Not sure why it’s showing up as such.)

Reply ↓
Cesare on September 26, 2017 5:18 AM at 5:18 am said:

To contextualise, the question stemmed from a practical case I was working on. I fit a 3-level growth model of pupil scores taken at different time points, nested in pupils, themselves nested in classes. A quadratic equation provided the best fit. Someone asked me to compare the trajectories of two classes at each time point. This raised the question on how to calculate group-level confidence intervals.

If you think about the covariance matrix of a group-level estimate at a certain time point, it is not just about the covariance between one single “fixed effect” and its “random component”, which are assumed to be orthogonal. It is more complicated than this, and I was not sure at all that I could just dismiss all covariances by treating them as equal to zero. Therefore, I checked Andrew’s book to see if I could find a solution. The situation he presents has fewer terms but it is conceptually similar, so I thought I would ask for clarifications on the matter.

AFAIK, neither lme4 or merTools provide a direct implementation of this, and Andrew confirmed it is a bit of a can of worms. Looking forward the next edition of the book!

Reply ↓
- Peter on September 26, 2017 11:07 AM at 11:07 am said:
  
  Hi Cesare,
  
  I am not totally clear on your specific model, but to some extent the appropriate
  confidence interval (CI) construction depends on how you want to interpret
  the CI. If you are looking for posterior coverage, then using the posterior
  estimate plus or minus two standard errors is reasonable (assuming the
  posterior distributions are approximately normal)
  
  Such procedures have roughly 95% frequentist coverage
  *on average across groups*, but do not generally maintain
  this coverage for each group individually. This is partly
  because the posterior estimates are biased.
  
  If you want to share information across groups, but also want to
  use a procedure that maintains 95% coverage for a specific group
  mean or contrast, you may be interested in the following paper
  on hierarchical modeling:
  
  https://arxiv.org/abs/1612.08287
  
  This paper shows how to construct CIs in a hierarchical model
  that both shares information across groups in a Bayesian-like
  way, and maintains 95% frequentist coverage for each group
  individually.
  
  Reply ↓
  - Cesare on September 27, 2017 5:29 AM at 5:29 am said:
    
    Thanks, Peter.
    
    Reply ↓

Statistical Modeling, Causal Inference, and Social Science

Getting the right uncertainties when fitting multilevel models

13 thoughts on “Getting the right uncertainties when fitting multilevel models”

Leave a Reply Cancel reply