Cesare Aloisi writes:

I am writing you regarding something I recently stumbled upon in your book Data Analysis Using Regression and Multilevel/Hierarchical Models which confused me, in hopes you could help me understand it. This book has been my reference guide for many years now, and I am extremely grateful for everything I learnt from you.

On page 261, a 95% confidence interval for the intercept in a certain group (County 26) is calculated using only the standard error of the “random effect” (the county-level error). The string is as follows:

coef(M1)$county[26,1] + c(-2,2)*se.ranef(M1)$county[26]

My understanding is that, since the group-level prediction (call it y.hat_j = coef(M1)$county[26,1]) is a linear combination of a global average and a group-level deviation from the average (y.hat_j = beta_0 + eta_j), then the variance of y.hat_j should be the sum of the covariances of beta_0 and eta_j, not just the variance of eta_j, as the code on page 261 seems to imply. In other words:

Var(y.hat_j) = Var(beta_0) + Var(eta_j) + 2Cov(beta_0, eta_j)

Admittedly, lme4 does not provide an estimate for the last term, the covariance between “fixed” and “random” effects. Was the code used in the book to simplify the calculations, or was there some deeper reason to it that I failed to grasp?

My reply: The short answer is that it’s difficult to get this correct in lmer but very easy when using stan_lmer() in the rstanarm package. That’s what I recommend, and that’s what we’ll be doing in the 2nd edition of our book.

What are your plans for the second edition? Can we expect it next year?

I stumbled upon the exact same problem a while ago, scratched my head for a while, and decided on using a complicated double bootstrap method (described in Chatterjee S, Lahiri P. A simple computational method for estimating mean squared prediction error in general small-area model. In: Proceedings of the section on survey research methods. 2007: 3486–3493.)

Nowadays I would use stan without a moment of reflexion and thank the lord for its existence.

I find it mildly amusing but not at all surprising that the article calls the procedure “simple” while the person implementing it calls it “complicated”.

Article names can be insulting!

Same here. However, I used INLA back then, it has an easy way to compute the posterior marginals of linear combinations. It is sad that one is not able to do this with lme4.

I’ve been trying to following the same route for simple multilevel models with lme4 as laid out in the book BDA3 and the paper Why We (Usually) Don’t Have to Worry About Multiple Comparisons. The main reason I would like to adopt this approach is due to the heavy (sometimes even prohibitive) computational burden for large datasets. So, the formulations for the posterior mean and variance with the approximate Bayesian approach are not so accurate?

I didn’t understand this question. How can there be a non-zero covariance between the estimator \hat\beta0 and etaj? Surely the code shown in the next line on that page of the book is what Cesare wanted:

as.matrix(ranef(M1)$county)[26] + c(-2,2)*se.ranef(M1)$county[26]

Those two random variables are independent, surely?

Obviously, it’s 2017 so Stan is the way to go, but still.

Sounds pretty interesting. Do you know when we can expect 2nd edition?

One problem with using Stan in a future version of Gelman & Hill (I suppose that’s what you are referring to?) is that fitting Stan models analogous to those in the current book takes considerably more time. This would slow down learning and discourage students from conducting their own experiments. Fitting models in class would be pretty much impossible. I would prefer an lmer-based book with chapters on Stan in the second half. I think this would work okay because a lot of the material in the early chapters is largely orthogonal to the Bayesian vs. frequentist issue anyway (coding of predictors, transforms of the DVs, etc.).

(This was supposed to be a reply to Tim. Not sure why it’s showing up as such.)

To contextualise, the question stemmed from a practical case I was working on. I fit a 3-level growth model of pupil scores taken at different time points, nested in pupils, themselves nested in classes. A quadratic equation provided the best fit. Someone asked me to compare the trajectories of two classes at each time point. This raised the question on how to calculate group-level confidence intervals.

If you think about the covariance matrix of a group-level estimate at a certain time point, it is not just about the covariance between one single “fixed effect” and its “random component”, which are assumed to be orthogonal. It is more complicated than this, and I was not sure at all that I could just dismiss all covariances by treating them as equal to zero. Therefore, I checked Andrew’s book to see if I could find a solution. The situation he presents has fewer terms but it is conceptually similar, so I thought I would ask for clarifications on the matter.

AFAIK, neither lme4 or merTools provide a direct implementation of this, and Andrew confirmed it is a bit of a can of worms. Looking forward the next edition of the book!

Hi Cesare,

I am not totally clear on your specific model, but to some extent the appropriate

confidence interval (CI) construction depends on how you want to interpret

the CI. If you are looking for posterior coverage, then using the posterior

estimate plus or minus two standard errors is reasonable (assuming the

posterior distributions are approximately normal)

Such procedures have roughly 95% frequentist coverage

*on average across groups*, but do not generally maintain

this coverage for each group individually. This is partly

because the posterior estimates are biased.

If you want to share information across groups, but also want to

use a procedure that maintains 95% coverage for a specific group

mean or contrast, you may be interested in the following paper

on hierarchical modeling:

https://arxiv.org/abs/1612.08287

This paper shows how to construct CIs in a hierarchical model

that both shares information across groups in a Bayesian-like

way, and maintains 95% frequentist coverage for each group

individually.

Thanks, Peter.