Skip to content

Fitting hierarchical GLMs in package X is like driving car Y

Given that Andrew started the Gremlin theme, I thought it would only be fitting to link to the following amusing blog post:

It’s exactly what it says on the tin. I won’t spoil the punchline, but will tell you the packages considered are: lme4, JAGS, RStan(Arm), and INLA.

What do you think?

Anyway, don’t take my word for it—read the original post. I’m curious about others’ take on systems for fitting GLMs and how they compare (to cars or otherwise).

You might also like…

My favorite automative analogy was made in the following essay, from way back in the first dot-com boom:

Although it’s about operating systems, the satirical take on closed- vs. open-source is universal.

(Some of) what I thought

Chris Brown reports in the post,

I simulated a simple hierarchical data-set to test each of the models. The script is available here. The data-set has 100 binary measurements. There is one fixed covariate (continuous) and one random effect with five groups. The linear predictor was transformed to binomial probabilities using the logit function. For the Bayesian approaches, slightly different priors were used for each package, depending on what was available. See the script for more details on priors.

Apples and oranges. This doesn’t make a whole lot of sense, given that lme4 is giving you max marginal likelihood, whereas JAGS and Stan give you full Bayes. And if you use different priors in Stan and JAGS, you’re not even fitting the same posterior. I’m afraid I’ve never understood INLA (lucky for me Dan Simpson’s visiting us this week, so there’s no time like the present to learn it). You’ll also find that relative performance of Stan and JAGS will vary dramatically based on the shape of the posterior and scale of the data.

It’s all about effective sample size. The author doesn’ tmention the subtlety of choosing a way to estimate effective sample size (RStan’s is more conservative than the Coda package, using a variance approach like that of the split R-hat we use to detect convergence problems in RStan).

Random processes are hard to compare. You’ll find a lot of variation across runs with different random inits. You really want to start JAGS and Stan at the same initial points and run to the same effective sample size over multiple runs and compare averages and variation.

RStanArm, not RStan. I looked at the script, and it turns out the post is comparing RStanArm, not coding a model in Stan itself and running it in RStan. Here’s the code.

t_prior <- student_t(df = 4, location = 0, scale = 2.5)
mb.stanarm <- microbenchmark(mod.stanarm <- stan_glmer(y ~ x + (1|grps),
                                                       data = dat,
                                                       family = binomial(link = 'logit'),
                                                       prior = t_prior,
                                                       prior_intercept = t_prior,
                                                       chains = 3, cores = 1, seed = 10),
                             times = 1L)

Parallelization reduces wall time. This script runs RStanArm three Markov chains on a single core, meaning they have to run one after the other. This can obviously be sped up by the up to the number of cores you have and letting them all run at the same time. Presuambly JAGS could be sped up the same way. The multiple chains are embarassingly parallelizable, after all.

It's hard to be fair! There's a reason we don't do a lot of these comparisons ourselves!


  1. shravan says:

    owning a car feels like a quaint idea. I use rstan for glmm’s by the way.

  2. Andrew says:


    The “which car do you drive” thing is amusing but I find the comparisons in that linked post to be misleading in several ways. First, it appears that they are computing speed using iterations per second, rather than effective sample size per second. Second, as you said, it’s not clear if they’re using rstan or rstanarm. If they’re using rstanarm, I don’t see how they can say this is hard to use, as the commands look just like lme4. Third, as you said, they should run 4 chains in parallel as this is the default with rstan or rstanarm.

  3. awinter says:

    I am using rstanarm and INLA for our current glmm’s. Though I am eyeballing pystan for this.

    The last modeling I did a few weeks ago I was asked to use lme4, MCMCglmm, rstanarm, and INLA on the same data set.

  4. Björn says:

    Where’s PROC MCMC (perhaps a Trabant?) on their list? Also, the it would indeed logical to distinguish plain Stan from brms or rstanarm package – I actually like these convenient R packages (brms, rstanarm) a lot for fitting standard models (they seem like more normal everyday cars to me, let’s call them the Toyota Camry and the Honda Accord), but for some of my work I cannot use them and have to code up my own Stan model (more like a highly customisable car that the driver can change around as needed).

Leave a Reply