A new R package for fitting multilevel models

Joscha Legewie points to this article by Lars Ronnegard, Xia Shen, and Moudud Alam, “hglm: A Package for Fitting Hierarchical Generalized Linear Models,” which just appeared in The R Journal. This new package has the advantage, compared to lmer(), of allowing non-normal distributions for the varying coefficients. On the downside, they seem to have reverted to the ugly lme-style syntax (for example, “fixed = y ~ week, random = ~ 1|ID” rather than “y ~ week + (1|ID)”). The old-style syntax has difficulties handling non-nested grouping factors. They also say they can estimate models with correlated random effects, but isn’t that just the same as varying-intercept, varying-slope models, which lmer (or Stata alternatives such as gllamm) can already do? There’s also a bunch of stuff on H-likelihood theory, which seems pretty pointless to me (although probably it won’t do much harm either).

In any case, this package might be useful to some of you, hence this note.

7 thoughts on “A new R package for fitting multilevel models”

  1. Hello Mr. Gelman,
    just saw your blog, and this article.
    A little note: by "correlated" random effects the authors of the article meant an a priori correlation between the "groups", e.g. an animal model with correlated random intercepts (via manipulation of the design matrix). lmer/glmer estimates the correlation between a random slope and a random intercept but assumes independent groups (diagonal incidence matrix as design matrix) …
    Nice blog !
    Andi from Germany
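
To make the "manipulation of the design matrix" in the comment above concrete, here is a minimal base-R sketch. It is not taken from the hglm paper; the toy grouping factor and the correlation matrix A are made-up assumptions for illustration. The idea is that multiplying the group incidence matrix Z by a Cholesky factor of A lets independent random effects on the transformed matrix stand in for correlated effects on the original groups.

```r
# Hypothetical toy data: 3 groups with 2 observations each.
group <- factor(rep(c("a", "b", "c"), each = 2))

# Incidence matrix Z: one indicator column per group.
# Used as-is, it corresponds to independent groups.
Z <- model.matrix(~ 0 + group)

# Assumed a priori correlation between groups (made-up values),
# playing the role of a relationship matrix in an animal model.
A <- matrix(c(1.00, 0.50, 0.25,
              0.50, 1.00, 0.50,
              0.25, 0.50, 1.00), nrow = 3, byrow = TRUE)

# chol() returns an upper-triangular R with t(R) %*% R equal to A,
# so the lower-triangular factor is L = t(R), with L %*% t(L) equal to A.
L <- t(chol(A))

# Transformed design matrix: i.i.d. effects u_star on Z_star imply
# effects u = L %*% u_star on Z, with covariance proportional to A.
Z_star <- Z %*% L

# Check that the implied covariance structure is Z A Z'.
all.equal(Z_star %*% t(Z_star), Z %*% A %*% t(Z), check.attributes = FALSE)
```

A matrix like Z_star could then be supplied to any fitting routine that accepts a user-specified random-effects design matrix; which argument takes it depends on the package and should be checked in its documentation.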

  2. Meng's "Decoding the H-likelihood" should be required reading before using the package. As I understand it, hglm methods are integrated nested Laplace approximation methods, applied without consideration of whether that was a good idea or not. In many cases INLA (and hglm) can work, but it's not a general procedure, and knowing when it won't work is tricky.

  3. C Ryan: Thanks, yes the H may well stand for Hallucinated.

    Or maybe I just did not understand a series of email and telephone conversations with John in which I tried to get him to clarify.

    Have yet to read Meng's paper carefully.

    On the other hand, it might give nice starting values for other methods.

    K?

  4. Yes, a paper by Harry Joe [1] suggests that they are very good for getting close, as long as you aren't on the boundary of the VC parameters. Also, if the hglm package implements the second-order Laplace approximation for general designs [2], that would be great; the other glmm packages should steal that right away.

    1. Joe, H. Accuracy of Laplace approximation for discrete response mixed models. Computational Statistics & Data Analysis 52, 5066-5074 (2008).

    2. Noh, M. & Lee, Y. REML estimation for binary data in GLMMs. Journal of Multivariate Analysis 98, 896-915 (2007).

  5. Thanks Andrew. Just saw this and thought it is immensely useful to the work I am doing now. Also, a quick note to say thanks for your work on this blog.
    Miguel

  6. We would like to thank Andrew Gelman for commenting on our recent paper in The R Journal.

    Concerning the syntax, there are two alternatives for the input syntax in hglm. One is the lme-style and the other is a design matrix style. The latter makes it possible to have an a priori correlation between groups/clusters (as Andi from Germany points out). These kinds of correlation structures appear frequently in genetics and there is a need to allow for such models in R.

    We summarize previous discussions on h-likelihood theory in Section 10 (Discussion on h-likelihood theory) of our vignette available with the hglm package. This should be a good starting point for those interested in the theory. In the vignette we also develop further how the hglm package uses h-likelihood theory. Basically the package implements a set of inter-connected GLMs, which gives good approximations to the maximum h-likelihood estimates.

    The R package HGLMMM developed by Marek Molas at Erasmus MC, Rotterdam, explicitly maximizes the h-likelihood and gives higher-order corrections, at the expense of being slightly slower.

    We appreciate suggestions for further development of the hglm package.

    Lars Ronnegard

  7. I would like to offer some comments about syntax.

    Parenthesis matching in R is a big nuisance and is often the source of errors. A simpler syntax that uses fewer parentheses will reduce the number of coding errors. (Smart editors help, but do not eliminate this problem.)

    R already has way too many characters that mean special things, which makes it hard for new users to read code. Every single special character on a U.S. keyboard is used by R in some fashion: ~ ` ! @ # $ % ^ & * ( ) { } [ ] : ; | / etc. After running out of single symbols, there are things like [[ %% %*% .( :: :::

    Using () to denote random terms continues the infamous tradition of coming up with yet more uses of symbols, making the code ever harder to read. Look at the following two lines:

    MCMCglmm(logDens ~ sample*dilute, random= ~ sample+dilute, data=Assay)

    lmer(logDens ~ sample*dilute + (1|sample) + (1|dilute), data=Assay)

    For a new user (such as a person coming from PROC MIXED), which syntax is easier to understand? Which syntax has fewer parentheses to haggle with? Which syntax avoids the wretched implied intercept that even Bill Venables said is a difficulty for users?

    The MCMCglmm syntax is wonderfully clean to type AND to understand.
