
Multilevel modeling: What it can and cannot do

Today’s post reminded me of this article from 2005:

We illustrate the strengths and limitations of multilevel modeling through an example of the prediction of home radon levels in U.S. counties. . . .

Compared with the two classical estimates (no pooling and complete pooling), the inferences from the multilevel models are more reasonable. . . . Although the specific assumptions of model (1) could be questioned or improved, it would be difficult to argue against the use of multilevel modeling for the purpose of estimating radon levels within counties. . . . Perhaps the clearest advantage of multilevel models comes in prediction. In our example we can predict the radon levels for new houses in an existing county or a new county. . . . We can use cross-validation to formally demonstrate the benefits of multilevel modeling. . . . The multilevel model gives more accurate predictions than the no-pooling and complete-pooling regressions, especially when predicting group averages.
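To make the three estimates concrete, here is a minimal simulation sketch. It is not the article's analysis: the data are simulated rather than the Minnesota radon measurements, the variance parameters `sigma` and `tau` are treated as known instead of being estimated, and every numerical value (85 counties, 1 to 20 houses per county, the means and standard deviations) is a hypothetical choice. It compares each estimate with the simulated true county means rather than using cross-validation.

```python
import numpy as np


def compare_pooling(n_counties=85, mu=1.5, tau=0.3, sigma=0.8,
                    n_sims=200, seed=0):
    """Average RMSE of three estimators of the true county means.

    All parameter values are hypothetical; sigma (within-county sd) and
    tau (between-county sd) are assumed known to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    total_sq_err = {"no_pooling": 0.0, "complete_pooling": 0.0,
                    "multilevel": 0.0}
    for _ in range(n_sims):
        # Hypothetical sample sizes: between 1 and 20 houses per county.
        n = rng.integers(1, 21, size=n_counties)
        # True county means vary around the overall mean mu.
        alpha = rng.normal(mu, tau, size=n_counties)
        # County sample means; drawing them directly is equivalent to
        # averaging n[j] house-level draws with standard deviation sigma.
        ybar = rng.normal(alpha, sigma / np.sqrt(n))

        # Complete pooling: one estimate for every county.
        grand_mean = np.sum(n * ybar) / np.sum(n)
        # Multilevel (partial pooling): precision-weighted compromise
        # between the county mean and the overall mean.
        w = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
        estimates = {
            "no_pooling": ybar,
            "complete_pooling": np.full(n_counties, grand_mean),
            "multilevel": w * ybar + (1 - w) * grand_mean,
        }
        for name, est in estimates.items():
            total_sq_err[name] += np.mean((est - alpha) ** 2)
    return {name: float(np.sqrt(total / n_sims))
            for name, total in total_sq_err.items()}


if __name__ == "__main__":
    for name, rmse in compare_pooling().items():
        print(f"{name:>17s}: RMSE = {rmse:.3f}")
```

The weight `w` goes to 1 for counties with many houses and toward 0 for counties with one or two, which is why the partially pooled estimate helps most where the data are sparse.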

The most interesting part comes near the end of the three-page article:

We now consider our model as an observational study of the effect of basements on home radon levels. The study includes houses with and without basements throughout Minnesota. The proportion of homes with basements varies by county (see Fig. 1), but a regression model should address that lack of balance by estimating county and basement effects separately. . . . The new group-level coefficient γ2 is estimated at −.39 (with standard error .20), implying that, all other things being equal, counties with more basements tend to have lower baseline radon levels. For the radon problem, the county-level basement proportion is difficult to interpret directly as a predictor, and we consider it a proxy for underlying variables (e.g., the type of soil prevalent in the county).

This should serve as a warning:

In other settings, especially in social science, individual averages used as group-level predictors are often interpreted as “contextual effects.” For example, the presence of more basements in a county would somehow have a radon-lowering effect. This makes no sense here, but it serves as a warning that, with identical data of a social nature (e.g., consider substituting “income” for “radon level” and “ethnic minority” for “basement” in our study), it would be easy to leap to a misleading conclusion and find contextual effects where none necessarily exist. . . .

This is related to the problem in meta-analysis that between-study variation is typically observational even if individual studies are randomized experiments . . .
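The proxy interpretation quoted above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the radon data: an unobserved county-level "soil" variable raises radon and lowers the share of homes with basements, and the county basement share has no effect of its own in the data-generating process. For brevity the fit is ordinary least squares with the county share as an extra predictor, a simplification of the article's multilevel model. The function name and all coefficient values are hypothetical.

```python
import numpy as np


def simulate_contextual_artifact(n_counties=85, n_houses=50, seed=1):
    """Fit y ~ basement + county basement share on simulated data in
    which the county share has no causal effect at all."""
    rng = np.random.default_rng(seed)
    b_true = 0.7   # hypothetical house-level basement coefficient
    c_soil = 0.5   # hypothetical effect of the unobserved soil variable

    # Unobserved county-level variable (think "type of soil").
    soil = rng.normal(size=n_counties)
    # Counties with more of it have fewer basements.
    p_basement = 1.0 / (1.0 + np.exp(-(0.5 - 1.5 * soil)))

    county = np.repeat(np.arange(n_counties), n_houses)
    x = rng.binomial(1, p_basement[county])            # basement indicator
    noise = rng.normal(0.0, 0.7, size=county.size)
    # Note: the county basement share does not appear in this equation.
    y = 1.0 + b_true * x + c_soil * soil[county] + noise

    # Observed county-level predictor: share of sampled homes with basements.
    xbar = np.bincount(county, weights=x, minlength=n_counties) / n_houses

    X = np.column_stack([np.ones(county.size), x, xbar[county]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {
        "true_basement": b_true,
        "basement": float(coef[1]),
        "county_basement_share": float(coef[2]),
    }


if __name__ == "__main__":
    print(simulate_contextual_artifact())
```

The house-level coefficient is recovered, while the county-share coefficient comes out clearly negative even though no contextual effect was simulated. It is picking up the omitted soil variable, which is the sense in which the group-level mean acts as a proxy.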

In summary:

One intriguing feature of multilevel models is their ability to separately estimate the predictive effects of an individual predictor and its group-level mean, which are sometimes interpreted as “direct” and “contextual” effects of the predictor. As we have illustrated in this article, these effects cannot necessarily be interpreted causally for observational data, even if these data are a random sample from the population of interest. Our analysis arose in a real research problem (Price et al. 1996) and is not a “trick” example. The houses in the study were sampled at random from Minnesota counties, and there were no problems of selection bias.

Read the whole thing.


  1. Kyle C says:

    Your 2005 article is a model of clarity and concision, which I imagine took a great deal of effort.

    I’m curious, did you originally have a second author? The article refers to “authors” and “we” throughout.

    • The Dude says:

      The royal we! The editorial…

    • Andrew says:


      My original plan was to write an article jointly with Jan de Leeuw. Both of us had written on multilevel models but he’d written things with a more skeptical orientation so I thought it could be interesting to hash out our areas of agreement and write an article with some title such as “Multilevel models: What they can and can’t do.” My rough idea was to outline the strong benefits of mlms for certain prediction problems, as well as for problems such as longitudinal data in biostat where there are clearly “person effects,” and then to also outline the ways in which mlms do not resolve fundamental causal inference problems from observational data, and skepticism about the roles of statistical significance and standard errors. I thought that if we made our differing perspectives clear, it could perhaps be useful for people to see the general areas of agreement. Jan expressed interest in participating in this project, but then when I finally wrote the article, he said it was fine as is, so I published it!

  2. Clyde Schechter says:

    Sorry for raising a tangential question, but how is it possible that there was no selection bias in the Minnesota data? The measurements must have required entering the premises, and it is difficult for me to imagine that not a single owner refused access in the entire state, even in Minnesota.

    • Andrew says:


      The survey was done by mail. I assume there was some missing data but it’s not something we spent much time worrying about. In any case, the causal issues would’ve arisen even had the response rate been 100%.
