## R-squared for multilevel models

Fred Schiff writes:

I’m writing to you to ask about the “R-squared” approximation procedure you suggest in your 2004 book with Dr. Hill. [See also this paper with Pardoe. Ed.]

I’m a media sociologist at the University of Houston. I’ve been using HLM3 for about two years.

Briefly about my data. It’s a content analysis of news stories with a continuous scale dependent variable, story prominence. I have 6090 news stories, 114 newspapers, and 59 newspaper group owners. All the Level-1, Level-2 and dependent variables have been standardized. Since the means were zero anyway, we left the variables uncentered. All the Level-3 ownership groups and characteristics are dichotomous scales that were left uncentered.

PROBLEM: The single most important result I am looking for is to compare the strength of nine competing Level-1 variables in their ability to predict and explain the outcome variable, story prominence. We are trying to use the residuals to calculate an “R-squared” measure for each level as you and Hill proposed. We haven’t been able to generate OLS regression equations for each newspaper and ownership group in HLM because the manual suggests “optional settings” that are not available in our software (HLM 6.06).

QUESTION-1 – How could we generate the estimated Bayesian residuals for level-1?

QUESTION-2 – Is it legitimate to run a model where Level-1 and Level-2 variables are standardized and Level-3 variables are dichotomous dummy variables?

QUESTION-3 – Is it legitimate to run models to estimate parameters for each ownership group and at the same time include the corresponding dummy variables as part of the data structure?

QUESTION-4 – In equations that include Level-3 variables, is it valid to describe the results as applying selectively to the stories (L1) in newspapers (L2) owned by one ownership group (L3, coded 1) as opposed to stories in newspapers of other ownership groups (L3, coded 0)?

My reply:

1. I don’t know the HLM software so I don’t know how to use it to compute the Bayesian residuals. But you might be happy to hear that we are currently working on implementing these ideas using the lmer/glmer software in R. Once it’s been programmed in one package, it shouldn’t be hard for people to translate it into another.

2. Yes, this is fine. When in doubt, interpret coefficients by considering predictions with inputs set to various reasonable fixed values.

3. I don’t quite understand this question. If you have all the data loaded in, you should be able to use ownership group as a level and also include predictors at that level.

4. I think this is reasonable but I’m not following all the details. Again, when in doubt, it’s always a good idea to understand your model through comparisons of specific predictions. That’s one trick we use in our book on occasion.
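To make the residual-based idea from point 1 concrete, here is a minimal Python sketch of a plug-in R-squared at each level: one minus the variance of the errors at that level divided by the variance of the quantities being modeled at that level. This is a simplification, assuming point estimates of the residuals are already in hand; the full procedure averages these variances over posterior uncertainty. The function name `level_r_squared` and all numbers are hypothetical toy values, not the newspaper data, and nothing here uses HLM or lmer.

```python
from statistics import variance


def level_r_squared(targets, errors):
    """Plug-in R-squared for one level: 1 - V(errors) / V(targets).

    targets: the quantities modeled at this level (outcomes at level 1,
             estimated group intercepts at level 2).
    errors:  the corresponding residuals at this level.
    """
    return 1.0 - variance(errors) / variance(targets)


# Hypothetical toy numbers for illustration only.
# Level 1: story-level outcomes and their fitted values.
y = [0.9, -0.4, 1.3, -1.1, 0.2, -0.9]
y_fitted = [0.7, -0.1, 1.0, -0.8, 0.1, -0.9]
resid_story = [yi - fi for yi, fi in zip(y, y_fitted)]

# Level 2: estimated newspaper intercepts and the values predicted
# for them by the newspaper-level predictors.
alpha = [0.5, -0.3, 0.1, -0.3]
alpha_pred = [0.4, -0.2, 0.0, -0.2]
resid_paper = [a - p for a, p in zip(alpha, alpha_pred)]

r2_story = level_r_squared(y, resid_story)
r2_paper = level_r_squared(alpha, resid_paper)
print(f"story level: {r2_story:.2f}, newspaper level: {r2_paper:.2f}")
```

The same function would apply at the ownership-group level, given estimated group-level coefficients and their residuals. The hard part in practice is extracting those residuals from the fitted model, which is the step that depends on the software.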

### 3 Comments

1. David Shor says:

“But you might be happy to hear that we are currently working on implementing these ideas using the lmer/glmer software in R”

This makes me sad because it means you’re not working on developing STAN.

• Andrew says:

David:

In that case you’ll be happy to hear that our article on regularized point estimates for variance parameters got rejected again. It’s one of my favorites of my papers. But that’s how it often is: the work that’s most innovative (both in how it seems to me and, in retrospect, given its impact on the field) tends to get heavy resistance.

• Andrew says:

Also, inside or outside of Stan, we still need to develop tools for exploratory model analysis. If we implement these ideas successfully in blmer/bglmer, it shouldn’t be hard to port them to Stan.