Two-stage and multilevel regressions

Robert Rohrschneider writes:

I [Rohrschneider] am trying to gain an understanding of the pitfalls of multi-level analyses in my work which typically requires that I merge country data with surveys of individuals, usually in Europe. I wonder whether you could reduce my confusion about one issue of multilevel modeling in political science. It appears that the simple two-stage regression approach to multilevel data structures (i.e., a variant of which you and Jennifer Hill describe on p. 240 in your 2007 book) is gaining in popularity in political science.

I am thinking specifically of the 2005 Political Analysis volume to which you contributed a short response but where a “true advocate” of multilevel analyses, such as yourself or Tom Snijders, did not contribute a full chapter. My recent experience at a major political science journal seems to confirm this trend in political science, as two reviewers criticized the manuscript for not using the OLS-based 2-stage approach (I merged Eurobarometer data with contextual information about countries and estimated the hierarchical model with Stata and HLM 6.02—pretty straightforward stuff by now, or so I thought).

I am therefore trying to sort out whether it is indeed correct to use a multilevel program that estimates all levels simultaneously as opposed to the 2-stage approach. In the context of this issue, I read your article in Political Analysis (pp. 459-461), and am not quite sure why the article argues that the choice of method is a practical issue. For in your book, as you no doubt know, you (and J. Hill) note that the two-step approach “can run into problems when…there are interactions between individual-level and group-level predictors.” (p. 240). My understanding of one source of this potential problem is that a cross-level interaction term (say between country context and individual-level coefficients) introduces the possibility that the coefficients from the lower level are not correctly estimated. Is my understanding correct?

Assuming it is, then isn’t the choice of which approach to use a statistical issue, at least in the presence of cross-level interaction terms (and probably other conditions as well)? In contrast, if it is a practical issue, are there conditions under which the 2 stage approach is inappropriate?

My reply: Usually I would’ve just said that multilevel modeling is better, but I thought that would be a silly thing to say in my Political Analysis article, given that I was commenting on a bunch of papers that used a different approach. So I thought it would be more useful to connect the two approaches. As I wrote there, “Two-level regression can be viewed as a special case of multilevel (hierarchical) modeling: we can obtain the fit-to-each-country-separately two-level results by fitting a multilevel model with the group-level variance parameter set to infinity (in which case the group-level model does no smoothing of the individual regression coefficients). It is generally preferable to estimate the group-level variance, and thus the appropriate amount of between-country smoothing, from the data.” The important issue is that the parameters should be allowed to vary by group, and if some people are happier doing that using two-stage regression, I didn’t want to be telling them not to do it. Practical concerns arise, for example, if you have different background variables measured in different countries, in which case it is theoretically possible to make a big model for everything, but you might want to avoid that effort.

But, yeah, I don’t think anyone should be telling you not to fit a multilevel model that you’ve already fit. If they say that, just throw it back in their face by telling them that their favored 2-stage regression is just a multilevel model with group-level variance set to infinity.
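To make the "group-level variance set to infinity" point concrete, here is a minimal sketch in Python. It is not from the article: it assumes the simplest case, a varying-intercept model with known variances, where the multilevel estimate of a country's mean is a precision-weighted average of that country's own mean and the overall mean. The function name and the numbers are made up for illustration.

```python
import math

def partial_pooling_estimate(ybar_j, n_j, sigma_y, mu, sigma_alpha):
    """Multilevel estimate of one group's mean (illustrative, variances assumed known).

    ybar_j      : the group's own sample mean (the no-pooling estimate)
    n_j         : number of observations in the group
    sigma_y     : within-group standard deviation
    mu          : overall mean across groups
    sigma_alpha : group-level standard deviation
    """
    w_data = n_j / sigma_y ** 2        # precision of the group's own mean
    w_group = 1.0 / sigma_alpha ** 2   # precision of the group-level model; 0.0 when sigma_alpha is inf
    return (w_data * ybar_j + w_group * mu) / (w_data + w_group)

# Hypothetical country: own mean 2.0 from 4 respondents, overall mean 0.0.
for sigma_alpha in (0.5, 1.0, 10.0, math.inf):
    est = partial_pooling_estimate(2.0, 4, 1.0, 0.0, sigma_alpha)
    print(f"sigma_alpha={sigma_alpha}: estimate={est:.3f}")
```

With sigma_alpha = 0.5 the estimate is pulled halfway to the overall mean (1.0); as sigma_alpha grows it approaches the country's own mean, and at infinity it equals it exactly, which is the fit-to-each-country-separately result. Raising n_j has a similar effect: with many respondents per country the data weight dominates and little smoothing occurs.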

P.S. As a practical matter, I like the 2-stage regression as a way of building up to the multilevel model, especially for people who are new to these methods.
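For readers who want to try the build-up, here is a rough sketch of a 2-stage regression in Python on simulated data. Everything in it is an assumption for illustration: the helper names (`simple_ols`, `two_stage`, `simulate`), the single individual-level predictor x, the single country-level predictor u, and the simulated coefficients. The second stage here is plain unweighted least squares; a more careful second stage would account for the differing precision of the first-stage estimates.

```python
import random

def simple_ols(x, y):
    """Least-squares (intercept, slope) for y = a + b*x with one predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def two_stage(groups, u):
    """groups: {country: (x, y)}; u: {country: country-level predictor}."""
    # Stage 1: a separate regression in each country (no pooling).
    first = {j: simple_ols(x, y) for j, (x, y) in groups.items()}
    countries = sorted(first)
    u_vals = [u[j] for j in countries]
    # Stage 2: regress the country intercepts and slopes on the country-level predictor.
    intercept_model = simple_ols(u_vals, [first[j][0] for j in countries])
    slope_model = simple_ols(u_vals, [first[j][1] for j in countries])
    return first, intercept_model, slope_model

def simulate(n_countries=20, n_per_country=500, seed=0):
    """Fake data: few countries, many respondents, coefficients that depend on u."""
    rng = random.Random(seed)
    groups, u = {}, {}
    for j in range(n_countries):
        u[j] = j / 10
        a_j = 1.0 + 0.5 * u[j] + rng.gauss(0, 0.2)   # country intercept
        b_j = 2.0 - 1.0 * u[j] + rng.gauss(0, 0.2)   # country slope
        x = [rng.gauss(0, 1) for _ in range(n_per_country)]
        y = [a_j + b_j * xi + rng.gauss(0, 1) for xi in x]
        groups[j] = (x, y)
    return groups, u

groups, u = simulate()
first, intercept_model, slope_model = two_stage(groups, u)
print("country intercepts on u: intercept=%.2f, slope=%.2f" % intercept_model)
print("country slopes on u:     intercept=%.2f, slope=%.2f" % slope_model)
```

The coefficient on u in the second-stage slope regression plays the role of the cross-level interaction between x and u. Plotting the first-stage estimates in `first` against u is a natural way to look for outliers before moving on to the full multilevel fit.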

5 thoughts on “Two-stage and multilevel regressions”

  1. As one of the authors in the Political Analysis issue and part-time advocate of the two-stage approach, let me share a few thoughts. As Andy notes, the two-stage approach is a method of estimating a multilevel model, not a different model. That said, is there any reason you should try the two-stage estimator in this particular situation? (i.e. small number of groups, large number of observations within groups)

    It turns out that maximum likelihood estimators such as those implemented in Stata, HLM and R (lmer) frequently run into numerical difficulties when estimating these models, particularly when you allow many individual-level coefficients to vary across groups. Primo et al. (1), for example, recount their struggles in estimating a model of turnout using Current Population Survey (CPS) data. Frustrated by the software (in their case, the xtmixed and gllamm procedures in Stata), Primo et al. ended up using clustered standard errors. This was probably a bad idea, since pooled estimation with clustered standard errors is more sensitive to deviations from the random effects assumptions, in particular to correlation between individual-level covariates and group-level errors. (This is less of a problem for random effects estimators because the number of observations within groups is large.) Another advantage of the two-stage approach is that one need not assume a particular distribution of the group-level errors (Normal or otherwise).

    So, at the very least, the two-stage approach can be used as a check on the "one-stage" multilevel model. For example, you will be able to check: Which restrictions (if any) on the individual-level coefficients make sense? Are the group-level errors approximately normal? Are there any outliers? If the second-stage coefficients are very different from those of the single-stage approach, it is likely an indication of problems that would otherwise go unnoticed.

    Run-of-the-mill multilevel software was designed and tested for situations where the number of groups is large and the number of observations within each group is small. The terrain we are dealing with here is much different, and we shouldn't expect such software to work unmodified. Since the two-stage approach does not suffer from the same problems, I think your reviewers are justified in being skeptical about your results using HLM.

    (1) Primo, David M., Matthew L. Jacobsmeier and Jeffrey Milyo. 2007. “Estimating the Impact of State Policies and Institutions with Mixed-Level Data.” State Politics and Policy Quarterly 7(4):446–459.

  2. Robert,

    Sorry to hear about your reviews–it sounds like your reviewers knew just enough to be dangerous. One would have hoped that they'd read Nathaniel Beck's contribution to that PA issue. After pointing out the advantages of HLM over two-stage, he notes (at 458):

    "Thus it may well be that the articles in this issue, while important, are important for a fairly short period of time, that is, the hopefully short interval between when comparativists became methodologically sophisticated enough to be interested in multilevel models and when they became only slightly more sophisticated so that final estimation would be done in one step.''

    We can look forward to the day.

    Good luck!

  3. I think e.leoni’s comment is useful for furthering the discussion, but misses the point of Rohrschneider’s posting, to some degree. Anyone has a right to be skeptical of any method, but, according to Rohrschneider, the reviewers of his ms. appeared misguided in their belief that the two-step approach is the preferred multilevel method, as both a starting and ending point in the analysis. I agree with e.leoni that when doing multilevel analysis one shouldn’t be focused on the meta-parameters of HLM, and that the two-step method is often a good starting point for multilevel analysis, for the reasons e.leoni mentioned (insight into outliers, functional form of parameters, etc.). But using the two-step approach as an endpoint in the analysis is questionable. How often does the group variance approach infinity (Gelman's reply)?

    Here’s a useful cautionary tale. An individual in my department submitted a manuscript to the Review using the two-step approach that was rejected precisely because it used the two-step approach instead of a full multilevel model. The author appealed the decision, citing the special issue of Political Analysis. Two Board members who are methodologists affirmed the rejection on methodological grounds.

    The take-home point for me is that the special issue of PA served a very important function in alerting people to some of the pitfalls of doing multilevel analysis without first looking at group estimates. But the issue also created the misconception that the two-step approach is generally preferable to full multilevel analysis, i.e., that it should be the end point of the analysis as well as the starting point. People need to realize that the rest of the world does not agree with the implicit prescriptive message of the special issue.

  4. "But the issue also created the misconception that the two-step approach is generally preferable to full multilevel analysis … People need to realize that the rest of the world does not agree with the implicit prescriptive message of the special issue."

    This is a nice point. It is quite fascinating to me how quickly fashions develop in political science "methodology" circles. Remember panel-corrected standard errors? While Beck and Katz's paper was brilliant, many applications certainly were not…

  5. For those who trust "one-stage" estimation with large clusters, just a cautionary note. Stata fixed this bug in January 2007:


    "The random-effects panel-data estimators xtlogit, re; xtprobit, re; xtcloglog, re; xtpoisson, re normal; xtinreg; and xttobit ignore large panels when fitting the model. This was caused by an internal numerical underflow, although no warning message was presented, and this is now fixed."

    Since the special issue is from December 2005, there was no reliable way to estimate those models using Stata at the time of publication. And what is the guarantee that other software did (or does) any better? The test cases are usually of the "large number of groups, small number of observations within groups" kind.

    The numerical problems with one-stage estimation should not be ignored. Two-stage estimation does not suffer from the same issues.

    Mark: The "variance equals infinity" point is only relevant for the group-specific estimates. In any case, with a large number of observations within groups, there is usually very little pooling.

    -eduardo
