In political science, there is an increasing availability of cross country survey data, such as the Comparative Study of Electoral Systems (CSES, 33 countries) and the World Values Study (WVS, more than 70 countries). What is the best way to analyze data with this structure, specially when one suspects a great deal of heterogeneity across countries?

The structure of cross-section survey data has a small number of countries relative to the number of observations in each country. This, of course, is the exact opposite of panel data. Methods such as random effects logit or probit work well under the assumption that the number of countries goes to infinity and the number of observations in each country is small. In fact, the computational strategies (Gauss-Hermite quadrature and variants) are only guaranteed to work when the number of observations per country is small. Another useful technique, robust standard errors clustered by country, is also known to provide overconfident standard errors when the number of clusters (in our case, countries) is small. Bayesian Multilevel models would work, but are we really worried about efficiency when we have more than 1000 observations per country?

Different people in the discipline have been suggesting a two step strategy. The first step involves estimating separate models for each country, obviously including only variables which vary within countries. Then one estimates a model for each coefficient as a function of contextual level variables (that are the main interest). Since the number of observations in each country is large, under standard assumptions the individual level estimates are consistent and asymptotically normal. We can take each of the individual level estimates to be a reduced form parameter of a fully interactive model.

The country level model might be estimated via ordinary least squares, or one of the various weighting schemes proposed in the meta-analysis literature (in addition to, of course, Bayesian Meta-Analysis). What are the potential problems and advantages of such approach? Here are the ones I can think of:

Advantages :

1) We don’t need to give a distribution for the individual level estimates. That is, one need not to assume that the “random effects” have, for example, a normal distribution. The coefficients are simply estimated from the data.

2) Computational Speed when compared to full MCMC methods.

3) Some monte carlo evidence showing that the standard errors are closer to the nominal levels than alternative strategies.

Disadvantages:

1) When fitting discrete choice models (e.g. probit, logit) we need to worry about the scale invariance (we estimate beta/sigma in each country, but we do not constraint sigma to be the same across countries). Any ideas on how to solve this problem?

2) Efficiency losses (which I think are minimal)

Further issues:

1) Does it have any advantages over an interactive regression model with, say, clustered standard errors? Or GEE? Relatedly, do we interpret the effects as regular conditional (i.e. random effects) model?

2) Is it worrisome to fit a maximum likelihood in the first step and a bayesian model at the second?

We (John Huber, Georgia Kernell and Eduardo Leoni) took this approach in this paper, if you want to see an application. It is still a very rough draft, comments more than welcome.

Andrew commented:

To start with, don't worry about computational issues. Fit a model that makes sense, then simplify if need be.

It makes sense to start by fitting a separate model to the data from each country, then modeling the coefficients in the second stage. As you note, this is an approximation to hierarchical modeling which will be reasonable if you have precise estimates within each country.