## Survey weighting and regression modeling

Yphtach Lelkes points us to a recent article on survey weighting by three economists, Gary Solon, Steven Haider, and Jeffrey Wooldridge, who write:

We start by distinguishing two purposes of estimation: to estimate population descriptive statistics and to estimate causal effects. In the former type of research, weighting is called for when it is needed to make the analysis sample representative of the target population. In the latter type, the weighting issue is more nuanced. We discuss three distinct potential motives for weighting when estimating causal effects: (1) to achieve precise estimates by correcting for heteroskedasticity, (2) to achieve consistent estimates by correcting for endogenous sampling, and (3) to identify average partial effects in the presence of unmodeled heterogeneity of effects.

This is indeed an important and difficult topic, and I’m glad to see economists becoming aware of it. I do not quite agree with their focus (in practice, heteroskedasticity never seems like much of a big deal to me, nor do I care much about so-called consistency of estimates), but there are many roads to Rome, and the first step is to move beyond a naive view of weighting as some sort of magic solution.
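Motive (3) in the quoted abstract can be illustrated with a toy simulation. This is a sketch under made-up assumptions, not anything from Solon, Haider, and Wooldridge: two equal-sized population groups whose slopes on x are 1 and 3 (so the population average partial effect is 2), with group B sampled at rate 0.8 and group A at rate 0.2. The regression of y on x omits the group interaction, so the heterogeneity is unmodeled. The helper `wls_slope` is a hypothetical name.

```python
import numpy as np

def wls_slope(x, y, w):
    """Slope from weighted least squares of y on (1, x); w = 1 gives plain OLS."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
n_pop = 200_000
group_b = rng.random(n_pop) < 0.5        # assumption: half the population is in group B
x = rng.normal(size=n_pop)
slope = np.where(group_b, 3.0, 1.0)      # assumption: effect of x is 1 in A, 3 in B
y = slope * x + rng.normal(size=n_pop)
# population average partial effect = 0.5 * 1 + 0.5 * 3 = 2

p = np.where(group_b, 0.8, 0.2)          # assumption: group B is oversampled
keep = rng.random(n_pop) < p
xs, ys, ws = x[keep], y[keep], 1.0 / p[keep]   # weight = 1 / inclusion probability

# the model has no group interaction, so heterogeneity is left unmodeled
unweighted = wls_slope(xs, ys, np.ones_like(ws))  # tracks the sample mix: 0.2*1 + 0.8*3
weighted = wls_slope(xs, ys, ws)                  # tracks the population mix: 0.5*1 + 0.5*3
print(f"unweighted slope {unweighted:.2f}, weighted slope {weighted:.2f}")
```

Under these settings the unweighted slope should land near 2.6, the average effect among sampled units, while weighting by inverse inclusion probability pulls it back toward 2. Modeling the interaction and averaging the group slopes over known population shares would be another route, closer in spirit to poststratification.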

Solon et al. pretty much only refer to literature within the field of economics, which is too bad, because they miss this twenty-year-old paper by Chris Winship and Larry Radbill, “Sampling Weights and Regression Analysis,” in Sociological Methods and Research, which begins:

Most major population surveys used by social scientists are based on complex sampling designs where sampling units have different probabilities of being selected. Although sampling weights must generally be used to derive unbiased estimates of univariate population characteristics, the decision about their use in regression analysis is more complicated. Where sampling weights are solely a function of independent variables included in the model, unweighted OLS estimates are preferred because they are unbiased, consistent, and have smaller standard errors than weighted OLS estimates. Where sampling weights are a function of the dependent variable (and thus of the error term), we recommend first attempting to respecify the model so that they are solely a function of the independent variables. If this can be accomplished, then unweighted OLS is again preferred. . . .
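Winship and Radbill’s two cases can be seen in a small simulation. This is a minimal sketch under assumptions chosen for illustration, not taken from their paper: a population with y = 1 + 2x + e (x and e standard normal), Poisson sampling with inclusion probability 0.9 or 0.1 depending on whether x (first case) or y (second case) is above a threshold, and weights equal to the inverse inclusion probability. The helper names `wls_slope`, `one_survey`, and `simulate` are hypothetical.

```python
import numpy as np

def wls_slope(x, y, w):
    """Slope from weighted least squares of y on (1, x); w = 1 gives plain OLS."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[1]

def one_survey(rng, n_pop, select_on):
    """Draw a population with y = 1 + 2x + e, then a Poisson sample whose
    inclusion probability (0.9 or 0.1) depends on x or on y (assumed design)."""
    x = rng.normal(size=n_pop)
    y = 1.0 + 2.0 * x + rng.normal(size=n_pop)
    driver = x if select_on == "x" else y - 1.0   # threshold: x > 0, or y > 1
    p = np.where(driver > 0, 0.9, 0.1)
    keep = rng.random(n_pop) < p
    return x[keep], y[keep], 1.0 / p[keep]        # weight = 1 / inclusion probability

def simulate(select_on, reps=500, n_pop=2000, seed=1):
    """Repeat the survey and collect unweighted and weighted slope estimates."""
    rng = np.random.default_rng(seed)
    unw, wtd = [], []
    for _ in range(reps):
        xs, ys, ws = one_survey(rng, n_pop, select_on)
        unw.append(wls_slope(xs, ys, np.ones_like(ws)))
        wtd.append(wls_slope(xs, ys, ws))
    return np.array(unw), np.array(wtd)

for case in ("x", "y"):
    unw, wtd = simulate(case)
    print(f"selection on {case}: unweighted mean {unw.mean():.3f} (sd {unw.std():.3f}), "
          f"weighted mean {wtd.mean():.3f} (sd {wtd.std():.3f})")
```

With selection on x, both estimators should center near the true slope of 2, with the unweighted one showing the smaller spread across replications. With selection on y, the unweighted slope should come out noticeably below 2 (roughly 1.75 under these settings), while the weighted one stays near 2.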

This topic also has close connections with multilevel regression and poststratification, as discussed in my 2007 article, “Struggles with survey weighting and regression modeling,” which is (somewhat) famous for its opening:

Survey weighting is a mess. It is not always clear how to use weights in estimating anything more complicated than a simple mean or ratios, and standard errors are tricky even with simple weighted means.
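For the simple weighted mean, one common approach (not necessarily the one in the 2007 article) is to treat Σwy/Σw as a ratio estimator and use a Taylor-linearization standard error, assuming independent draws with no strata, clusters, or finite-population correction. Kish’s effective sample size (Σw)²/Σw² is a cruder rule of thumb. The function names below are hypothetical.

```python
import numpy as np

def weighted_mean_se(y, w):
    """Weighted mean sum(w*y)/sum(w) with a linearization standard error.

    Treats the mean as a ratio estimator and assumes independent
    (with-replacement) draws: no strata, no clusters, no finite-population
    correction.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(y)
    mean = np.sum(w * y) / np.sum(w)
    # linearized contributions of each unit to the ratio estimator
    z = w * (y - mean) / np.sum(w)
    var = n / (n - 1) * np.sum(z ** 2)   # z sums to zero, so no centering needed
    return mean, np.sqrt(var)

def kish_neff(w):
    """Kish's effective sample size (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return np.sum(w) ** 2 / np.sum(w ** 2)
```

With equal weights the linearization formula reduces to the familiar s/√n, a quick sanity check. Neither formula accounts for the design features that make real survey standard errors tricky.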

I was unaware of Winship and Radbill’s work when writing my paper, so I accept blame for insularity as well.

In any case, it’s good to see broader interest in this important unsolved problem.

1. Elrod says:

I wish I knew enough statistics to provide a topical & constructive comment (I’m learning!), but….
Typos: ThESE is indeed…never seems like much of a biT deal to me

2. Fernando says:

On insularity: I find that too in political science.

In the age of Google it feels like insularity is a feature, not a bug.

3. Simon says:

I’m not sure what it’s like in other fields, but in the physical sciences it is pretty much impossible to keep up with the current literature in your own speciality, let alone related topics. It’s understandable that economists did not pick up on a sociology paper. Yes, Google Scholar, Web of Knowledge, etc. are pretty amazing tools (I was raised on paper journals and chasing leads through volumes of abstracts for particular years…), but there is still a large element of luck in finding work.

• Daniel Gotthardt says:

We’re speaking about methods here, though. Insularity between methods research in different fields of social research is one of the most annoying things I’ve encountered so far in the social sciences. And research is just one area; it’s also true for teaching and for cooperation between academics in general. At least in Germany, we often can’t offer advanced methods courses because too few students sign up, e.g., in sociology. But there are also students from psychology, criminology, and political science who would be interested in advanced methods courses and can’t get them because there are too few interested …

4. Seth says:

Why don’t you care about consistency?

• Manoel Galdino says:

I shared this post on Facebook, and someone asked me this same question about Gelman’s consistency comment. Let’s see what he answers; I’m curious as well.

• Andrew says:

Seth, Manoel:

As the saying goes, asymptotically we are all dead.

• Corey says:

I put a lot more weight on consistency than this. My basic reasoning is: if a method can’t get the right answer with unbounded information, why should I trust the answers it generates with finite information?