Yphtach Lelkes points us to a recent article on survey weighting by three economists, Gary Solon, Steven Haider, and Jeffrey Wooldridge, who write:
We start by distinguishing two purposes of estimation: to estimate population descriptive statistics and to estimate causal effects. In the former type of research, weighting is called for when it is needed to make the analysis sample representative of the target population. In the latter type, the weighting issue is more nuanced. We discuss three distinct potential motives for weighting when estimating causal effects: (1) to achieve precise estimates by correcting for heteroskedasticity, (2) to achieve consistent estimates by correcting for endogenous sampling, and (3) to identify average partial effects in the presence of unmodeled heterogeneity of effects.
This is indeed an important and difficult topic and I’m glad to see economists becoming aware of it. I do not quite agree with their focus (in practice, heteroskedasticity never seems like much of a big deal to me, nor do I care much about so-called consistency of estimates), but there are many roads to Rome, and the first step is to move beyond a naive view of weighting as some sort of magic solution.
Solon et al. pretty much only refer to literature within the field of economics, which is too bad because they miss this twenty-year-old paper by Chris Winship and Larry Radbill, “Sampling Weights and Regression Analysis,” from Sociological Methods and Research, which begins:
Most major population surveys used by social scientists are based on complex sampling designs where sampling units have different probabilities of being selected. Although sampling weights must generally be used to derive unbiased estimates of univariate population characteristics, the decision about their use in regression analysis is more complicated. Where sampling weights are solely a function of independent variables included in the model, unweighted OLS estimates are preferred because they are unbiased, consistent, and have smaller standard errors than weighted OLS estimates. Where sampling weights are a function of the dependent variable (and thus of the error term), we recommend first attempting to respecify the model so that they are solely a function of the independent variables. If this can be accomplished, then unweighted OLS is again preferred. . . .
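To make the first of those claims concrete, here is a small simulation sketch. It is my own illustration, not code from Winship and Radbill, and every specific in it is an assumption chosen for convenience: the linear model y = 1 + 2x + noise with constant error variance, the inclusion probability 0.02 + 0.5x² (a function of x only), the population size, and the function name `simulate`. In that idealized setting both the unweighted and the inverse-probability-weighted least-squares slopes should center on the true value, with the weighted slope varying more across replications.

```python
import numpy as np


def simulate(n_reps=500, pop_size=5000, seed=0):
    """Compare unweighted and weighted OLS slopes when selection depends only on x.

    All numbers here (coefficients, inclusion probabilities, sizes) are
    illustrative assumptions, not taken from any of the papers discussed.
    """
    rng = np.random.default_rng(seed)
    slopes_unweighted = np.empty(n_reps)
    slopes_weighted = np.empty(n_reps)

    for r in range(n_reps):
        # Population: correctly specified linear model, homoskedastic errors.
        x = rng.uniform(0.0, 1.0, pop_size)
        y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, pop_size)

        # Inclusion probability is a function of the regressor x alone.
        p = 0.02 + 0.5 * x**2
        keep = rng.uniform(size=pop_size) < p
        xs, ys = x[keep], y[keep]
        w = 1.0 / p[keep]  # inverse-probability sampling weights

        X = np.column_stack([np.ones(xs.size), xs])

        # Unweighted OLS.
        beta_u = np.linalg.lstsq(X, ys, rcond=None)[0]

        # Weighted OLS: scale rows by sqrt(w), then solve ordinary least squares.
        sw = np.sqrt(w)
        beta_w = np.linalg.lstsq(X * sw[:, None], ys * sw, rcond=None)[0]

        slopes_unweighted[r] = beta_u[1]
        slopes_weighted[r] = beta_w[1]

    return slopes_unweighted, slopes_weighted


if __name__ == "__main__":
    u, w = simulate()
    print("unweighted slope: mean", u.mean(), "sd", u.std(ddof=1))
    print("weighted slope:   mean", w.mean(), "sd", w.std(ddof=1))
```

The comparison only goes this way because the regression model is right and the weights depend on nothing but x. Once the weights depend on y, or the linear model is wrong, the picture changes, which is exactly the more complicated case the quoted passage goes on to discuss.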
This topic also has close connections with multilevel regression and poststratification, as discussed in my 2007 article, “Struggles with survey weighting and regression modeling,” which is (somewhat) famous for its opening:
Survey weighting is a mess. It is not always clear how to use weights in estimating anything more complicated than a simple mean or ratios, and standard errors are tricky even with simple weighted means.
See also our response to the discussions.
I was unaware of Winship and Radbill’s work when writing my paper, so I accept blame for insularity as well.
In any case, it’s good to see broader interest in this important unsolved problem.