Weighted survey samples of low-incidence voters

I was looking over the Polmeth working papers recently and noticed an article by Michael Alvarez and Jonathan Nagler. Here’s the abstract:

In this paper we [Alvarez and Nagler] describe a method for weighting surveys of a sub-sample of voters. We focus on the case of Latino voters, and we analyze data from three surveys: two opinion polls leading up to the 2004 presidential election, and the national exit poll from the 2004 election. We take advantage of data where they are available, namely the large amount of data describing the demographics of Hispanic citizens, and we combine this with a model of turnout of those citizens to improve our estimate of the demographic characteristics of Hispanic voters. We show that alternate weighting schemes can substantively alter inferences about population parameters.

Here’s the paper. Their idea is to weight survey respondents who are Hispanic (or members of other subpopulations) based on their demographic breakdown in the population and on their propensity to vote. It’s an interesting paper on an important problem, and I have a few comments (as usual, focused on my own work, since that is what I’m most familiar with, but I hope these comments will be helpful to others too).

1. Alvarez and Nagler comment that “The development of sample weights is typically not given much discussion in the reporting of survey results.” This is unfortunately true; it can sometimes take real effort to figure out where survey weights come from. Our 1995 paper in Public Opinion Quarterly has details on the sampling and weighting procedures used by nine survey organizations for pre-election polls in 1988 and 1992, so that could be a place to start.

(Footnote 6 of the Alvarez and Nagler paper discusses the National Election Study; our paper discusses commercial polls such as Gallup, CBS, ABC, etc.)

2. In Section 3 they discuss unequal probabilities of sampling. These are important, but perhaps even more important are unequal probabilities of response. When aligning the sample to the population, survey weights combine both factors.
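To make this concrete, here’s a minimal sketch in Python of how the two factors combine multiplicatively into a single base weight. The probabilities here are made up purely for illustration; real weights would of course come from the survey’s design and a response model.

```python
import numpy as np

# Hypothetical per-respondent probabilities (invented for illustration):
# p_sample:  probability the person was selected into the sample
# p_respond: probability the person responded, given selection
p_sample = np.array([0.010, 0.010, 0.002, 0.002])
p_respond = np.array([0.50, 0.25, 0.50, 0.25])

# The base weight is the inverse of the product of the two probabilities:
# people who were unlikely to be sampled, or unlikely to respond once
# sampled, count for more in the weighted estimate.
weights = 1.0 / (p_sample * p_respond)

# Normalize so the weights sum to the sample size (a common convention).
weights *= len(weights) / weights.sum()
print(weights)
```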

3. Ratio weighting (such as discussed in this paper) is closely connected to poststratification. In a paper published last year in Political Analysis, David Park, Joe Bafumi, and I discuss the use of poststratification, combined with multilevel regression modeling, to estimate opinion in subsets of the population, using Census numbers to reweight survey estimates. Section 3.2 of that paper describes adjustments for turnout. (See also this paper, to appear in the volume Public Opinion in State Politics, for more examples of this method.)
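For readers new to poststratification, here’s a toy sketch of the basic step, with invented cell estimates, census counts, and turnout rates. This illustrates the general idea only; it is not the model in our paper.

```python
import numpy as np

# theta[j]:   estimated opinion (e.g., from a multilevel model) in cell j
# N[j]:       census population count for cell j
# turnout[j]: modeled probability that a member of cell j votes
# All numbers below are hypothetical.
theta = np.array([0.40, 0.55, 0.62, 0.70])
N = np.array([12000, 8000, 5000, 3000])
turnout = np.array([0.30, 0.45, 0.60, 0.75])

# Poststratified estimate for the full population: a weighted average
# of the cell estimates, with census counts as the weights.
est_population = np.sum(N * theta) / np.sum(N)

# Adjusting for turnout reweights each cell by its expected number of
# voters, shifting the estimate toward high-turnout cells.
est_voters = np.sum(N * turnout * theta) / np.sum(N * turnout)

print(est_population, est_voters)
```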

The modeling-and-poststratification approach automatically gives standard errors, which addresses one of Alvarez and Nagler’s concerns. If you want classical standard errors from weighted survey estimates, this 2003 paper from the Journal of Official Statistics might be helpful.
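If it helps to see what a classical approximation looks like, here’s a sketch of one common textbook shortcut based on Kish’s effective sample size. To be clear, this is my sketch of a standard approximation, not the method in the JOS paper, and the data below are simulated.

```python
import numpy as np

def weighted_mean_se(y, w):
    """Weighted mean with an approximate standard error based on
    Kish's effective sample size, n_eff = (sum w)^2 / sum(w^2)."""
    w = w / w.sum()                       # normalize weights to sum to 1
    mean = np.sum(w * y)
    # Weighted variance of the responses around the weighted mean.
    var = np.sum(w * (y - mean) ** 2)
    n_eff = 1.0 / np.sum(w ** 2)          # Kish n_eff with normalized weights
    return mean, np.sqrt(var / n_eff)

# Example with made-up binary responses and weights:
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500).astype(float)
w = rng.uniform(0.5, 2.0, size=500)
print(weighted_mean_se(y, w))
```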

4. I hope that in the final version of the paper the results will be presented graphically:

Table 1: Lots of significant figures here, but it’s not easy to compare the coefficients amid all the digits. A dot plot of the estimates would be helpful (see the sketch below).

Table 2: Graphs with x-axes representing the ordered age categories and the ordered education categories. Maybe also show age x education (some surveys weight by this interaction; see our 1995 paper).

Tables 3 and 4 should be presented as a single graph so the reader doesn’t have to keep flipping back and forth between the two tables.
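To illustrate the kind of display I have in mind for Table 1, here’s a sketch of a coefficient dot plot with two-standard-error bars. The variable names, estimates, and standard errors are all invented stand-ins, since I’m not reproducing the paper’s results.

```python
import matplotlib.pyplot as plt

# Hypothetical coefficients and standard errors (stand-ins for Table 1):
names = ["Age", "Education", "Income", "Female", "Born abroad"]
coefs = [0.12, 0.45, 0.08, -0.20, 0.33]
ses = [0.05, 0.10, 0.04, 0.09, 0.15]

# A dot plot with +/- 2 standard-error bars replaces a column of digits
# and makes the relative sizes and uncertainties visible at a glance.
ypos = range(len(names))
plt.errorbar(coefs, ypos, xerr=[2 * s for s in ses], fmt="o", capsize=3)
plt.yticks(ypos, names)
plt.axvline(0, color="gray", linewidth=0.5)
plt.xlabel("Coefficient estimate (+/- 2 s.e.)")
plt.tight_layout()
plt.show()
```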

Anyway, this is interesting stuff, and it’s good to see this kind of work that takes survey methodology seriously.