This problem comes up a lot: We have multiple surveys of the same population and we want a single inference. The usual approach, applied carefully by news organizations such as Real Clear Politics and Five Thirty Eight, and applied sloppily by various attention-seeking pundits every two or four years, is “poll aggregation”: you take the estimate from each poll separately, if necessary correct these estimates for bias, then combine them with some sort of weighted average.
But this procedure is inefficient and can lead to overconfidence (see discussion here, or just remember the 2016 election).
A better approach is to pool all the data from all the surveys together. A survey response is a survey response! Then when you fit your model, include indicators for the individual surveys (varying intercepts, maybe varying slopes too), and include that uncertainty in your inferences. Best of both worlds: you get the efficiency from counting each survey response equally, and you get an appropriate accounting of uncertainty from the multiple surveys.
OK, you can’t always do this: To do it, you need all the raw data from the surveys. But it’s what you should be doing, and if you can’t, you should recognize what you’re missing.
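To make the pooling idea concrete, here is a minimal sketch with hypothetical data (the survey names and responses are made up, and a real analysis would fit a full multilevel model, e.g. in Stan, rather than this crude two-stage bookkeeping): pool every response once, tagged with a survey indicator, then look at each survey's intercept as its deviation from the pooled estimate.

```python
# Minimal sketch of pooling raw responses with survey indicators.
# Hypothetical data: 0/1 responses from three surveys of the same population.

surveys = {
    "poll_A": [1, 0, 1, 1, 0, 1, 0, 1],
    "poll_B": [0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "poll_C": [1, 1, 0, 1, 0, 0, 1],
}

# Pool all responses: each response counts once, tagged with its survey.
pooled = [(name, y) for name, ys in surveys.items() for y in ys]

n_total = len(pooled)
p_pooled = sum(y for _, y in pooled) / n_total  # overall estimate

# Per-survey "intercepts": deviation of each survey's mean from the pooled mean.
# A hierarchical model would partially pool these toward zero.
intercepts = {
    name: sum(ys) / len(ys) - p_pooled for name, ys in surveys.items()
}

print(f"pooled estimate: {p_pooled:.3f}")
for name, delta in intercepts.items():
    print(f"{name}: intercept {delta:+.3f}")
```

The point of the sketch is only the bookkeeping: each respondent counts equally in the pooled estimate, while the survey indicators carry the between-survey variation into the uncertainty of the final inference.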
Pooling individual poll data might count overlapping respondents more than once, and hence lead to another level of overconfidence? Also, it is debatable whether different polls really have exactly the same target population. That shift might induce further uncertainty: intuitively, the more independent the individual polls are, the more variance reduction the final aggregation can gain.
Yuling:
If anyone responds to both surveys, I’d include their response just once; I would not include duplicate responses. But above I’m thinking of surveys of large populations, in which duplicates would be so rare as to be essentially irrelevant in practice.
As for duplicates, including all of them leads to overconfidence, since the responses are no longer independent Bernoulli draws, but deleting them without further adjustment will lead to bias. Sure, duplication is rare in large-scale polls, but I think it represents an extreme case of how data concentration/dependence influences the aggregation.
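The keep-one-response-per-respondent rule discussed above is simple to implement when respondent identifiers are available; a minimal sketch with hypothetical records (the further bias adjustment mentioned in the comment is not shown):

```python
# Sketch of de-duplicating overlapping respondents before pooling.
# Hypothetical records: (respondent_id, survey, response). If someone answered
# two surveys, keep their response only once (here: the first one seen).

records = [
    ("r1", "poll_A", 1),
    ("r2", "poll_A", 0),
    ("r3", "poll_B", 1),
    ("r1", "poll_B", 1),  # duplicate respondent r1
    ("r4", "poll_B", 0),
]

seen = {}
for rid, survey, y in records:
    if rid not in seen:  # keep the first response per respondent
        seen[rid] = y

deduped = list(seen.values())
print(len(deduped), sum(deduped) / len(deduped))
```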
Does any good (emphasis on good) example of this come to mind?
What if sampling procedures and weights are different?
Ideally you create a model for the data collection process that links the underlying population to the data collected through the sampling method, and then use different models for the different surveys, with a single underlying population you do inference on.
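Short of the full generative model for each survey's data-collection process, a standard simpler device for handling different weighting schemes is to use each survey's weighted estimate together with Kish's effective sample size, n_eff = (Σw)² / Σw², in place of its raw n. A minimal sketch with hypothetical data and weights:

```python
# Sketch: a survey with post-stratification weights contributes its weighted
# mean, with precision governed by Kish's effective sample size rather than
# its raw n. All data and weights here are hypothetical.

def weighted_estimate(ys, ws):
    """Weighted proportion and Kish effective sample size for one survey."""
    p = sum(w * y for y, w in zip(ys, ws)) / sum(ws)
    n_eff = sum(ws) ** 2 / sum(w * w for w in ws)
    return p, n_eff

ys = [1, 0, 1, 1, 0]
ws = [1.0, 2.0, 1.0, 0.5, 1.5]  # hypothetical post-stratification weights

p, n_eff = weighted_estimate(ys, ws)
print(p, n_eff)  # n_eff < 5 because the weights are unequal
```

This is only a variance approximation; the fuller answer, as above, is a model linking each survey's sampling method to the common underlying population.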
Different polls and/or surveys are worded differently, have different sampling strategies, and were conducted at different times. Aggregation done properly can adjust for quality, recency, and the historical bias of the pollster, which can come from language, selection, or other rules used for polling (how many times to retry a number, etc.)
Pooling is great when these are not present, and probably beats naive or badly done aggregation, but do you think that pooling would enable better predictive accuracy for a site like Five Thirty Eight than they manage now with aggregation?
Not a polling expert at all, but I would say yes. All of those factors will be accounted for by including indicators as Andrew suggested, at least as well as they are now with aggregation methods. You have additional external information about each poll? Fine, include it in the group-level model for the indicators.
Doesn’t this approach assume the availability of individual response level data for each poll? What if the only thing available to you is tabular information, e.g., breakouts by geo-demo, or mixtures of response and tabular information?
David:
See the last paragraph of my post.
Thanks for pointing that out, but have methods been developed for fusing tabular data from different surveys?
I remember reading this back in 2012; it deals with having just summaries but little of the detail: Giancarlo Manzi, David J. Spiegelhalter, Rebecca M. Turner, Julian Flowers, and Simon G. Thompson, “Modelling bias in combining small area prevalence estimates from multiple surveys.”
The theory is all about estimating some common quantities while properly allowing for differences across the surveys, jointly. There are fewer challenges with the raw data, but the challenges can still be overwhelming.
The theory is not that hard (at least using likelihood and priors), and in principle there should be an aggregation (weighted combination of summaries) that is a close approximation (though you would need the raw data to actually construct it). My 10-year-old thesis did that: https://phaneron0.files.wordpress.com/2015/08/thesisreprint.pdf .
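The claim that a weighted combination of summaries can approximate the pooled raw-data analysis is exact in the simplest case: with normal data and a known common sampling variance, inverse-variance weighting of the survey means reproduces the grand mean of the pooled responses. A sketch with hypothetical numbers:

```python
# With normal data and known common sampling variance, the inverse-variance
# weighted combination of per-survey means equals the pooled raw-data
# estimate exactly. Data below are hypothetical.

surveys = {
    "A": [2.0, 3.0, 2.5, 3.5],
    "B": [1.5, 2.0, 2.5],
    "C": [3.0, 3.0, 2.0, 2.0, 2.5],
}
sigma2 = 1.0  # assumed known, common sampling variance

# Pooled raw-data estimate: grand mean over all responses.
all_y = [y for ys in surveys.values() for y in ys]
pooled = sum(all_y) / len(all_y)

# Summary-based aggregation: inverse-variance weights w_j = n_j / sigma2,
# so the weights are proportional to the survey sizes.
means = {k: sum(v) / len(v) for k, v in surveys.items()}
weights = {k: len(v) / sigma2 for k, v in surveys.items()}
combined = sum(weights[k] * means[k] for k in surveys) / sum(weights.values())

print(pooled, combined)  # identical
```

With unknown variances, unequal survey quality, or non-normal models the equivalence is only approximate, which is the point of the comment above: you would need the raw data to check how close the approximation is.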
KOR: Very helpful, thanks! DJ
Is the last paragraph of your post simply an 8 schools meta-analysis?
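For readers unfamiliar with the reference: the "8 schools" setup combines per-group estimates and standard errors with partial pooling toward a common mean. A minimal empirical-Bayes sketch of that pattern, using the DerSimonian-Laird moment estimator for the between-survey variance and hypothetical numbers (not the classic 8-schools data):

```python
# "8 schools"-style meta-analysis sketch: per-survey estimates y_j with
# standard errors s_j, partially pooled toward a common mean.

y = [5.0, 12.0, -2.0, 8.0, 3.0]  # hypothetical per-survey estimates
s = [4.0, 6.0, 5.0, 3.0, 4.0]    # their standard errors

w = [1.0 / sj**2 for sj in s]  # fixed-effect (inverse-variance) weights
mu_fe = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)

# DerSimonian-Laird moment estimator of the between-survey variance tau^2.
Q = sum(wj * (yj - mu_fe) ** 2 for wj, yj in zip(w, y))
S1, S2 = sum(w), sum(wj**2 for wj in w)
tau2 = max(0.0, (Q - (len(y) - 1)) / (S1 - S2 / S1))

# Random-effects weights and the partially pooled common mean.
w_re = [1.0 / (sj**2 + tau2) for sj in s]
mu_re = sum(wj * yj for wj, yj in zip(w_re, y)) / sum(w_re)

# Each survey's estimate shrinks toward mu_re, more when its s_j is large.
theta = [
    (yj / sj**2 + mu_re / tau2) / (1 / sj**2 + 1 / tau2) if tau2 > 0 else mu_re
    for yj, sj in zip(y, s)
]
print(round(mu_re, 3), [round(t, 2) for t in theta])
```

A fully Bayesian version (as in the 8-schools example) would put a prior on tau and propagate its uncertainty rather than plugging in a point estimate.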