David Rothschild writes:
The Washington Post (WaPo) utilized Survey Monkey (SM) to survey 74,886 registered voters in all 50 states on who they would vote for in the upcoming election. I am very excited about the work, because I am a huge proponent of advancing polling methodology, but the methodological explanation and data detail bring up some serious questions.
The WaPo explanation conflates method and mode: the mode was online (versus telephone), and the method was a random sample of SM users with raked demographics (versus probability-based sampling with raked demographics). So the first key difference is the mode: online versus telephone. The second key difference is the selection criterion: the sample is random among SM users rather than among all telephone users. A third possible difference was not employed: despite the different mode and selection criteria, the WaPo/SM team used traditional analytics. This poll is more like traditional polls than the WaPo admits, in both its strengths and weaknesses.
Both online and telephone have limitations in who can be reached, but they have very similar coverage. As of September 2015, 89% of US adults were online in some context. Between cell phones and landlines, most people are on telephones as well: about 90% have cell phones, and roughly half of adults are cell-phone only. That is actually part of the problem: a confusing number of people have both cell phones and landlines, and many cell-phone owners no longer live near the area code of their phone. So, while telephones may be able to reach slightly more American adults, that advantage is rapidly diminishing, and US adults without any internet access are very unlikely voters.
The bigger limitation is that the survey only reaches SM users, rather than all possible online users. I do not know the limits of SM’s reach, but the WaPo article notes that they are drawing from about three million daily active users. Over the course of the 24-day study, a non-trivial cross-section of US adults may have interacted with SM. While I have no way of knowing how it is biased relative to the general population of online users, I assume it is relatively good at covering a cross-section of genders, ages, races, incomes, education levels, and geographies.
So, while the WaPo is right that this sample is non-probability, in that we do not know the probability of any given voter answering the survey, the same is true of the traditional phone method. We do not know the probability of non-telephone users being excluded from being called, especially with shifting cell-phone and landline coverage. On a personal note, I do not get called by traditional polls because my cell phone’s area code is where my parents lived when I got my first cell phone 14 years ago. And we do not know all of the dimensions that drive the nonresponse of people called (somewhere between 1% and 10% of people called actually answer). In short, both methods are non-probability.
What is disappointing to me is that the WaPo/SM team then employed an analytical method that is optimized for probability-based telephone surveys: raking. Raking means matching the marginal demographics of the respondents to the Census on age, race, sex, education, and region. With 74,886 respondents, and a goal of providing state-level results, the team should have used multilevel regression and post-stratification (MRP). MRP employs all of the respondents to create an estimate for any subgroup. It draws on the idea that white men from Kansas can help estimate how white men from Arkansas may vote, or how white people in general from New York may vote. It is a powerful tool for non-probability surveys (regardless of the mode or method).
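For readers unfamiliar with raking: it is iterative proportional fitting of survey weights so that the weighted sample matches population margins one variable at a time. Here is a minimal sketch with two margins (sex and age group); all counts and Census proportions are invented for illustration, and a real poll would rake on age, race, sex, education, and region jointly.

```python
from collections import Counter

# Hypothetical respondents as (sex, age_group) pairs; counts are invented.
respondents = (
    [("M", "18-44")] * 300 + [("M", "45+")] * 250 +
    [("F", "18-44")] * 200 + [("F", "45+")] * 250
)
n = len(respondents)

# Hypothetical Census margins (population proportions) and which tuple
# position each margin refers to.
targets = [
    ({"18-44": 0.45, "45+": 0.55}, 1),  # age margin
    ({"M": 0.48, "F": 0.52}, 0),        # sex margin (adjusted last)
]

weights = [1.0] * n
for _ in range(50):  # iterate the margin adjustments until they converge
    for margin, idx in targets:
        totals = Counter()
        for w, r in zip(weights, respondents):
            totals[r[idx]] += w
        for i, r in enumerate(respondents):
            # Scale each weight so this margin matches its target.
            weights[i] *= margin[r[idx]] * n / totals[r[idx]]

weighted_male = sum(w for w, r in zip(weights, respondents) if r[0] == "M") / n
print(round(weighted_male, 3))  # matches the 0.48 target
```

The key point of the post stands out even in this toy: raking only touches the margins, whereas MRP models the full cross-classified cells and so can say something about small subgroups.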
The team did break from tradition and weighted on party identification in five states: Colorado, Florida, Georgia, Ohio, and Texas. Partisan nonresponse is a big problem, and party identification should be used both to stabilize the ups and downs of any given poll and to create a more accurate forecast of actual voting. But it should never be employed selectively within a single survey!
Finally, the WaPo/SM team notes that “The Post-SurveyMonkey poll employed a “non-probability” sample of respondents. While standard Washington Post surveys draw random samples of cellular and landline users to ensure every voter has a chance of being selected, the probability of any given voter being invited to a SurveyMonkey is unknown, and those who do not use the platform do not have a chance of being selected. A margin of sampling error is not calculated for SurveyMonkey results, since this is a statistical property only applicable to randomly sampled surveys.” As noted above, the true probability is never known, either for the new SM survey or for any of the WaPo’s traditional polls. Empirically, the true margin of error for traditional polls is about twice as large as the stated margin of error, because the stated margin ignores coverage, nonresponse, measurement, and specification error.
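To make the last point concrete, here is the conventional margin-of-sampling-error calculation next to the empirical rule of thumb from the paragraph above (roughly double it to account for non-sampling error). The sample size is a hypothetical state-poll figure, not from the WaPo/SM study.

```python
import math

def stated_moe(p, n, z=1.96):
    """Conventional 95% margin of sampling error for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000   # hypothetical state-level sample size
p = 0.5    # worst case: maximizes the sampling variance
moe = stated_moe(p, n)
print(f"stated MOE:            +/- {100 * moe:.1f} points")  # about 3.1
print(f"rule-of-thumb total:   +/- {200 * moe:.1f} points")  # roughly double
```

The doubling is an empirical summary, not a formula; the components it stands in for (coverage, nonresponse, measurement, and specification error) are not captured by any function of n alone.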
I just want to add three things.
1. I appreciate the details given by Washington Post polling manager Scott Clement in his news article, “How The Washington Post-SurveyMonkey 50-state poll was conducted,” and I know that David Rothschild appreciates it too. Transparent discussion is great. David and I disagree with Clement on some things, and the way we can all move forward is to address this, which is facilitated by Clement’s step of posting that article.
2. Clement talks about “why The Post chose to use a non-probability sample . . .” But with nonresponse rates at 90%, every poll is a non-probability sample. There ain’t no urn.
3. Clement also writes, “A margin of sampling error is not calculated for SurveyMonkey results, since this is a statistical property only applicable to randomly sampled surveys.” I disagree. Again, no modern survey is even close to a random sample. Sampling error calculations are always based on assumptions which are actually false. That doesn’t mean it’s a bad idea to do such a calculation. Better still would be to give a margin of error that includes non-sampling error, for reasons discussed in the last paragraph of David’s note above. But, again, this is the case for random-digit-dial telephone surveys as well.
And one more thing: I agree with David that (a) they should be adjusting for party ID in all the states, not just 4 of them, and (b) if they want separate state estimates, they should use MRP.
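The “borrowing strength” idea behind MRP can be shown in miniature with simple shrinkage: a small-sample state estimate gets pulled toward the national rate, while a large-sample state barely moves. This is only the partial-pooling intuition, not full MRP (which would also model demographic cells and poststratify to the Census); all data and the pseudo-sample size k are invented.

```python
# Hypothetical (supporters, sample size) by state.
state_data = {
    "KS": (60, 100),
    "AR": (6, 10),      # tiny sample: the raw estimate is very noisy
    "NY": (400, 1000),
}

national_yes = sum(y for y, _ in state_data.values())
national_n = sum(n for _, n in state_data.values())
p_national = national_yes / national_n

# Shrink each state's raw rate toward the national rate, as if adding
# k pseudo-respondents at the national rate (k is a made-up tuning constant;
# a real multilevel model estimates the amount of pooling from the data).
k = 50
for state, (y, n) in state_data.items():
    raw = y / n
    pooled = (y + k * p_national) / (n + k)
    print(f"{state}: raw={raw:.2f}, partially pooled={pooled:.2f}")
```

Arkansas, with only 10 respondents, ends up close to the national rate; New York, with 1000, stays near its raw estimate. That asymmetry is exactly what makes state-level estimates from one national sample feasible.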
Overall it looks like the analysis plan wasn’t fully thought through. Too bad they didn’t ask us for advice!