Polling in the 21st century: There ain’t no urn

David Rothschild writes:

The Washington Post (WaPo) utilized SurveyMonkey (SM) to survey 74,886 registered voters in all 50 states on who they would vote for in the upcoming election. I am very excited about this work, because I am a huge proponent of advancing polling methodology, but the methodological explanation and data details raise some serious questions.

The WaPo explanation conflates method and mode: the mode was online (versus telephone), and the method was a random sample of SM users with raked demographics (versus probability-based sampling with raked demographics). The first key difference, then, is online versus telephone. The second is that the sample is drawn from SM users rather than from all telephone users. A third possible difference was not employed: despite the different mode and selection criteria, the WaPo/SM team used traditional analytics. This poll is more like traditional polls than the WaPo admits, in both its strengths and its weaknesses.

Both online and telephone surveys are limited in whom they can reach, but their coverage is very similar. As of September 2015, 89% of US adults were online in some context. Between cell phones and landlines, most people can be reached by telephone as well: about 90% have cell phones, and about half are cell-phone-only. But that is precisely the problem: a confusing number of people have both cell phones and landlines, and many cell phone owners no longer live near the area code of their phone. So while telephone may reach slightly more American adults, that advantage is rapidly diminishing, and US adults without any internet access are very unlikely voters.

The bigger limitation is that the survey only reaches SM users, rather than all possible online users. I do not know the limits of SM’s reach, but as the WaPo article notes, they are drawing from about three million daily active users. Over the course of the 24-day study, a non-trivial cross-section of US adults may have interacted with SM. While I have no way of knowing how that group is biased relative to the general population of online users, I assume it covers a reasonable cross-section of genders, ages, races, incomes, education levels, and geography.

So, while the WaPo is right that this sample is non-probability, in that we do not know the probability of any given voter answering the survey, the same is true of the traditional phone method. We do not know the probability of non-telephone users being excluded from being called, especially with shifting cell-phone and landline coverage. On a personal note, I do not get called by traditional polls because my cell phone area code is from where my parents lived when I got my first cell phone 14 years ago. And we do not know all of the dimensions that drive the nonresponse of people who are called (somewhere between 1% and 10% of people answer the phone). In short, both methods are non-probability.

What is disappointing to me is that the WaPo/SM team then employed an analytical method optimized for probability-based telephone surveys: raking. Raking means matching the marginal demographics of the respondents to Census targets for age, race, sex, education, and region. With 74,886 respondents and a goal of providing state-level results, the team should instead have used multilevel regression and post-stratification (MRP). MRP employs all of the respondents to create an estimate for any subgroup: it draws on the idea that white men from Kansas can help estimate how white men from Arkansas, or white people in general from New York, may vote. It is a powerful tool for non-probability surveys (regardless of mode or method).
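
To make the contrast concrete, here is a minimal sketch of raking via iterative proportional fitting, in Python with numpy. Everything in it (the variables, the target shares, the tolerance) is made up for illustration; production raking handles more margins and messier categories.

```python
import numpy as np

def rake(sample, targets, n_iter=50, tol=1e-8):
    """Iterative proportional fitting: adjust unit weights so the weighted
    margins of each variable match the population targets.

    sample  : dict of variable name -> array of category codes, one per respondent
    targets : dict of variable name -> array of target population shares per category
    """
    n = len(next(iter(sample.values())))
    w = np.ones(n)  # start from equal weights
    for _ in range(n_iter):
        max_shift = 0.0
        for var, codes in sample.items():
            target = targets[var] * w.sum()  # target counts on this margin
            current = np.bincount(codes, weights=w, minlength=len(target))
            ratio = np.divide(target, current,
                              out=np.ones_like(target), where=current > 0)
            w *= ratio[codes]  # rescale each respondent's weight
            max_shift = max(max_shift, np.abs(ratio - 1).max())
        if max_shift < tol:
            break
    return w

# Toy example: a sample that over-represents men and young people.
# Party identification could be included as just another margin here.
rng = np.random.default_rng(0)
sex = rng.choice([0, 1], size=1000, p=[0.65, 0.35])    # 0 = male, 1 = female
age = rng.choice([0, 1, 2], size=1000, p=[0.5, 0.3, 0.2])
weights = rake({"sex": sex, "age": age},
               {"sex": np.array([0.49, 0.51]),          # census-style targets
                "age": np.array([0.30, 0.35, 0.35])})
print(np.bincount(sex, weights=weights) / weights.sum())  # ~ [0.49, 0.51]
```

Note that raking only sees the margins: two respondents in the same age and sex categories get the same adjustment no matter how the cells interact, which is part of what MRP improves on.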

The team did break from tradition by weighting on party identification in five states: Colorado, Florida, Georgia, Ohio, and Texas. Partisan nonresponse is a big problem, and party identification should be used both to stabilize the ups and downs of any given poll and to create a more accurate forecast of actual voting. But it should never be employed selectively within a single survey!

Finally, the WaPo/SM team notes that “The Post-SurveyMonkey poll employed a ‘non-probability’ sample of respondents. While standard Washington Post surveys draw random samples of cellular and landline users to ensure every voter has a chance of being selected, the probability of any given voter being invited to a SurveyMonkey survey is unknown, and those who do not use the platform do not have a chance of being selected. A margin of sampling error is not calculated for SurveyMonkey results, since this is a statistical property only applicable to randomly sampled surveys.” As noted above, the true probability is never known, for either the new SM survey or any of the WaPo’s traditional polls. Empirically, the true margin of error is about twice as large as the stated margin of error for traditional polls, because the stated margin ignores coverage, nonresponse, measurement, and specification error.
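
For concreteness, the stated margin of error is just the binomial sampling formula; the poll size and proportion in this sketch are hypothetical:

```python
import math

n, p = 1000, 0.5  # hypothetical poll size and reported proportion
stated_moe = 1.96 * math.sqrt(p * (1 - p) / n)  # sampling error only
print(f"stated margin of error: +/- {100 * stated_moe:.1f} points")  # ~ +/- 3.1
# Rothschild's empirical claim is that, once coverage, nonresponse,
# measurement, and specification error are included, realized errors run
# about twice this large: roughly +/- 6 points for a 1,000-person poll.
```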

I just want to add three things.

1. I appreciate the details given by Washington Post polling manager Scott Clement in his news article, “How The Washington Post-SurveyMonkey 50-state poll was conducted,” and I know that David Rothschild appreciates it too. Transparent discussion is great. David and I disagree with Clement on some things, and the way we can all move forward is to address those disagreements openly, which Clement’s posting of that article makes possible.

2. Clement talks about “why The Post chose to use a non-probability sample . . .” But with nonresponse rates at 90%, every poll is a non-probability sample. There ain’t no urn.

3. Clement also writes, “A margin of sampling error is not calculated for SurveyMonkey results, since this is a statistical property only applicable to randomly sampled surveys.” I disagree. Again, no modern survey is even close to a random sample. Sampling error calculations are always based on assumptions which are actually false. That doesn’t mean it’s a bad idea to do such a calculation. Better still would be to give a margin of error that includes non-sampling error, for reasons discussed in the last paragraph of David’s note above. But, again, this is the case for random-digit-dial telephone surveys as well.
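
To see why, here is a toy simulation (every number in it is invented): the sampling frame is a genuine random sample, but response is mildly correlated with vote intention, as with partisan nonresponse. The nominal 95% interval then almost never covers the truth.

```python
import numpy as np

rng = np.random.default_rng(2)
pop_size = 100_000
support = rng.random(pop_size) < 0.50        # true support: 50%
p_respond = np.where(support, 0.055, 0.045)  # ~5% response, slightly partisan

covered = 0
n_sims = 200
for _ in range(n_sims):                      # 200 simulated polls
    respondents = rng.random(pop_size) < p_respond
    sample = support[respondents]
    est = sample.mean()
    moe = 1.96 * np.sqrt(est * (1 - est) / sample.size)
    covered += (est - moe <= 0.50 <= est + moe)
print(covered / n_sims)  # essentially 0, not the nominal 0.95
```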

And one more thing: I agree with David that (a) they should be adjusting for party ID in all the states, not just five of them, and (b) if they want separate state estimates, they should use MRP.
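
For readers who have not seen MRP, here is a deliberately stripped-down sketch of the two steps in Python: partial pooling of state estimates toward the national mean (a stand-in for the multilevel regression), followed by weighting by census counts (the post-stratification). The data, the shrinkage constant, and the one-cell-per-state structure are all hypothetical; a real MRP analysis fits a multilevel model over many demographic-by-state cells.

```python
import numpy as np

# Hypothetical respondent-level data: per-state sample sizes and yes-votes.
rng = np.random.default_rng(1)
n_states = 50
n_per_state = rng.integers(20, 3000, size=n_states)  # very uneven samples
true_support = rng.normal(0.5, 0.05, size=n_states)
yes = rng.binomial(n_per_state, true_support)

# Step 1 ("multilevel regression", minimal version): shrink each state's
# raw proportion toward the national mean, with less shrinkage for states
# with more respondents. k stands in for the fitted group-level variance;
# here it is just an assumed constant.
raw = yes / n_per_state
national = yes.sum() / n_per_state.sum()
k = 200.0
pooled = (n_per_state * raw + k * national) / (n_per_state + k)

# Step 2 (post-stratification, minimal version): weight the cell estimates
# by census population counts rather than by who happened to respond.
census_pop = rng.integers(500_000, 30_000_000, size=n_states)  # hypothetical
national_mrp = np.average(pooled, weights=census_pop)

print(pooled[:5])    # stabilized state-level estimates
print(national_mrp)  # poststratified national estimate
```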

Overall it looks like the analysis plan wasn’t fully thought through. Too bad they didn’t ask us for advice!

9 thoughts on “Polling in the 21st century: There ain’t no urn”

  1. I think David’s description of the SurveyMonkey sample is misstated slightly.

    The description of the sample, found at http://apps.washingtonpost.com/g/page/politics/washington-post-surveymonkey-50-state-poll/2086/, says, “The sample was drawn among the respondents who completed separate user-generated polls using SurveyMonkey’s platform during this period.”

    SurveyMonkey is one of the largest survey platforms in the world. It is probably the largest self-service platform. You do not have to be a user to complete a survey created on SurveyMonkey. In fact, if you are someone who completes surveys, it is very likely you have completed quite a few surveys created on SurveyMonkey’s platform. They do not always have a SurveyMonkey brand attached.

    The WaPo description also says, “With non-probability samples, testing is necessary to ensure a particular sampling strategy, along with adjustments to match population demographics, can consistently produce accurate estimates.”

    This sounds perfectly appropriate.

    • Curious:

      Interesting background on SurveyMonkey. Regarding your last two paragraphs: yes, the statement about testing and adjustment is perfectly appropriate. It’s also perfectly appropriate for conventional telephone surveys, because they are not probability samples either, not with 90% nonresponse rates.

      It’s great to see practitioners getting serious about assumptions and adjustments. Let’s just get serious about that, period, and not kid ourselves that there’s something out there called “probability sampling” that doesn’t require this sort of care.

      • Agreed! We are all in agreement here. But, as Andrew said, my biggest concern is the Washington Post’s description of their “traditional” polling. They should take as much care to make sure that data is analyzed well. Second, why did SurveyMonkey/Washington Post use the same analytical technique that “traditional” polling companies have used for 60+ years? New challenges and new computation beg new techniques, like Andrew’s and my MRP.

  2. Out of curiosity: is there a reference for “many cell phone owners no longer live near the area code of their phone”? I’ve sort of assumed that’s a common phenomenon, but then again I know lots of people who got their first phone back when we were in college and then transferred the number around with them. I could also see plenty of people just getting their phone plan where they live.

  3. Andrew – sometimes there is an urn, and it turns out it is funnier when reality mimics probability than (the usual case of) when probability is used to mimic reality.

    “There, in the middle of the North Pacific, in seas almost four miles deep… 28,800 plastic animals produced in Chinese factories for the bathtubs of America—7,200 red beavers, 7,200 green frogs, 7,200 blue turtles, and 7,200 yellow ducks—hatched from their plastic shells and drifted free.”

    http://harpers.org/archive/2007/01/moby-duck/?single=1
