A poll that throws away data???

Mark Blumenthal writes:

What do you think about the “random rejection” method used by PPP that was attacked at some length today by a Republican pollster? Our just-published post on the debate includes all the details as I know them. The Storify of Martino’s tweets has some additional data tables linked toward the end.

Also, more specifically, setting aside Martino’s suggestion of manipulation (which is also quite possible with post-stratification weights), would the PPP method introduce more potential random error than weighting?

From Blumenthal’s blog:

B.J. Martino, a senior vice president at the Republican polling firm The Tarrance Group, went on a 30-minute Twitter rant on Tuesday questioning the unorthodox method used by PPP [Public Policy Polling] to select samples and weight data: “Looking at @ppppolls new VA SW. Wondering how many interviews they discarded to get down to 601 completes? Because @ppppolls discards a LOT of interviews. Of 64,811 conducted for @DailyKos/SEIU in 2012, they discarded almost 23K. Sure, a handful of the @ppppolls discards were not valid interview responses. Most appear valid completes. @ppppolls says discarding interviews is a kind of retroactive quota on race, gender and age. Why not just weight the data? . . .

PPP’s explanation of how they weight data – PPP’s explanation of their method appears on their “About Us” page: “Our first step in weighting is to survey more than enough people. This allows us to then be able to randomly reject individual surveys from demographics that are overrepresented. For example, in our polling more women answer relative to men, and not enough African-Americans answer our surveys. Our random selection eliminates any potential bias from the rejections, plus it functions like a quota, only after the fact. PPP also employs a mathematical weighting scheme that assigns a weight based on each demographic.” . . .

Via email, Martino clarifies: “The random process they use to discard older white female interviews in this case changes the reported composition, and the opinions, of the older white females who remain. The discard process can be (can be, not saying is) manipulated to produce desired results. Even random discards within a selected sub-group can be the result of choices the pollster made. Ultimately, why discard at all when you are already weighting after the fact?”

Response from PPP – In response to an email query, PPP’s Tom Jensen defended his company but not the specific methodology challenged by Martino: “I’m sure there are as many methods for weighting polls as there are polling companies. We’ve been doing things the way we do them for over a decade and it’s served us well. . . .

I admit to being a bit baffled by all of this. If this organization is actually going to the trouble of doing full survey interviews on these people, then they definitely shouldn’t be throwing away the responses. Maybe they’re doing some sort of screening, where they’re only asking a few questions and using these in order to decide whether to go on? That could make sense. The whole thing seems so odd to me that I wonder if I’m missing something here.
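
To make the mechanics concrete, here is a minimal sketch of what the “random rejection” step described in the quoted “About Us” text could look like. The target shares, the cell definition, and the toy data are hypothetical, not PPP’s actual targets, and PPP says it also applies weights after this step.

```python
import random

# A minimal sketch of a "random rejection" / retroactive-quota step: randomly
# drop completed interviews from overrepresented demographic cells until each
# cell hits its target share. Targets and data below are hypothetical.

def random_rejection(interviews, cell, targets, n_final, seed=0):
    """interviews: list of respondent dicts; cell: maps a respondent to a
    demographic cell; targets: dict of cell -> target share of n_final."""
    rng = random.Random(seed)
    kept = []
    for c, share in targets.items():
        members = [r for r in interviews if cell(r) == c]
        quota = round(share * n_final)
        rng.shuffle(members)          # random within the cell...
        kept.extend(members[:quota])  # ...but everything past the quota is discarded
    return kept

# Toy example: women are 60% of completes but the (hypothetical) target is 52%,
# so some female interviews get thrown away.
completes = [{"sex": "F"}] * 600 + [{"sex": "M"}] * 400
sample = random_rejection(completes, lambda r: r["sex"],
                          {"F": 0.52, "M": 0.48}, n_final=601)
print(len(sample))  # 601 completes with the target sex split
```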

18 thoughts on “A poll that throws away data???”

  1. Sounds somewhat like YouGov.com. They are just matching to some target population. Almost all matching routines discard data.

    The question is why not do stratified sampling to begin with, so there’s no need to oversample. Maybe it has something to do with the sampling frame.

      • Take matching: use the nearest neighbor and discard all others, or use all of them weighted by some function of distance.

        No need in principle to discard data, but also no need to include all of the data I imagine.

        Here it’s different because presumably you’re not matching units but a joint distribution of stratifying variables.

        • Anon:

          If you discard data you throw away information. If the goal is to learn about the whole population, I see no reason to throw away respondents of type X just because there are too many of them. Better to use all the respondents and get a more accurate inference. (See the sketch just after this thread.)

          Matching for causal inference is a bit different because, there, it can make (practical) sense to throw away data that are far from the region of overlap between treatment and control groups, as discussed for example in chapter 10 of my book with Jennifer.

        • You are right. The key is overlap. In matching, overlap is at the individual level, but here you only need overlap at the level of the strata. If it holds, then just weight the data.
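
To put a rough number on the precision point in this thread: under made-up assumptions (two strata with equal population shares, one stratum overrepresented among completes), keeping everyone and post-stratifying gives a smaller standard error than randomly rejecting interviews down to a self-weighting sample.

```python
from math import sqrt

# Minimal sketch of the precision cost of discarding. Two strata with equal
# population shares; stratum A is overrepresented among completed interviews.
# All numbers are invented for illustration.

pop_share = {"A": 0.5, "B": 0.5}   # population shares of the strata
true_p = {"A": 0.40, "B": 0.60}    # support within each stratum
completes = {"A": 900, "B": 300}   # stratum A over-responds

def poststrat_se(n_by_stratum):
    # SE of the post-stratified estimate: sqrt( sum_h W_h^2 * p_h(1-p_h)/n_h )
    var = sum(pop_share[h] ** 2 * true_p[h] * (1 - true_p[h]) / n
              for h, n in n_by_stratum.items())
    return sqrt(var)

print(round(poststrat_se(completes), 4))              # keep everyone and weight: ~0.016
print(round(poststrat_se({"A": 300, "B": 300}), 4))   # randomly reject A down to 300: ~0.020
```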

  2. IMHO B.J. Martino is accusing PPP of fraud. Whether said fraud is legally actionable is something I can’t answer, but as an independent voter I learned long ago that PPP’s “results” were of questionable validity and strongly biased against Republicans. Honestly, it surprises me that it’s taken this long for someone to challenge them.

    • How can such fraud be legally actionable, even if true? Martino’s not even a client, right? Or is a polling firm bound by some additional laws?

  3. What bothers me most is the quote on their “About Us” page, “Our random selection eliminates any potential bias from the rejections.” If they really believe this, or if users of their poll really believe this, then they’re being sorely misled. Data that are not missing at random are still not missing at random even if other data are missing completely at random; any bias remains. This is similar to inflating the sample size of an RCT to account for loss: it maintains power, but does nothing to fix the bias resulting from that loss.
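
As a toy illustration of that point: if willingness to respond is related to opinion through something the demographic cells don’t capture, the respondent pool is biased, and randomly rejecting surplus respondents leaves that bias exactly where it was. Every number below is invented.

```python
import random

# Toy simulation: opinion and propensity to respond both depend on an
# unmeasured trait, so the respondent pool over-represents one opinion.
# Randomly discarding half of the respondents does nothing to the bias.

rng = random.Random(42)
population = []
for _ in range(100_000):
    engaged = rng.random() < 0.5                            # unmeasured trait
    supports = rng.random() < (0.70 if engaged else 0.40)   # opinion depends on it
    responds = rng.random() < (0.30 if engaged else 0.10)   # so does responding
    population.append((supports, responds))

true_rate = sum(s for s, _ in population) / len(population)
respondents = [s for s, r in population if r]
raw_rate = sum(respondents) / len(respondents)

# "Random rejection": discard half of the respondents completely at random.
thinned = rng.sample(respondents, len(respondents) // 2)
thinned_rate = sum(thinned) / len(thinned)

print(round(true_rate, 3), round(raw_rate, 3), round(thinned_rate, 3))
# raw_rate and thinned_rate are nearly identical, and both overstate true_rate
```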

  4. I was initially going to defend PPP. They could be doing adaptive surveying using a screening questionnaire to select an optimal number of subjects from each group, and then using survey weights to do population inference. A scheme like this could be a very clever way to minimize cost and maximize power (a rough sketch of what that might look like follows this comment). But then they said this in their response:

    “…our new VA poll is very similar in its findings to the last ones from Rasmussen and Quinnipiac. Beyond that I don’t really feel the need to defend or explain our methods…”

    shady.
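
A screener-plus-subsampling scheme of the kind imagined above might look roughly like the sketch below. This is speculation about what such a two-phase design could be, not PPP’s documented procedure; the cells, continuation probabilities, and toy data are all hypothetical.

```python
import random

# Sketch of a two-phase idea: ask a cheap screener to learn the respondent's
# demographic cell, continue to the full interview with a cell-specific
# probability, and weight the completes by the inverse of that probability.

rng = random.Random(7)

# Subsample the overrepresented cell ("F") more aggressively at the screener.
continue_prob = {"F": 0.5, "M": 0.9}

def phase_two(screened):
    """screened: list of (cell, answer) pairs; returns (cell, answer, weight)."""
    completes = []
    for cell, answer in screened:
        if rng.random() < continue_prob[cell]:   # decide whether to keep interviewing
            completes.append((cell, answer, 1.0 / continue_prob[cell]))
    return completes

def weighted_mean(completes):
    num = sum(w * answer for _, answer, w in completes)
    den = sum(w for _, _, w in completes)
    return num / den

# Toy data: the 0/1 "answer" stands in for what the full interview would measure.
screened = [("F", rng.random() < 0.45) for _ in range(600)] + \
           [("M", rng.random() < 0.55) for _ in range(400)]
print(round(weighted_mean(phase_two(screened)), 3))  # close to the full-sample mean
```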

  5. Well, the proof is in the pudding, right?

    Here’s how PPP did in the 2012 election (see the link to 538 below). A few things are worth noting: (1) they were below the middle of the pack, but not the worst; (2) their result was off in the Republican direction, in spite of what some commentators here expect; (3) their result was better than Rasmussen’s (the Republican polling group).
    http://fivethirtyeight.blogs.nytimes.com/2012/11/10/which-polls-fared-best-and-worst-in-the-2012-presidential-race/

    • The proof is not in the pudding. Fraudulent pollsters will often generate results out of thin air that are consistent with the other polling data available (see Daily Kos’s last pollster, Research 2000). Thus it is actually possible for a pollster to be more accurate by not doing a survey, but just aggregating others’ results and pretending. Not that I think that is what is going on here.

      Regarding (2)… Commenters here are a bit more sophisticated than that. The question here is not whether PPP is a partisan hack organization, but whether their stated methodology is based on sample surveying “best practices.”

  6. What am I missing here? This seems like a common technique, just applied in an uncommon place where it doesn’t fit.

    Weighted modeling can be a pain (e.g. weighted regression), particularly in the exploratory stages and if you are having relatively junior personnel work on the problem — you want them concentrating on the sense of the model and the validations, not the mechanics of weighting. So, you construct a self-weighting subsample. (Of course, you aren’t going to use all the data anyway, since some data are used for holdout validation.) This attempt to make the data easier to deal with is basically justified by the KISS principle.

    Of course, there’s no need to do this in the context of stratified survey estimation (and that’s why I say this doesn’t really fit the PPP usage), because the weighting issues in that realm were a solved problem even when I was in grad school. And, of course, the variance penalty for discarding random respondents to avoid weighting can also be calculated. The penalty might not be that bad in PPP’s case, considering the results Citywalker shows in the comment above.

    • I noticed this last year when I was estimating empirical design effects and saw that PPP’s polls had a design effect of exactly 1. I thought it was a mistake in my model, but digging into it, their polls’ reported MOEs are accurate because they don’t weight.

      It’s obviously a crazy methodology, but honestly it isn’t *that* bad: p(1-p)/(n/1.5) isn’t that much higher than p(1-p)/(n/1.3), which is what pollsters tend to get with their design effects anyway (worked numbers follow this comment). You could avoid that with pre-stratification, but pollsters really seem to suck at that for some reason.

      As for accuracy: PPP at least works off of RV lists, unlike other groups. Honestly, all major public pollsters have utterly terrible methodology, but they, like PPP, tune it until they get reasonable results. Which is fine. I don’t see the issue here.
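
For concreteness, here is the arithmetic behind that comparison, assuming a hypothetical n = 1,000 interviews conducted and p = 0.5 (the worst case); the 1.5 and 1.3 design effects are the figures quoted in the comment above, not numbers from PPP.

```python
from math import sqrt

# n is the number of interviews conducted (hypothetical). Dividing by 1.5
# mimics discarding about a third of them and then treating the rest as an
# unweighted sample; a design effect of 1.3 is the comment's figure for a
# pollster that keeps everything and weights.

n, p = 1000, 0.5   # hypothetical interview count, worst-case proportion

def moe(deff):
    # 95% margin of error with effective sample size n / deff
    return 1.96 * sqrt(p * (1 - p) / (n / deff))

print(round(moe(1.0), 3))  # ~0.031: all n interviews, no design effect
print(round(moe(1.3), 3))  # ~0.035: typical weighting design effect
print(round(moe(1.5), 3))  # ~0.038: discard ~a third, then treat as unweighted
```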
