President of American Association of Buggy-Whip Manufacturers takes a strong stand against internal combustion engine, argues that the so-called “automobile” has “little grounding in theory” and that “results can vary widely based on the particular fuel that is used”

[Image: postcard of a racing buggy, Torrington, CT, circa 1910]

Some people pointed me to this official statement signed by Michael Link, president of the American Association for Public Opinion Research (AAPOR). My colleague David Rothschild and I wrote a measured response to Link’s statement which I posted on the sister blog. But then I made the mistake of actually reading what Link wrote, and it really upset me in that it reminded me of various anti-innovation attitudes in statistics I’ve encountered over the past few decades.

If you want to oppose innovation, fine: there are a lot of reasons why it can make sense to go with old methods and to play it slow. Better the devil you know etc. And on the other side there are reasons to go with the new. Open discussion and debate can be helpful in establishing the zones of application where different methods are more useful.

What I really don’t like, though, is when someone takes a position and then just makes things up to support it, as if this is some kind of war of soundbites and it doesn’t matter what you say as long as it sounds good. That’s what Link did in his statement. He just made stuff up. AAPOR is a serious professional organization and this statement was a serious mistake on its part.

After reading Link’s article, I wrote a long sarcastic post blasting it. But then I deleted my post: really, what was the point? Instead, I’ll say things as directly as possible.

In his article, Link criticizes the recent decision of the New York Times to work with polling company YouGov to conduct an opt-in internet survey. Link states that “these methods have little grounding in theory and the results can vary widely based on the particular method used.”

But he’s just talking out his ass. Traditional surveys nowadays can have response rates in the 10% range. There’s no “grounding in theory” that allows you to make statements about the missing 90% of respondents. Or, to put it another way, the “grounding in theory” that allows you to make claims about the nonrespondents in a traditional survey also allows you to make claims about the people not reached in an internet survey. What you do is make assumptions and go from there. You gather as much data as possible so that the assumptions you make can be as weak as possible. The general principles here are not new, but a lot of research is done on the specifics. Regular readers of this blog will know about Mister P (multilevel regression and poststratification), which is my preferred framework for attacking such problems. The basic ideas of Mister P come from the work of Rod Little in the early 1990s, but it’s a pretty open-ended framework and I and others have been working hard on it for a while.

Whether your data come from random-digit dialing, address-based sampling, the internet, or plain old knocking on doors, you’ll have to do some adjustment to correct for known differences between sample and population. This was true in 1995 and is even more true today, as response rates regularly go below 10%.
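The simplest version of that adjustment is plain poststratification: estimate the outcome within demographic cells, then weight each cell’s estimate by its known population share. A minimal sketch, with hypothetical cells and numbers (this is not Mister P itself, which would also partially pool the cell estimates with a multilevel model):

```python
# Minimal poststratification sketch: reweight cell estimates by
# known population shares. Cell labels and numbers are hypothetical.

# Sample means of the outcome (e.g., support for a candidate) by cell
sample_means = {"young": 0.60, "middle": 0.50, "old": 0.40}

# Known population shares for the same cells (e.g., from the census)
pop_shares = {"young": 0.30, "middle": 0.40, "old": 0.30}

# Poststratified estimate: population-share-weighted average of cell means
estimate = sum(sample_means[c] * pop_shares[c] for c in pop_shares)
print(round(estimate, 3))  # 0.6*0.3 + 0.5*0.4 + 0.4*0.3 = 0.50
```

If the young are overrepresented in the raw sample, the raw mean would be pulled above 0.50; the reweighting undoes that, to the extent that the cells capture the relevant differences between sample and population.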

Link talks a bit about transparency, which is a bit of a joke considering all the mysteries involved in conventional polling. How was it, again, that Gallup’s numbers were so far off in 2012?

This kind of aggressive methodological conservatism just makes me want to barf. Just to be clear: methodological conservatism doesn’t bother me. Indeed, I’d completely respect it if Link were to write something like, “Survey researchers have nearly a century of experience with probability sampling using random digit dialing, address-based sampling, and so on. For the purpose of public policy, it makes sense to trust and refine the methods that have worked in the past. New methods such as multilevel regression and poststratification might be great, but really we should be careful. And new methods should be held to the same high standards as we require for classical methods.”

That’s what Link could have said. And I’d have no problem with it. I mean, sure, I’d still disagree, I’d still prefer that this not be released as an official statement of the nation’s leading professional society of pollsters, I’d still point out the problem with 90% nonresponse rates, and I’d still argue that traditional survey methods have big problems. But I’d respect his argument.

What I don’t respect is the B.S. in the official AAPOR statement. There’s the bogus, bogus, bogus bit about “grounding in theory” and then there’s this:

Standards need to be in place at all times precisely to avoid the “we know it when we see it (or worse yet, ‘prefer it’)” approach, which often gives expediency and flash far greater weight than confidence and veracity.

Expediency and flash, huh? How bout you do your job and I do mine. My job when doing survey research is to get an estimate that’s as close to the population value as possible. If you think that’s “expediency and flash,” that’s your problem, not mine.

Finally:

AAPOR is committed to exerting leadership in maintaining data quality regardless of the methodologies being employed. To this end, we strongly encourage all polling outlets to proceed cautiously in the use of new approaches to measuring public opinion, such as those using non-probability sample methods, particularly when these data are used in news or election reporting or public policymaking.

Sorry, but this AAPOR statement is not an exercise in leadership. It’s an exercise in rhetoric, and it makes me want to barf. “Little grounding in theory . . . expediency and flash . . .”: Give me a break.

67 Comments

  1. B D McCullough says:

    Hear! Hear!

  2. Rahul says:

    Are there any wise people outside of the buggy-whip peddlers and the IC-engine peddlers who have written on this topic?

    I’d love to read a third, “unbiased” opinion. Are there any statisticians writing on this topic who don’t have their skin in the game?

    • zbicyclist says:

      “who don’t have skin in the game”? Almost by definition, these would be people who don’t have enough familiarity with the issues involved to have an informed viewpoint.

      • Rahul says:

        I disagree. One must balance expertise against an obvious conflict of interest.

        A smart, reasonable statistician (who isn’t a pollster) would be a good guy to hear out on this issue.

  3. numeric says:

    Seems to me the only way to resolve this question is through examination of results over a considerable number of polls. Oops. Now I’ve done it. Closet frequentist.

    • Anonymous says:

      Bayesians are allowed to compute winning percentages just like anyone else.

      A real closet frequentist is someone whose mind explodes when the Bayesian analysis gets a higher winning percentage than a Frequentist one.

    • Anonymous says:

      Or better yet, a closet frequentist is someone who imagines that a Bayesian analysis is philosophically required to have a lower winning percentage because they imagine some poll taken on Aug 6 2014 at 5pm EST is a repeatable event and hence a random variable.

    • Anonymous says:

      or even how about:

      “closet frequentists” are those who believe using probabilities to model uncertainties has to be objectively worse than using probabilities to model frequencies on the grounds that the former makes no sense to them, and they are such colossal super geniuses that if it makes no sense to them, it must not make any sense to anyone. Even God.

      • I think closet implies they do one thing and believe another, so a closet frequentist is one who does bayesian analysis of one off events and then makes themselves feel better by imagining an infinity of possible worlds in which the outcomes came out differently according to the appropriate frequencies.

        • Anonymous says:

          Maybe that’s a better formulation.

          A closet frequentist is one who uses Bayes theorem but thinks it only takes on a hard concrete objectiveness whenever they imagine calculating each probability as the percentage of universes in which it holds true.

          ’cause, you know, measuring other universes is a small price to pay to avoid quantifying uncertainty in this one.

        • jrc says:

          I’m a closet frequentist because I didn’t have an office, but did have a walk-in closet big enough for a desk.

    • Andrew says:

      Numeric:

      I’ve long written about the duality between the Bayesian prior distribution and long-run frequencies. Bayesian inference is based on a reference set of equally likely events, which maps on to the prior distribution. Frequentist inference is based on a reference set of replications which are assigned equal weight. We discuss this in chapter 1 of BDA, indeed it was in chapter 1 of the first edition, back in 1995.

  4. Martin says:

    Yes, probability and non-probability methods both have to deal with “adjusting for known differences between sample and population.” But probability methods have a small advantage: we can compare the obtained sample against both the desired sample and the population, whereas with a non-probability method we can compare the obtained sample only against the population.

  5. D.O. says:

    Link talks a bit about transparency, which is a bit of a joke considering all the mysteries involved in conventional polling. How was it, again, that Gallup’s numbers were so far off in 2012?

    Wrong (more than usual) likely voter model, no?

  6. zbicyclist says:

    Check out these Pew numbers on response rates for comparable surveys since just 1997:
    http://www.people-press.org/2012/05/15/assessing-the-representativeness-of-public-opinion-surveys/
    1997: 36%
    2012: 9%

    The claim is often that “telephone surveys that include landlines and cell phones and are weighted to match the demographic composition of the population continue to provide accurate data on most political, social and economic measures. This comports with the consistent record of accuracy achieved by major polls when it comes to estimating election outcomes, among other things.” (op.cit.)

    But I’m not sure this is quite as simple as it sounds, or that, with all the machinery needed for adequate weighting and projection on top of low response rates, complex sampling designs (like the ones I learned from Leslie Kish in the 1970s) help as much as we’d like to think.

    The problem will get worse. Requests to answer surveys are now much more common than promises of riches from Nigerian oil ministers, and generate about as much interest in the public. Part of it is our fault — how many surveys have you answered that are supposed to be “5 minutes” but consist of several screens with poorly phrased questions?

  7. Isaac Newton says:

    You’re a troll, Andrew Gelman. I thought you were above the ad hominem. I no longer have respect for you.

  8. Steve Sailer says:

    Back in 2012, Reuters-Ipsos ran a huge online panel of likely voters: sample size over 40,000. You could answer questions that couldn’t be answered from the exit poll, like how did white people in Texas vote for President. But I’ve never seen anybody else refer to it.

  9. Gaurav says:

    I thought your article with Rothschild was too ‘measured.’ So I was pleased to read this blog. I am glad that you are calling a spade a spade. AAPOR comes across as a front for probability pollsters.

  10. […] and David Rothschild’s reasonable and measured response (and also Andrew’s later reasonable and less measured response) to a statement from the American Association for Public Opinion Research.  The AAPOR […]

  11. Mia says:

    AAPOR has in fact recently issued a very thorough and what seems to me unbiased report on non-probability sampling, written by a diverse group of researchers coming from both industry and academia:
    http://www.aapor.org/AM/Template.cfm?Section=Reports1&Template=/CM/ContentDisplay.cfm&ContentID=6055

    • The report is good as far as it goes, but see this quote (p. 6):

      We have explicitly avoided exploring probability sampling methods under less than ideal conditions and comparing of estimates between probability and non-probability samples. We realize there is considerable interest in whether a probability sample is still a probability sample when it has low coverage or high nonresponse, but the task force has not attempted to undertake this controversial and substantial task.

      That is, they have a reasonable treatment of non-probability samples (as you’d expect, given the group who wrote it), but they completely punt on the issue of whether actually-existing probability sampling with non-response works, and if so, why. That’s the whole controversy.

      • Daniel Gotthardt says:

        How would you actually undertake this task, though? Of course we can check how real “probability samples” perform in predictions, but how much does that tell us about the accuracy of the inference from the real “probability sample” to the population at the time the sampling occurred? Of course, for polling it’s more interesting how well the method allows you to predict the next election, and probably somebody (TM) should try to do that for a bigger sample of elections and polls. Against what should we compare the inference? Statistical theory probably cannot help us much either, as we just cannot really know about the causes of non-random non-response. Of course we can compare some basic demographics or study the survey literature on non-response, but usually we do *not* know to what extent the variables of *interest* in a study might be affected …

        I’ve not yet seen such a study but maybe there is one. The report seems to be interesting, though, and I will need to find some time to study it more carefully …

        • Elections would be one place where there’s ‘ground truth’ available. I don’t know what others there are.

          There’s also some limited evidence from CDC’s BRFSS when they introduced a cell-phone sample and improved raking techniques: the combined sample agreed fairly well with the landline sample plus the improved raking, but not with the landline sample and simple post-stratification. That suggests improved modelling helps.

          http://www.michigan.gov/documents/mdch/MIBRFSS_Surveillance_Brief_Sept_2012_Vol6No4_FINAL_398417_7.pdf
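          The raking mentioned here is iterative proportional fitting: cycle through the adjustment variables, each time scaling respondent weights so the weighted sample margin matches the known population margin for that variable. A toy sketch (all respondents and target margins are hypothetical):

```python
# Minimal raking (iterative proportional fitting) sketch: adjust unit
# weights until weighted sample margins match known population margins.
# Respondents and targets below are toy/hypothetical.

respondents = [
    {"age": "young", "sex": "f"},
    {"age": "young", "sex": "m"},
    {"age": "old", "sex": "f"},
    {"age": "old", "sex": "m"},
    {"age": "old", "sex": "m"},
]
targets = {
    "age": {"young": 0.5, "old": 0.5},
    "sex": {"f": 0.5, "m": 0.5},
}

weights = [1.0] * len(respondents)
for _ in range(200):  # sweep the variables until the margins converge
    for var, target in targets.items():
        total = sum(weights)
        for level, share in target.items():
            idx = [i for i, r in enumerate(respondents) if r[var] == level]
            current = sum(weights[i] for i in idx) / total
            for i in idx:
                weights[i] *= share / current  # rescale this level's weights

# Report the achieved weighted margins
total = sum(weights)
for var, target in targets.items():
    for level in target:
        got = sum(w for w, r in zip(weights, respondents) if r[var] == level) / total
        print(var, level, round(got, 3))
```

          Unlike full poststratification, raking only uses the margins, not the joint distribution of the cells, which is why it can be applied when the population cross-tabulation is unknown.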

          • Daniel Gotthardt says:

            Thomas:

            Thanks! I meant outside of elections. Probability sampling is the “gold standard” in Sociology, and while I’m fairly interested in polling and election forecasts, I’m not a Political Scientist, and I do wonder how we could compare probability sampling with high non-response and maybe low coverage against non-probability sampling there. – It’s also an interesting issue for psychologists, as some question the validity of inference from the usual “psychology students” sample, and there is some argument about whether to introduce probability sampling in more sub-fields of psychology.

            • jrc says:

              So one place where they track non-response, and then force people to respond, is the CPS. I know this, because one day my neighbor had a note saying something like “We stopped by to interview you, please get in touch with us.” Then a week later, they left this note:

              ***

              Sorry to have missed you. This address has been selected for a very important U.S. Census Bureau study. These studies are a positive use of your taxpayer dollars. Making unnecessary visits to contact your household is a negative use of taxpayer dollars. I am obligated to return to this address until contact is made. It would be helpful if you would call me with your name and phone number as soon as possible. We can arrange a mutually convenient time to complete a brief interview, either in person or by phone.

              Thank you.

              ***
              I wonder if you could use the initial non-response people to get some idea of the nature of the selection bias. I wonder if they code the “type of non-response” – like, “not at home” or “hangs up when we call” or “was a total jerk and slammed the door in my face.”

              Also, I just wanted to post the letter, because the sentence “Making unnecessary visits to contact your household is a negative use of taxpayer dollars” is just pure gold.

  12. Rahul says:

    How high were the response rates to traditional surveys in the old days? Is there a good trend somewhere about the dropping response rates of telephone surveys that Andrew refers to?

  13. Jan Werner says:

    I do think that the language of this post is a little over the top, although I’d really love to have seen the “long sarcastic post” Andrew Gelman initially wrote and deleted in favor of this one.

    That said, the points he makes are valid.

    As a long time AAPOR member, I was appalled by the report of the Non-Probability Task Force, but not surprised, since I feel that panel was assembled by the AAPOR leadership in such a manner as to obtain sought-after results with minimal dissent.

    The report summary states that “… non-probability researchers use their knowledge, experience and/or previous research to model the relationship between key factors they know about the population (e.g., the age, sex and geographic spread of the registered voters) and the specific topic of the study (e.g., vote intention). They use this model to select and/or adjust their non-probability sample in a way that allows statistical insights, provided of course that the model assumptions hold.”

    That, of course, perfectly describes how post-stratification weighting is used to compensate for non-response in “probability surveys,” which seems to be understood by the authors of the report to mean only live telephone interviews of RDD telephone samples or in-person government surveys.

    As I wrote on the AAPOR members’ listserv a couple of days ago, “[s]adly, AAPOR seems to be turning itself into a lobbying group in support of a certain segment of the polling establishment.”

  14. The sophomoric personal attack and the need to “barf” aside, there are some important points here as the field approaches a methodological tipping point. Unfortunately those points are lost in the desire to be provocative and, apparently, the author suffering from a lack of gastrointestinal control. The more reasoned Monkey Cage post is well worth reading.

  15. Anonymous says:

    Dr. Gelman,

    You wrote your response with Rothschild before you read Link’s full statement? Really? (“My colleague David Rothschild and I wrote a measured response to Link’s statement which I posted on the sister blog. But then I made the mistake of actually reading what Link wrote …”)

    You write: “Link talks a bit about transparency, which is a bit of a joke considering all the mysteries involved in conventional polling. How was it, again, that Gallup’s numbers were so far off in 2012?”

    Gallup commissioned Mike Traugott to do a full analysis and his team presented three detailed papers on the findings at the AAPOR conference. The level of transparency in those reports was very high.

    You write: “Expediency and flash, huh? How bout you do your job and I do mine. My job when doing survey research is to get an estimate that’s as close to the population value as possible. If you think that’s “expediency and flash,” that’s your problem, not mine.”

    I think you’re missing the point here. He is talking about the fact that the NYT for many years had specific poll reporting standards and now does not. Journalists love to run with the numbers and without clear standards will be tempted to go with the numbers that fit their stories even if based on methodologies that don’t yield reliable or valid data. News organizations have reporting standards in many areas, and it’s not unreasonable to expect them to have such standards regarding the reporting of survey research.

  16. Anonymous says:

    What amazes me most is that you know a “Rothschild”. It’s like saying “in the paper I coauthored with Henry Hapsburg and Paul Plantagenet … “

  17. Anonymous says:

    Andrew,

    Are you currently or have you ever been paid by yougov?

    Do you or have you worked on the CBS/NYT/YouGov Tracker?

    I know you believe in transparency and disclosure, which is why I ask.

    Thanks

    • Anon42 says:

      I like the humor, Anonymous asking for transparency and then even saying Thanks!

    • Andrew says:

      Anon:

      As David and I wrote in our Monkey Cage post the other day, “We work with YouGov on multiple projects, so we hardly claim to be disinterested observers.” YouGov is not currently paying me, but they’ve hired a former postdoc of mine; David Rothschild and I have written a paper with Doug Rivers of YouGov; Doug and some other people paid me an honorarium to speak at Stanford a couple years ago; and YouGov might well pay me in the future. And I stand by everything I wrote above. I wrote my first paper on multilevel regression and poststratification in 1997, and I wasn’t doing it for the money.

      • Doug Rivers says:

        I’d be happy to pay Andrew to work on YouGov-related projects, since it would improve our polling. So far he hasn’t received a penny from YouGov. (The honorarium for giving a series of talks at Stanford came from Stanford, not me or YouGov.)

        Anyone who thinks paying Andrew would have any impact on what he writes here or elsewhere is seriously lacking in observational skills.

  18. Anonymous says:

    The framework for thinking about these issues needs to shift from “probability vs. non-probability” to one of “design-based estimation vs. model-based estimation”.

    An estimate from a probability-based survey with a 10% response rate (let’s pretend the sampling frames cover 100% of the target population) is, in some sense, 10% design-based and 90% model-based. That is, the survey is producing unbiased, textbook estimates for the 10% of the population that are would-be responders (given the data collection protocol) and relying on models (i.e., weighting adjustments) to represent the 90% of the population that are would-be nonresponders.

    Estimates from a non-probability-based survey are 100% model-based.

    The obvious question then becomes, “How good are the models?” That’s a question that can’t be answered in general, because the accuracy of the model will depend very much on the specific population attribute that’s being estimated. A probability-based survey may have an advantage in that it may get at a more diverse set of respondents on which a model can act, but that is conjecture. Regardless, both sides of this debate should acknowledge their reliance on models, and, ideally, disclose the models being used.
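    The arithmetic behind that decomposition is easy to make concrete (all numbers below are hypothetical): with a 10% response rate, any error in the model’s guess for the nonresponders propagates into the overall estimate scaled by 0.9.

```python
# Decompose a survey estimate's error into its design-based and
# model-based parts. All numbers are hypothetical illustrations.

response_rate = 0.10       # share of the population the design actually reaches
true_responders = 0.55     # true mean among would-be responders
true_nonresponders = 0.45  # true mean among would-be nonresponders
model_guess = 0.50         # the model's (weighting adjustment's) guess for nonresponders

# True population mean
truth = response_rate * true_responders + (1 - response_rate) * true_nonresponders

# Survey estimate: exact for responders, model-based for everyone else
estimate = response_rate * true_responders + (1 - response_rate) * model_guess

# The entire error comes from the 90% modeled share:
# error = (1 - response_rate) * (model_guess - true_nonresponders)
print(round(truth, 4), round(estimate, 4), round(estimate - truth, 4))
```

    The same arithmetic with a 100% modeled share is the non-probability case, which is why the interesting question is the quality of the model rather than the probability/non-probability label.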

  19. Nick Menzies says:

    As someone naive to political polling, I love the idea they are looking at using panels. Sure, you might have even bigger selection effects, but now you have a detailed history to start backing out those effects.

  20. Peter says:

    The most recent issue of the journal “Political Analysis” may be of interest to readers of this post. In particular, the article

    “Does Survey Mode Still Matter? Findings from a 2010 Multi-Mode Comparison” by

    Stephen Ansolabehere and Brian F. Schaffner

    These authors compare mail, RDD, and opt-in internet surveys that attempted to gather the same information.

  21. […] statistical point.  Sometimes researchers want to play it safe by using traditional methods (most notoriously, in that recent note by Michael Link, president of the American Association of Public Opinion […]

  22. […] Rothschild (coauthor of the Xbox study, the Mythical Swing Voter paper, and of course the notorious Aapor note) will be speaking Friday 10 Oct in the Economics and Big Data meetup in NYC. His title: “How […]
