When am I a conservative and when am I a liberal (when it comes to statistics, that is)?

Here I am one day:

Let me conclude with a statistical point. Sometimes researchers want to play it safe by using traditional methods — most notoriously, in that recent note by Michael Link, president of the American Association of Public Opinion Research, arguing against non-probability sampling on the (unsupported) grounds that such methods have “little grounding in theory.” But in the real world of statistics, there’s no such thing as a completely safe method. Adjusting for party ID might seem like a bold and risky move, but, based on the above research, it could well be riskier to not adjust.

I’ve written a lot about the benefits of overcoming the scruples of traditionalists and using (relatively) new methods, specifically Bayesian multilevel models, to solve problems (such as estimating public opinion in small subgroups of the population) that would otherwise either be impossible or be handled in a sloppy, ad hoc way.
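
To make the partial-pooling idea behind those multilevel models concrete, here is a minimal numerical sketch. Everything in it is invented for illustration (the subgroup sizes and counts are made up), and the moment-based shrinkage is a crude stand-in for actually fitting a Bayesian multilevel model:

    import numpy as np

    # Invented survey counts: "yes" answers out of n respondents in five subgroups,
    # some of which are very small.
    n = np.array([12, 40, 8, 150, 25])
    y = np.array([3, 30, 2, 60, 20])

    p_raw = y / n                                 # raw subgroup estimates: noisy when n is tiny
    se2 = p_raw * (1 - p_raw) / n + 1e-6          # sampling variance of each raw estimate

    # Partial pooling: shrink each subgroup toward the overall rate, more strongly
    # when the subgroup is small (i.e., when its sampling variance is large).
    p_overall = y.sum() / n.sum()
    tau2 = max(np.var(p_raw) - se2.mean(), 1e-4)  # crude between-group variance estimate
    weight = tau2 / (tau2 + se2)                  # how much to trust each group's own data
    p_pooled = weight * p_raw + (1 - weight) * p_overall

    print(np.round(p_raw, 2))     # jumpy for the tiny subgroups
    print(np.round(p_pooled, 2))  # pulled toward the overall rate; the big group barely moves

The real models do this with regression predictors and priors rather than a moment estimate, but the behavior is the same: small groups borrow strength from the rest of the data, and large groups mostly speak for themselves.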

On the other hand, sometimes I’m a conservative curmudgeon, for example in my insistence that claims about beauty and sex ratios, or menstrual cycles and voting, are bogus (or, to be precise, that those claims are pure theoretical speculation, unsupported by the data that purport to back them up).

What’s the deal? How to resolve this? One way to get a handle on it is, in each case, to think about the alternative. The balance depends on the information available in the problem at hand. In a sense, I’m always a curmudgeonly conservative (as in that delightful image of Grampa Simpson above), in that I’m happy to use prior information and I don’t think I should defer to whatever piece of data happens to be in front of me.

This is the point that Aleks Jakulin and I made in our article, “Bayes: radical, liberal, or conservative?”

Consider the polling scene, where I’m a liberal or radical in wanting to use non-probability sampling (gasp!). But, really, this stance of mine has two parts:

1 (conservative): I don’t particularly trust raw results from probability sampling, as the nonresponse rate is so high and so much adjustment needs to be done to such surveys anyway.

2 (liberal): I think with careful modeling we can do a lot more than just estimate toplines and a few crosstabs.
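
As a toy illustration of what that modeling buys beyond the raw topline, here is a minimal poststratification sketch. All the numbers, column names, and population shares below are invented; the point is just that you estimate within demographic cells and then reweight the cells by their known population shares, rather than by whoever happened to respond:

    import pandas as pd

    # Invented non-probability sample in which young respondents are overrepresented.
    sample = pd.DataFrame({
        "age_group": ["18-34"] * 60 + ["35-64"] * 30 + ["65+"] * 10,
        "support":   [1] * 40 + [0] * 20 + [1] * 12 + [0] * 18 + [1] * 3 + [0] * 7,
    })

    # Known population shares for the same cells (e.g., from census figures).
    pop_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

    raw = sample["support"].mean()                        # naive topline, driven by who responded
    cell_means = sample.groupby("age_group")["support"].mean()
    poststratified = sum(cell_means[g] * w for g, w in pop_share.items())

    print(f"raw: {raw:.2f}, poststratified: {poststratified:.2f}")

In practice the cell estimates would themselves come from a multilevel model (as in the sketch above), so that sparse cells don’t blow up; the reweighting logic is the same.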

Now consider those junk psychology studies that get published in tabloid journals based on some flashy p-values. Again, I have two stances:

1 (conservative): Just cos someone sees a pattern in 100 online survey responses, I don’t see this as strong evidence for a pattern in the general population, let alone as evidence for a general claim about human nature or biology or whatever.

2 (liberal): I’m open to the possibility that there are interesting patterns to be discovered, and I recommend careful measurement and within-subject designs to estimate these things accurately.
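
To see why within-subject designs help, here is a small simulation; every parameter in it is made up. Stable person-to-person differences are large relative to the effect, so comparing each person with themselves gives a much smaller standard error than comparing two separate groups of people:

    import numpy as np

    rng = np.random.default_rng(42)
    n, effect = 50, 0.2                  # invented: 50 subjects, a small true effect
    person = rng.normal(0, 1.0, n)       # large, stable differences between people

    # Between-subject comparison: different people in each condition.
    a = rng.normal(0, 1.0, n) + rng.normal(0, 0.3, n)
    b = rng.normal(0, 1.0, n) + effect + rng.normal(0, 0.3, n)

    # Within-subject comparison: the same people measured in both conditions.
    x1 = person + rng.normal(0, 0.3, n)
    x2 = person + effect + rng.normal(0, 0.3, n)

    se_between = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    se_within = (x2 - x1).std(ddof=1) / np.sqrt(n)
    print(f"between-subject SE: {se_between:.2f}, within-subject SE: {se_within:.2f}")

The person-level variation cancels in the paired differences, and careful measurement (here, the smaller noise term) shrinks the remaining error further.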

10 thoughts on “When am I a conservative and when am I a liberal (when it comes to statistics, that is)?”

  1. Although the Grandpa Simpson image is delightful, and the liberal/conservative characterization is topical for a person in a political science department, I’d characterize the axis of the dichotomy as more “realist” vs. “idealist.” Your stance in general seems to be “throw in information from whatever valid sources you can get, combine it with knowledge of basic processes to choose likelihoods, estimate the model realistically, focus on effect sizes conditional on your model, keep alternative models in mind, and don’t be fooled by variation”. I’m going to call that “realist” in that it all takes into account the difficulty of learning from real-world data.

    The opposing pole is some kind of idealism: “We can guess the future frequency histogram exactly; using that, we can estimate a model with exact tests. Seeing the data beforehand has no effect on which tests we would do, and so we can get a clean p-value that means what it says, and make a decision about the apparent dichotomy of truth or falsity of a hypothesis based on that p-value; in the future the number of incorrect decisions across all examples will be < 5% of the total number of decisions (for a choice of p < 0.05).”

  2. Another way to think about this might be elite v. mass. Maybe it would be best for the masses of low-level statistics users (e.g., mosquito abatement district officials) to stick to old-fashioned methods with cookbook recipes. In contrast, elite users of statistics (e.g., professors of statistics, but also most academics in general) should be prodded to use Professor Gelman’s multiple approaches. Over time, elites could develop better cookbook recipes for the masses.

    • Steve:

      There’s some truth to this, but for many purposes the tools used by the “masses” cause them trouble, as in all the “Psychological Science”-type studies that make no sense at all but get accepted because they are statistically significant.

      It depends on the application. In some settings a non-sophisticated user can do fine with brute-force methods by simply increasing the sample size. In these sorts of problems, more sophisticated methods increase statistical efficiency, which implies a cost tradeoff for the non-sophisticated user: pay a bunch to hire an expert to do a more efficient analysis, or pay for more data collection so as to get an equivalently good inference using simpler methods. In other settings, a non-sophisticated user can get misled by simple methods, and in such settings I think it’s time for these users to start using better cookbook recipes (which I and others are indeed busy developing, for example in my books and in articles such as this one).

        • Is it true that the epidemic of crappiest studies comes from areas where, umm, how do I put it, the results don’t matter that much? E.g., fertility and dress color, Bem’s ESP? Maybe there’s a reason why it’s a “Psychological Sci”-type study and not, say, a “Polymer Chemistry” study?

        Well, OTOH we do have that melanoma study, and that’s a topic that does matter.

        I’m not sure.

        • Rahul:

          Yeah, I’m not sure either. On this blog we’ve discussed the pollution-in-China study and the early childhood intervention study, and both those topics are important. Also I’ve heard lots of bad things about medical trials. A big problem there, I assume, is the professional and financial incentive to find statistical significance.

        • Not in general, I think. Remember the Bayer report, where they found that something like two-thirds of drug effects could not be replicated.

          I think – and I hate to sound like a typical social scientist – that one of the reasons we don’t see these problems in polymer chemistry is that research in chemistry is “easy” – at least in the sense of research design. I think it was Rutherford who said that if your experiment is good enough, you don’t need statistics. Obviously that’s not true, but drawing inferences from controlled trials takes a lot less savvy than drawing inferences from observational data – and thus there’s less opportunity to mess it all up.

        • Andreas,
          I think there’s also less variation in (many areas of) chemistry — it might be smaller than measurement error.

          But my understanding is that in some areas of biochemistry/biophysics it can be large — for example, when a cell divides, the “stuff” in the dividing cell isn’t evenly divided between the two daughter cells, which can leave the daughter cells with quite different characteristics. And several cell divisions later, there could be quite a large degree of variation among the descendant cells of the original cell.

        • Rahul,

          I don’t know about “crappiest”; the medical literature tends to be more “sophisticated” in making overblown claims. Read this Ebola drug paper. They didn’t blind themselves, euthanized animals according to a “clinical score” that they do not describe in the paper, and then claimed that “ZMapp exceeds the efficacy of any other therapeutics described so far, and results warrant further development of this cocktail for clinical use.” Clearly we cannot tell whether the drug was effective or the researchers were biased in the “scoring”.
          http://go.nature.com/oY8pGI

          The p-values are meaningless, the reviewers/editor failed to make the authors include the method used to determine the primary outcome, and the people whose claims we are being asked to believe made a grade-school science-fair methodological error. They subsequently got a bunch of funding for this: “The U.S. Department of Health and Human Services said today it would provide its expertise and as much as $42.3 million to help San Diego-based Mapp Biopharmaceutical accelerate development and testing of ZMapp, the biotech’s experimental Ebola drug.”
          http://www.xconomy.com/san-diego/2014/09/02/feds-provide-funding-expertise-to-advance-zmapp-drug-for-ebola/

          If you read carefully, you will find that such flawed studies make up the majority of the medical literature.

  3. In terms of marketing, what matters to people these days is not truth or falsehood, right or wrong, effectiveness or ineffectiveness, but status. So, market p-values as low-status, old-fashioned, and okay for the masses who don’t know any better, and market Bayesianism as up-market, sophisticated, and a sure marker of elite status.
