
“Do statistical methods have an expiration date?” My talk at the University of Texas this Friday 2pm

Fri 6 Oct at the Seay Auditorium (room SEA 4.244):

Do statistical methods have an expiration date?

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

There is a statistical crisis in science, particularly in psychology where many celebrated findings have failed to replicate, and where careful analysis has revealed that many celebrated research projects were dead on arrival in the sense of never having sufficiently accurate data to answer the questions they were attempting to resolve. The statistical methods which revolutionized science in the 1930s-1950s no longer seem to work in the 21st century. How can this be? It turns out that when effects are small and highly variable, the classical approach of black-box inference from randomized experiments or observational studies no longer works as advertised. We discuss the conceptual barriers that have allowed researchers to avoid confronting these issues, which arise not just in psychology but also in policy research, public health, and other fields. To do better, we recommend three steps: (a) designing studies based on a perspective of realism rather than gambling or hope, (b) higher quality data collection, and (c) data analysis that combines multiple sources of information.
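A minimal simulation sketch of the abstract's point about small, highly variable effects. The numbers are assumptions chosen for illustration, not taken from the talk or the papers: a true effect of 0.1 measured with a standard error of 0.5. Each simulated study produces an unbiased estimate. Only estimates that clear the conventional two-sided p < 0.05 threshold are kept, and the sketch then checks how often that happens, how much the kept estimates overstate the true effect, and how often they point in the wrong direction.

```python
import random
import statistics

def significant_estimates(true_effect, se, n_sims, seed=1):
    """Draw unbiased estimates from Normal(true_effect, se) and keep only those
    that clear the usual two-sided 5% threshold, |estimate / se| > 1.96."""
    rng = random.Random(seed)
    draws = (rng.gauss(true_effect, se) for _ in range(n_sims))
    return [est for est in draws if abs(est / se) > 1.96]

# Assumed, illustrative values: a small true effect measured with a lot of noise.
TRUE_EFFECT, SE, N_SIMS = 0.1, 0.5, 100_000

sig = significant_estimates(TRUE_EFFECT, SE, N_SIMS)

# Share of simulated studies that reach p < 0.05 (the power).
power = len(sig) / N_SIMS
# Average size of a "significant" estimate relative to the true effect.
exaggeration = statistics.mean(abs(e) for e in sig) / TRUE_EFFECT
# Share of significant results whose sign is opposite to the true effect.
wrong_sign = sum(e < 0 for e in sig) / len(sig)

print(f"power: {power:.3f}")
print(f"exaggeration ratio: {exaggeration:.1f}")
print(f"share with wrong sign: {wrong_sign:.2f}")
```

Under these assumed values the power is only a few percent. The estimates that survive the significance filter overstate the true effect by roughly an order of magnitude, and a sizable minority have the wrong sign. The data are unbiased, yet selecting on p < 0.05 produces misleading published results.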

Some of the material in the talk appears in our recent papers, “The failure of null hypothesis significance testing when studying incremental changes, and what to do about it” and “Some natural solutions to the p-value communication problem—and why they won’t work.”

The talk will be in the psychology department but should be of interest to statisticians and quantitative researchers more generally. I was invited to come by the psychology Ph.D. students—that’s so cool!

28 Comments

  1. Chris says:

    Is the talk open to the public? The website doesn’t explicitly say.

  2. Tim says:

    I called the office and was told it should be open to the general public.

  3. Martha (Smith) says:

    Yes, it should be available to the public. SEA is at the northwest corner of Dean Keaton and Speedway. If you don’t have a UT parking permit, you can probably find a space in the Speedway Garage, about one block north of SEA. (See https://parking.utexas.edu/parking/garages/swg.php for instructions on payment.)

  4. Shravan says:

    Surely the title should be: “Do *some* statistical methods…”?

  5. Marcos says:

    When I read points (a), (b) and (c), I had a feeling of despair. I don't think the problem has to do with statistics or p-values at all. I agree with pretty much everything you say, but those are statistical discussions; they are not key to the crisis in science. And somehow it feels that blaming things like p-values is done because it is the politically correct thing to do, a way to ‘blame ourselves’ and ensure nobody gets upset.

    The problem is simply the system itself: researchers doing their own analysis, journals wanting to publish interesting things, the incentives to publish. And most importantly, there is no reward for changing, just the prospect of punishment, a certainty that you will fall behind if you change the way you do things.

    Nobody is interested in (a), (b), (c), because we are in a competitive world and these practices will lead to fewer publications, which is unthinkable; it just cannot be done, it is just not acceptable. If you try it, you will immediately fall behind and gain nothing for it; you will be ostracized, not admired.

    The system needs to change. Researchers should not do their own analysis; funding agencies should give funds and demand that independent statisticians analyze the data, and these statisticians should be accountable for that analysis, not be co-authors. Publication in good journals should come only with pre-registered protocols, and all pre-registered research should be published, so that the idea/project is rewarded, not the publication. Replications should be of major importance, not frowned upon. Publications without a pre-registered protocol should be encouraged but clearly labeled as exploratory, theory-building work, so that the focus is on the theory rather than the results.

    Take point (a). It is ‘gambling or hope’ that gives you more chance of publication, more promise of rewards. It is very hard to move away from it, because even if your ‘hope’ turns out to be wrong, that does not mean you will not get a publication. Basing design on realism is too hard.

    Most of those who do their own analysis have little stats background. They will listen to you but will not understand it, and may question you: “What is wrong with what I am doing? I am able to publish, after all.” Those who understand go back to their desks and find that it is safer to just keep doing what they are doing. This only drives researchers away from statisticians: “They are scary, they say things I don't understand, they make my life more difficult. If I talk to a statistician, chances are that I will not publish.” It is like telling people that they should not cross the street mid-block. Some will not understand why the extra work is needed. Some will understand but not find it worth it; they will see those crossing mid-block getting to the other side faster while they fall behind. Some will just think “who are you to tell me what to do?”

    If you work with researchers every day, you will see bad practice every day. You will try to teach them things that are simpler than points (a), (b), (c). You will give talks with simple examples of how easily p-values fail and of how the scientific method works and should be applied. The next day they are still doing the same thing, still using p-values, and how could they not? That is what the system wants, and they need to publish.

    So my view is that this is a healthy, nice statistical discussion, but it does not address the root of the problem in science at all. How can we change the system so that following your points (a), (b), (c) brings reward, not punishment, to those who try it?

    • As the economists say: incentives matter.

      Sadly, since the publication of Death of a Salesman in 1949 the world has continued to “Simonize” the crap out of everything, deluding itself into thinking that “shiny” because it leads to “sales” is inherently what counts. Theranos and Juicero anyone?

      Since the era of honest-to-goodness snake oil (you know, bottles of pseudo-medicine), one of the easiest ways to make lots of money has been to exploit asymmetry of information: if *you* know that what you’re selling is crap, but the world doesn’t, it will buy that crap. It’s even BETTER for you if you steadfastly refuse to acknowledge the crap and just delude yourself, so that if the crap eventually catches up to you, you can say “well, it’s what everyone was doing, and I didn’t know it was crap any more than anyone else”. This still seems to be a major mechanism in pharma; see the Tamiflu incident, or the many cancer drugs out there that offer something like a few weeks of increased quantity of life with generally decreased quality, at enormously high prices paid by third-party insurance (to get truly rich, it also helps to exploit the principal-agent problem, moral hazard, and regulatory capture).

      In general, the more specialized the knowledge required to understand the problem, and the less knowledgeable the consumer is, the easier it is to sell the result.

      • You might argue that Theranos and Juicero are counterexamples, since they did in fact fail. But they are useful examples because they did fail and yet they sucked up major quantities of resources. Furthermore the point is that there are many resource-consuming entities out there who haven’t failed yet, but this doesn’t mean they are productive components of society, just that the asymmetry of information hasn’t yet been revealed.

        For a little light bedtime reading we can move out of the realm of power-posing, and into the realm of tens of billions of dollars per year in government funded bio-research:

        http://www.slate.com/articles/health_and_science/future_tense/2016/04/biomedicine_facing_a_worse_replication_crisis_than_the_one_plaguing_psychology.html

        • Andrew says:

          Daniel:

          There’s also “evidence-based design,” which seems like it could be a massive waste of resources.

        • Martha (Smith) says:

          Thanks for the link to the Slate article. It confirms (or even goes further than) what I have suspected about health research. It clearly makes the point that documentation is a crucial factor in science.

          I am just conjecturing here, but I am guessing that habits of documentation have slipped over the past decades (or perhaps over the last centuries?).

          Again, just conjecturing, but I wonder if development of good software for documentation could help — e.g. which records in large red letters something like NOT RECORDED if a step is skipped. (Of course, not all steps are relevant to all experiments, but something like this might contribute to an embarrassment factor that might be helpful.)

      • Anenoeuoid says:

        if *you* know that what you’re selling is crap, but the world doesn’t it will buy that crap. It’s even BETTER for you if you steadfastly refuse to acknowledge the crap, just delude yourself, so that eventually if the crap catches up to you you can say “well, it’s what everyone was doing, and I didn’t know it was crap any more than anyone else”.

        Perfectly said. This is exactly what I saw. The less you understood what you were doing, the more rewards you would get (paradoxically, the exception is outright fraud; the more you understand, the easier that becomes). If people want things like a cure for cancer, they need to reward researchers on that topic for doing a “good job”.

        A good job would be carefully recording useful data (longitudinal studies, dose-response, etc., not data about there being differences between groups you made different) along with the methods used to generate it, and coming up with quantitative models that make precise predictions to compare to said data. The current system is an active obstacle to doing that.

        • Unfortunately, it goes well beyond bio-medical research. A huge quantity of what is being done today in the tech industry is similar, and it’s not just about p values and NHST, that’s just a technology for producing crap on the cheap in bulk. The deeper issue is disconnection between production of “real value” and the making of cash income.

          Economists of course tend to think like “a thing is worth what the consumer will pay for it, there is no ‘real value’…” but I’d much rather have that be “a thing is worth what the consumer would pay for it, if they knew *exactly* what they were getting”.

          Saying that “snake oil bottles are worth $300 a jar because that’s what people will pay for them” doesn’t capture “what would people pay for them if they knew that they were manufactured by pouring some soybean oil into a dark bottle with a dropper and then adding a few drops of parsley extract to make you think they’re kind of medicinal, and that the whole thing can be made in your own kitchen for $1.25 including the bottle and dropper, and will be identical to what you bought for $300 and has no medical value whatsoever?”

          How many times have you purchased a consumer electronics product that didn’t do what it advertised? required a couple months wait and then a firmware upgrade? technically worked, but burned out after a short useful life due to failed capacitors or power supply issues, or even just bought a perfectly fine product such as a smart-phone but it was locked down purposefully in a way that prevented you from software upgrading it to extend its useful life so that you had to buy an upgrade?

          How many of us would really be using Facebook, Twitter, or Uber if we knew what was going on behind the scenes there in detail?

          These are just everyday examples of consumers who normally know nothing about this stuff being deluded into thinking that they’re getting one thing when in fact they get another. Bio-medicine is way, way worse, because at least some consumers get to be experts in consumer electronics and can help their friends avoid the cheap junk or explain what can and can’t really be done (no, you won’t get 1700 Mbps data transfer over your new router; that’s the theoretical maximum you could get for simultaneous transmissions to multiple computers at once on the 2.4 GHz band and the separate 5 GHz band radios, and it includes all the protocol overhead. Actual speed in a single file transfer from your computer under realistic conditions would be closer to 100 Mbps, or the like.)

          But how many people are in a position to say “no, your doctor can’t cure X by giving you medicine Y. For a few weeks you will feel better, but after a while medicine Y will have side effects that are very unpleasant and eventually worse than the ailment, leading to a second chronic condition on top of X and requiring you to follow up with surgery, which often goes wrong. After several years you’ll die of a bleeding ulcer, but only after being incapacitated and unable to work; meanwhile, your insurance company will wind up shelling out a total of $250,000 on your behalf. Or you could just put up with disease X, live longer, and ultimately probably be happier.”

          ???

          It doesn’t happen. Instead what happens is like what happened to my grandfather: at age 92 his cardiologist breezed into his hospital room, declared that he needed an ablation surgery, breezed out, performed the surgery the next day, billed the insurance company $108,000 and 4 days later he died of heart failure as everyone in our family knew he would, it was obvious to everyone.

          The core problem is asymmetry of information. Bad science is just one of many consequences.

      • Allan Cousins says:

        Daniel:

        I think your example of Theranos is interesting because the informational asymmetries cut across different aspects of that term. As far as I understand it, FDA approvals or applications (or lack thereof) are in the public domain. If my understanding is correct (I have never worked with anything FDA-related, so mountains of salt required), then any investor or end-user presumably could have investigated with relative ease whether the product they were using or investing in was approved, and decided on that basis how to proceed. So in some sense the pertinent information (whether Theranos was providing a legitimate service at the time) was widely available to anyone. It wasn’t so much a lack of disseminated information that allowed it to flourish; it was the failure to ask the right, fundamental questions because everyone got caught up in the story.

        I think the example highlights that the concept of informational asymmetry is diverse. It’s my impression that people tend to think of informational asymmetry only in the strict sense where one side of the transaction really does have pertinent information that the other side does not. But Theranos is a good example where high level information was available but people just didn’t look.

    • Keith O'Rourke says:

      Marcos:

      I think you have provided a nice overview of the real insurmountable opportunity.

      However, we should not overlook that there has likely been a subset of researchers who have taken the high road, sometimes because of good fortune (e.g. my former director, who never had to worry about funding or grants from early on in their career) and sometimes by being pushed out of academia but still being able to contribute, though mostly posthumously (e.g. C. S. Peirce).

      Additionally, things are starting to look up – even if just slightly – now that far more are aware and concerned about faulty funded, published and highly rewarded research and its seemingly unending escalation.

      For instance, one university actually does random checks to see whether the ethics protocols are being followed, including how data is handled and stored (minimal quality control, but a start). Apparently there was incredible faculty pushback on it [yup, that is really, really bad], but because the university could argue that it was the only way to ensure that the human subjects requirements were being met, and hence that the university’s research was ethical, the pushback could be largely ignored.

      Now, how much of a stretch would it be to also audit how the data is managed and analyzed? They would need different expertise for this, and that’s the expertise Andrew is addressing here. It needs to be better sorted out in the statistics discipline, as it’s currently a mess (especially if one includes intro stats courses as part of the statistics discipline!).

      I also think the content of the random checks needs to be divided into economy-of-research assessments and scientific assessments. The economy-of-research assessments include things like whether they adequately reported what they did, whether the data management was adequate, and whether they followed their protocol or clearly indicated the changes they made; these are auditable. The scientific assessments, on the other hand, will not be auditable per se but will require mature and critical peer review. What is almost certain here is that this will not be popular with most faculty!

      On the other hand, if some university can start down this path and make some progress, they likely will have an advantage in attracting faculty that want to travel the high road as well as funders who want their funds to actually make a difference.

      But without a better sorting out of how to peer review how real uncertainties are dealt with in studies (e.g. expiry dates for statistical methods) no university will get very far down that path. Hey, even placebos have expiry dates!

    • a reader says:

      Marcos:

      “Researchers should not do their own analysis, funding agencies should give funds and demand independent statisticians to analyze the data and these statisticians should be accountable for that analysis, not be co-authors.”

      I’ve held the position of statistical consultant, and my experiences have led me to believe this is not the best direction to go. The reason I say that is that if a researcher does not have a strong working concept of how the actual mathematical model describing the relations they are interested in works, they will often collect very low-quality data.

      In the abstract, it might be very clear that two phenomena are related, e.g. “People are nicer to more attractive people”. The really hard question is what data you could possibly record to quantify this. My experience has been that unless the researcher is intimately familiar with the model they would like to use, they will just collect a whole bunch of data that they think should be related to their abstract concepts (symmetry of face + weight + height + number of hours spent at the gym and number of times someone held the door + number of raises + average number of conversations per day) and pass the data to a statistician. In my very biased personal opinion, this really passes the scientific work to the statistician, who is now supposed to come up with the mathematical model that gives insight into the abstract question of interest…even though the statistician is often not an expert in that particular field.

      This does not lead to good science. In fact, I think my biggest contributions as a statistical consultant have been helping researchers decide what data to collect, but only after I’ve become extremely familiar with the problem.

      Alternatively, there is the RCT model: the group collecting the data defines an extremely precise plan for exactly how the data will be analyzed when it comes in, but this plan is actually executed by an independent group. The group collecting the data is then intimately tied to the model they will be using, but this arrangement does not relieve them of the need for statistical sophistication.

      • The p value can be thought of as “regulatory capture” of peer review. Instead of people actually evaluating a scientific argument in terms of the quality of the science within a holistic logical framework, people wanted to replace this “full regulation” with something mechanical and gameable. We replace the holistic logical framework with a purely mechanical requirement that, by putting your data into a canned software routine of a certain kind, you can make it spit out a number less than 0.05.

        The existence of this procedure is then taken as a means to short circuit any regulatory process whereby people actually think strongly about the scientific question and now we are able to churn out snake oil without worry that the regulatory system will slow us down.

      • Ben Prytherch says:

        “In my very biased personal opinion, this really passes the scientific work to the statistician, as they are supposed to now come up with the mathematical model that gives insight into their abstract question of interest…even though the statistician is often not an expert in that particular field.”

        Yes. I don’t do consulting in any official capacity (I’m a teacher), but on the occasions that someone comes to me for help with statistics, it’s usually along the lines of “Here’s a pile of data. Are any of these variables related?” That or they need assistance attaching the correct number of stars to their p-values.

      • Keith O'Rourke says:

        I think it can work very well if the statistical consultant works along with domain experts also detached from the particular research.

        That was pretty much my situation at The Toronto Hospital: my director was senior enough that I had regular access to clinicians I could consult with, and I was comfortable reviewing even vice presidents’ research projects without worrying about what they thought.

        In fact, that is where I encountered a row slip in an Excel file that totally negated a previous, yet to be published, research finding. “After a few difficult meetings, the researchers agreed to do double data entry, and it was discovered that a row slip had occurred and it was the next admitted patient’s measure that predicted the last admitted patient’s mortality.” Andrew and I mentioned that here http://www.stat.columbia.edu/~gelman/research/published/GelmanORourkeBiostatistics.pdf

        I doubt it’s practical on a wide scale.

      • Martha (Smith) says:

        A reader said: “In fact, I think my biggest contributions as a statistical consultant have been helping researchers decide what data to collect, but only after I’ve become extremely familiar with the problem.”

        This is one of the most important types of contribution a statistical consultant can make!

  6. Anonymous says:

    I hope Andrew doesn’t offend anyone in his University of Texas audience who is passionate about p-value <.05:

    "On June 1, 2015, Gov. Greg Abbott signed S.B. 11, also known as the "campus carry" law. S.B. 11 provides that license holders may carry a concealed handgun throughout university campuses, starting Aug. 1, 2016. The law gives public universities some discretion to regulate campus carry."

    From https://campuscarry.utexas.edu/campus-carry-update written by Bob Harkins
    Associate Vice President, Campus Safety & Security
    Chair, Campus Carry Implementation Task Force:

    "The open carry of handguns is not allowed on the campus. Therefore, if you ever see a gun, do not attempt to question or approach the carrier, but immediately CALL 911. Police are trained to handle this situation.
    There are several areas of campus in which the concealed carry of handguns is prohibited, including some portions of residence halls. Individuals with a license to carry are responsible for knowing the locations that exclude concealed handguns and to plan their daily activities carefully.
    License holders must carry their handguns on or about their person at all times while on campus. "About" the person means that a license holder may carry a handgun – holstered – in a backpack or handbag, but the backpack or handbag must be close enough that the license holder can grasp it without materially changing position. The holster must completely cover the trigger area and have enough tension to keep the gun in place if jostled.
    All license holders must think through the activities of their day. There may be times when the consequence of your activities may preclude carrying on a given day. For example, if you are going to a Rec Sports area, have a class that requires role playing, rolling, or spinning or contact with other students, you might expose the handgun. Remember that there is no storage on campus except in a privately owned vehicle.
    Many people on campus have strong viewpoints about the new law, and we understand those passions. However, regardless of your opinion about the legislation, you are required to follow the laws laid out by the State of Texas, and the policies of The University of Texas at Austin. We also ask that everyone show respect toward other members of the university community who have different views."

  7. Vishv Jeet says:

    Austin is an awesome place especially during this time of the year. Beware of guns on campus.

  8. Matt says:

    Hey Andrew! I’m going to be attending your talk tomorrow and would like to record it so some friends can also see it; they have other meetings at that time.

    Would you mind if I recorded the talk?

    -Matt
