“The Statistical Crisis in Science”: My talk in the psychology department Monday 17 Nov at noon

Posted on November 14, 2014 9:14 AM by Andrew

Monday 17 Nov at 12:10pm in Schermerhorn room 200B, Columbia University:

Top journals in psychology routinely publish ridiculous, scientifically implausible claims, justified based on “p < 0.05.” And this in turn calls into question all sorts of more plausible, but not necessarily true, claims, that are supported by this same sort of evidence. To put it another way: we can all laugh at studies of ESP, or ovulation and voting, but what about MRI studies of political attitudes, or embodied cognition, or stereotype threat, or, for that matter, the latest potential cancer cure? If we can’t trust p-values, does experimental science involving human variation just have to start over? And what do we do in fields such as political science and economics, where preregistered replication can be difficult or impossible? Can Bayesian inference supply a solution? Maybe. These are not easy problems, but they’re important problems.

Here are the slides (which might be hard to follow without hearing the talk) and here is some suggested reading:

Non-technical:

Too Good to Be True

The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time

Slightly technical:

Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors

The Connection Between Varying Treatment Effects and the Crisis of Unreplicable Research: A Bayesian Perspective
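
As a rough illustration of the Type S (sign) and Type M (magnitude) errors named in the reading list above, here is a minimal simulation sketch. It is not taken from the talk or the papers: the function name, the true effect, the two standard errors, the number of simulations and the seed are all assumed values chosen for illustration. It draws noisy estimates of a fixed true effect, keeps only those that reach “p < 0.05” on a two-sided normal test, and reports how often they have the wrong sign and how much they overstate the effect.

```python
import random
import statistics


def simulate_significant_estimates(true_effect, std_error, n_sims=100_000, seed=0):
    """Return (power, type_s_rate, exaggeration_ratio) for a normal estimate.

    All inputs are hypothetical; this is a sketch, not a published procedure.
    """
    rng = random.Random(seed)
    z_crit = 1.96  # two-sided 5% threshold for a normal test statistic
    significant = []
    for _ in range(n_sims):
        # One replication: the estimate is the true effect plus normal noise.
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z_crit * std_error:
            significant.append(estimate)

    # Share of replications that reach statistical significance.
    power = len(significant) / n_sims
    # Type S: share of significant estimates with the wrong sign.
    wrong_sign = [e for e in significant if e * true_effect < 0]
    type_s_rate = len(wrong_sign) / len(significant)
    # Type M: average size of significant estimates relative to the truth.
    magnitudes = [abs(e) for e in significant]
    exaggeration_ratio = statistics.mean(magnitudes) / abs(true_effect)
    return power, type_s_rate, exaggeration_ratio


# Assumed settings: same true effect, measured with high and with low noise.
noisy = simulate_significant_estimates(true_effect=2.0, std_error=8.1)
clean = simulate_significant_estimates(true_effect=2.0, std_error=0.5)

for label, (power, type_s, type_m) in (("noisy", noisy), ("clean", clean)):
    print(f"{label}: power={power:.2f}, type S={type_s:.2f}, exaggeration={type_m:.1f}")
```

In the noisy setting, the estimates that pass the significance threshold tend to overstate the assumed effect and sometimes point in the wrong direction; in the clean setting they stay close to it.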

16 thoughts on ““The Statistical Crisis in Science”: My talk in the psychology department Monday 17 Nov at noon”

Jonathan (another one) on November 14, 2014 at 11:20 am said:

Open to the public?

- Andrew on November 14, 2014 at 11:26 am said:
  
  Yup, I think so.
  
question on November 14, 2014 at 11:57 am said:

“In the sciences, an experimentum crucis (English: crucial experiment or critical experiment) is an experiment capable of decisively determining whether or not a particular hypothesis or theory is superior to all other hypotheses or theories whose acceptance is currently widespread in the scientific community.”
https://en.wikipedia.org/wiki/Experimentum_crucis

In most of these “significant p-value -> my theory is true” cases there is no real “prediction” of the “theory” in the sense that the prediction may be distinct from those of many other theories. It is just that no one has bothered to come up with an alternative explanation for why the two means aren’t equal, etc. Further, all the averaging has hidden the information that may be inconsistent with the “theory”, creating an obstacle to others abducing alternative explanations.

Jason Thomas on November 14, 2014 at 3:08 pm said:

I really wish I could attend this talk because it sounds amazing, but I’m not going to be in NYC. I also really wish U.S. universities made the effort that the British universities do to make sessions open (Google: LSE podcasts for the most gleaming example). Open access to ideas is just as powerful as open access to data, reproducibility and open science.

- Rahul on November 15, 2014 at 2:36 am said:
  
  I’m skeptical that US universities are behind British ones about openness in any systematic sense.
  
  - Jason Thomas on November 16, 2014 at 1:35 pm said:
    
    Consider your skepticism smoothed. Putting undergraduate courses that are reduced in content from the originals on Open Courseware, Coursera or EdX or anchored short interview podcasts like HBS Ideacast is not the same as actually making publicly available the talks, panels and debates (and data) where ideas are debated amongst experts and the boundaries of ideas probed. No U.S. university that I know of does this, other than Chicago Law. It’s hard to argue that there is no value to be had by someone plugging an $80 digital recorder into the microphone for this talk and making it available online.
    
    https://soundcloud.com/lsepodcasts
    
    http://podcasts.ox.ac.uk
    
    http://sms.cam.ac.uk
    
    - Rahul on November 16, 2014 at 2:58 pm said:
      
      “MIT OpenCourseWare (OCW) is a web-based publication of *virtually all* MIT course content. OCW is open and available to the world and is a permanent MIT activity.”

Rahul on November 15, 2014 at 2:38 am said:

“If we can’t trust p-values, does experimental science involving human variation just have to start over?”

Isn’t the qualifier about human variation redundant? If we cannot trust p-values we cannot trust p-values.

- Andrew on November 17, 2014 at 8:11 am said:
  
  Rahul:
  
  At a technical level, a lot of the problems arise when signal is low and noise is high. Various classical methods of statistical inference perform a lot better in settings with clean data. Recall that Fisher, Yates, etc., developed their p-value-based methods in the context of controlled experiments in agriculture.
  
  Statistics really is more difficult with humans: it’s harder to do experimentation, outcomes of interest are noisy, there’s noncompliance, missing data, and experimental subjects who can try to figure out what you’re doing and alter their responses correspondingly.
  
Kyle C on November 15, 2014 at 9:14 pm said:

How was it received?

Peter chapman on November 17, 2014 at 4:55 am said:

I’m not going to defend p-values but it is (some) psychologists and the way they apply statistical methods that we shouldn’t trust. Moving over to Bayesian methods will not solve this problem.

- Rahul on November 17, 2014 at 5:47 am said:
  
  +1
  
  A lot of this boils down to intent. If one really *wants* to push a certain result there’s always going to be a way to do it.
  
- Andrew on November 17, 2014 at 8:08 am said:
  
  Peter, Rahul:
  
  1. I agree with you that it’s not just about p-values. The way I (and others) put it, the problem is with null hypothesis significance testing, not with p-values. Null hypothesis significance testing can be done using Bayesian methods and the same problems will arise there as arise with classical p-values.
  
  To put it another way, there are problems where null hypothesis significance testing is appropriate, but I think these problems are rare, and I think the application of null hypothesis significance testing in science is generally misguided. And if all the p-values were changed to Bayes factors, I’d still feel this way.
  
  I discussed these issues a bit here.
  
  2. Rahul wrote, “If one really *wants* to push a certain result there’s always going to be a way to do it.” Sure, but it’s more than that. As Loken and I discuss in our Garden of Forking Paths article, a lot of these problems can arise even when researchers aren’t trying to cheat; it just comes up in analysis choices that are contingent on data.
  
Pingback: This is what “power = .06” looks like. Get used to it. - Statistical Modeling, Causal Inference, and Social Science
Pingback: Overegging it | Stats Chat
Pingback: In the mind’s ear: No connection between hearing and speaking in motor cortex – PS Featured Content
