Psychology researcher Gary Marcus points me to this comment he posted regarding popular representations of Bayesian and non-Bayesian statistics. Gary guessed that I’d disagree with him, but I actually thought that what he wrote was pretty reasonable. (Or maybe that’s just my contrarian nature: in this case I show it by agreeing when I’m not supposed to!)
Here’s what Marcus wrote:
[In his recent book, Nate] Silver’s one misstep comes in his advocacy of an approach known as Bayesian inference. . . . Silver’s discussion of alternatives to the Bayesian approach is dismissive, incomplete, and misleading. . . .
A Bayesian approach is particularly useful when predicting outcome probabilities in cases where one has strong prior knowledge of a situation. . . . But the Bayesian approach is much less helpful when there is no consensus about what the prior probabilities should be. For example, in a notorious series of experiments, Stanley Milgram showed that many people would torture a victim if they were told that it was for the good of science. Before these experiments were carried out, should these results have been assigned a low prior (because no one would suppose that they themselves would do this) or a high prior (because we know that people accept authority)? In actual practice, the method of evaluation most scientists use most of the time is a variant of a technique proposed by the statistician Ronald Fisher in the early 1900s. Roughly speaking, in this approach, a hypothesis is considered validated by data only if the data pass a test that would be failed ninety-five or ninety-nine per cent of the time if the data were generated randomly. The advantage of Fisher’s approach (which is by no means perfect) is that to some degree it sidesteps the problem of estimating priors where no sufficient advance information exists. In the vast majority of scientific papers, Fisher’s statistics (and more sophisticated statistics in that tradition) are used. . . .
In any study, there is some small chance of a false positive; if you do a lot of experiments, you will eventually get a lot of false positive results (even putting aside self-deception, biases toward reporting positive results, and outright fraud)—as Silver himself actually explains two pages earlier. Switching to a Bayesian method of evaluating statistics will not fix the underlying problems; cleaning up science requires changes to the way in which scientific research is done and evaluated, not just a new formula.
It is perfectly reasonable for Silver to prefer the Bayesian approach—the field has remained split for nearly a century, with each side having its own arguments, innovations, and work-arounds—but the case for preferring Bayes to Fisher is far weaker than Silver lets on, and there is no reason whatsoever to think that a Bayesian approach is a “think differently” revolution.
This was similar to the comment of Deborah Mayo, who felt that Nate was too casually identifying Bayes with all the good things in the statistical world while not being aware of modern developments in non-Bayesian statistics. (Larry Wasserman went even further and characterized Nate as a “frequentist” based on Nate’s respect for calibration, but I think that in that case Larry was missing the point, because calibration is actually central to Bayesian inference and decision making; see chapter 1 of Bayesian Data Analysis or various textbooks on decision analysis.)
I pretty much agreed with Marcus’s and Mayo’s general points. Bayesian and non-Bayesian approaches both can get the job done (see footnote 1 of this article for my definitive statement on that topic).
I do, however, dispute a few of Marcus’s points:
1. The paragraph about the Milgram experiments. From what I’ve read about the experiment (although I have to admit not being an expert in that field), Milgram’s data are so strong that the prior distribution would be pretty much irrelevant. The main concern would be potential biases in the experiment, generalizability to other settings, etc.—and for those problems, you pretty much have to use an assumption-based (whether or not formally Bayesian) approach (as discussed, for example, in the writings of Sander Greenland). All the hypothesis testing and randomization in the world won’t address the validity problem.
2. Marcus’s statement, “In any study, there is some small chance of a false positive.” I get what he means, and I think the general impression he’s giving is fine, but I disagree with the statement as written. Some experiments are definitely studying real effects, in which case a “false positive” is impossible.
3. Marcus writes, “there is no reason whatsoever to think that a Bayesian approach is a ‘think differently’ revolution.” I think “no reason whatsoever” is a bit strong! For some statisticians, it can truly be revolutionary to allow the use of external information rather than get trapped in the world of p-values. I agree that, if you’re already using sophisticated non-Bayesian methods such as those of Tibshirani, Efron, and others, Bayes is more of an option than a revolution. But if you’re coming out of a pure hypothesis testing training, then Bayes can be a true revelation. I think that is one reason that many methodologists in Marcus’s own field (psychology) are such avid Bayesians: they find the openness and the directness of the Bayesian approach to be so liberating.
Despite these points of disagreement (and my items 2 and 3 are matters of emphasis more than anything else), I agree strongly with Marcus’s general message that Bayes is not magic. The key step is to abandon rigid textbook thinking on hypothesis testing and confidence intervals; one can move forward from there using Bayesian methods or various non-Bayesian ideas of regularization, meta-analysis, etc. I have not read Nate’s book, but if Nate’s message is that modern statistics is about models rather than p-values, I support that message even if it’s not phrased in the most technically correct manner. And I also support Marcus’s message that it’s not so much about the word “Bayes” as about escaping out-of-date rigid statistical ideas.
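As a footnote to item 1 above, here is a minimal sketch of what "the prior would be pretty much irrelevant" can look like in a simple conjugate model. Everything specific in it is an assumption made for illustration: the counts (26 obedient participants out of 40, roughly the figure usually quoted for Milgram's baseline condition, not a number given in this post), the three Beta priors standing in for "low prior", "no opinion", and "high prior", the 25% threshold, and the helper name `beta_posterior_summary`. It is not a reanalysis of the actual experiments, and it says nothing about the validity and generalizability concerns raised in item 1.

```python
import math


def beta_posterior_summary(prior_a, prior_b, successes, trials, threshold, grid=20000):
    """Posterior mean and P(theta > threshold) for a Beta prior with binomial data.

    Hypothetical helper for illustration. The tail probability is computed by
    midpoint-rule integration of the Beta posterior density.
    """
    # Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + y, b + n - y)
    a = prior_a + successes
    b = prior_b + trials - successes
    # Log of the Beta function B(a, b), the normalizing constant
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    step = 1.0 / grid
    tail = 0.0
    for i in range(grid):
        theta = (i + 0.5) * step  # midpoints avoid log(0) at theta = 0 or 1
        if theta > threshold:
            log_density = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm
            tail += math.exp(log_density) * step
    return a / (a + b), tail


# Assumed illustrative data: 26 of 40 participants fully obedient.
OBEDIENT, PARTICIPANTS = 26, 40

# Assumed priors: skeptical (mean 0.1), flat (mean 0.5), credulous (mean 0.9).
priors = {
    "skeptical Beta(1, 9)": (1, 9),
    "flat Beta(1, 1)": (1, 1),
    "credulous Beta(9, 1)": (9, 1),
}

results = {}
for name, (prior_a, prior_b) in priors.items():
    mean, tail = beta_posterior_summary(prior_a, prior_b, OBEDIENT, PARTICIPANTS, 0.25)
    results[name] = (mean, tail)
    print(f"{name}: posterior mean {mean:.3f}, P(rate > 0.25) = {tail:.5f}")
```

Under these assumed numbers the posterior means still differ across priors (from 0.54 for the skeptical prior to 0.70 for the credulous one), so with only 40 participants the prior is not literally irrelevant for the exact rate. But the qualitative conclusion, that a large share of people complied, comes out the same in every case: each posterior puts more than 99% of its mass above a 25% obedience rate.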