“Not only defended but also applied”: The perceived absurdity of Bayesian inference

Updated version of my paper with Xian:

The missionary zeal of many Bayesians of old has been matched, in the other direction, by an attitude among some theoreticians that Bayesian methods are absurd—not merely misguided but obviously wrong in principle. We consider several examples, beginning with Feller’s classic text on probability theory and continuing with more recent cases such as the perceived Bayesian nature of the so-called doomsday argument. We analyze in this note the intellectual background behind various misconceptions about Bayesian statistics, without aiming at a complete historical coverage of the reasons for this dismissal.

I love this stuff.

15 thoughts on ““Not only defended but also applied”: The perceived absurdity of Bayesian inference”

  1. Very interesting.

    A technical note: Laplace was in fact (slightly) wrong about the flat prior as a representation of ‘complete’ ignorance. Jaynes has shown ( http://bayes.wustl.edu/etj/articles/prior.pdf ) that the prior that correctly represents ignorance is proportional to 1/(p(1-p)) – of course the two models converge as more samples are obtained. The flat prior actually amounts to an additional piece of information that the desired probability is neither 0 nor 1. The Jaynes prior also removes some paradoxical elements of Laplace’s solution.
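
    To spell out the algebra (a quick sketch, assuming the standard beta-binomial conjugate setup, with r successes observed in n trials):

        \text{flat prior: } p(\theta) \propto 1 \;\Rightarrow\; \theta \mid r,n \sim \mathrm{Beta}(r+1,\, n-r+1), \qquad \mathbb{E}[\theta \mid r,n] = \frac{r+1}{n+2}

        \text{Jaynes prior: } p(\theta) \propto \frac{1}{\theta(1-\theta)} \;\Rightarrow\; \theta \mid r,n \sim \mathrm{Beta}(r,\, n-r), \qquad \mathbb{E}[\theta \mid r,n] = \frac{r}{n} \quad (\text{proper only when } 0 < r < n)

    The two posterior means differ by O(1/n), which is the convergence mentioned above.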

    Recently I wrote about an equivalent of the sunrise problem, in ‘How to make ad hominem arguments’, though I stuck with the flat prior for simplicity. I also stuck with the assumed constant frequency, as Laplace did. I know this is an assumption, but surely it can be somewhat justified by the large body of experience that cause-and-effect relationships are ultimately constant. At the very least, as you point out in the article, we don’t have to dogmatically claim that it is true; we merely avail ourselves of the opportunity to examine its consequences.
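
    To make the sunrise case concrete (a worked instance of the same conjugate algebra): with the flat prior, after n sunrises and no failures the predictive probability of another sunrise is Laplace’s

        \Pr(\text{sunrise}_{n+1} \mid n \text{ sunrises}) = \frac{n+1}{n+2}

    whereas the 1/(p(1-p)) prior with no observed failures yields a posterior that piles up at p = 1 (a point mass in the limiting sense), hence a predictive probability of 1. That contrast is one way of seeing the extra “neither 0 nor 1” information carried by the flat prior.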

    • Kevin Van Horn argues (convincingly, IMO) that Jaynes’s argument is flawed here. I think the points listed in KVH’s third and fourth “Critique” paragraphs are particularly insightful.

      • Jaynes is great and his work has been a big influence on my philosophy and practice of model checking, but I don’t think he’s a guru. In particular, I don’t think it makes any sense to talk about a prior representing complete ignorance. A prior distribution is a model. There are settings where, combined with a data model, the prior can be weak, and settings where we can think of a prior as being neutral, but I think it’s meaningless to talk about “the prior that correctly represents ignorance.”

        • I’ve certainly never felt that Jaynes’ pronouncements are necessarily correct; I just found him to be a very sensible author on probability.

          I must admit to some confusion about your statement that it is meaningless to talk about a prior that correctly represents ignorance. It seems to me that every prior, and every posterior for that matter, is a representation of our ignorance. For any given model and any given state of knowledge, surely the assignment of a prior should aim to conform to some rational procedure. Surely any assignment that fails to do this is therefore incorrect.

        • Well no one is a guru in math. Everyone is merely sometimes right and sometimes wrong. But I wouldn’t dismiss Jaynes too quickly. Some points:

          (1) It’s useful in many fields to have a collection of thoroughly understood simple examples. For example, engineers have the six “simple machines” like the “pulley”, “lever”, “screw” and so on. This understanding helps engineers create real world machines even though real devices are never “simple machines”.

          (2) Jaynes was not really trying to create “uninformative” priors. His real point was to show that the uniform prior used historically in this problem by both Bayes and Laplace did contain information!

          The uniform prior is appropriate when we know that both successes and failures are possible. This information is like already having 2 data points (one success and one failure). You can see this clearly when you compute the expected value for p and get the “rule of succession” result (r+1)/(n+2) instead of the more intuitive r/n, which comes from the mysterious prior Tom mentioned. (A small numerical check appears at the end of this comment.)

          (3) The invariance argument Jaynes used to derive 1/(p*(1-p)) is easy for people to dismiss. Such derivations are not part of the standard toolset of statisticians and you probably won’t find any quite like this one anywhere. Van Horn, for example, seems to have been thoroughly confused as to why Jaynes was doing this.

          But after all the verbiage is stripped away, the math does lead to the right answer (r/n). There are probably many more successes like this one out there which haven’t been discovered yet. So my advice to anyone reading Jaynes is not to dismiss this half-formed example, but rather to use it as the spark for a research project. Maybe one day your work will become part of the standard toolkit of statisticians.
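
          To make point (2) concrete, here is a minimal numerical check (a sketch in Python; the helper name posterior_mean is just for illustration):

              # Posterior mean for r successes in n binomial trials under a
              # Beta(a, b) prior: the posterior is Beta(a + r, b + n - r).
              def posterior_mean(r, n, a, b):
                  return (a + r) / (a + b + n)

              for r, n in [(1, 2), (7, 10), (70, 100), (700, 1000)]:
                  uniform = posterior_mean(r, n, 1, 1)  # rule of succession, (r+1)/(n+2)
                  haldane = posterior_mean(r, n, 0, 0)  # Jaynes/Haldane prior, r/n
                  print(f"r={r:4d} n={n:5d}  uniform={uniform:.4f}  haldane={haldane:.4f}")

          The uniform-prior column behaves exactly as if one success and one failure had already been observed, and the two columns agree to O(1/n).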

        • As Jouni discusses (in the paper linked to in my above comment), it’s not quite right that a uniform prior on p “is like already having 2 data points (one success and one failure).” Such a statement already assumes that the Beta(0,0) prior is like having 0 data points, which treats Beta(0,0) as representing zero information. But, as we know, any prior encodes information.

          That said, I agree with you (and Jaynes) that it’s good to have sharply specified models and assumptions. Then when there’s a problem, we can try to understand where the assumptions broke down.

  2. Nicely done, very Laplacian (who obviously learned from his mistakes).

    As a general comment, the mistake of rejecting a hypothesis based solely on the way it was generated played the major role in what you discuss, which is what you seem to be arguing. (To me, the perceived absurdity of assuredly obtaining relevant posterior probabilities in any more than a few rare applications was what was seen as generating the hypothesis that Bayesian analysis would be helpful. And of course, those rare applications would be the ones where everyone uses Bayes anyway.)

    A particular comment on Keynes [perhaps for Xian]: from my limited reading of his work, it seems that he was rightly concerned about the large and unavoidable probability-model misspecifications involved in making comparisons between non-randomized groups to discern causality.

    He did blast K. Pearson on this quite loudly. It is strange that he was unaware of Peirce’s 1890s work [thanks to S. Stigler for tracking that down] on using random assignment to avoid these misspecifications. And I guess using informative priors to directly address them in the absence of randomization seemed insurmountable at the time, as even Don Rubin suggested in the 1970s (as unlikely to be practical).

    I also think a good editor and some revision to tighten up some of the arguments might make it a little less wrong ;-)

  3. Really nice indeed. But it is no coincidence that I liked it and that the statistics/Bayesian blogs I like most are in fact this one and Xian’s!

  4. But it seems to me that you (Gelman) have been fearless in raising just about the strongest criticisms of traditional Bayesian philosophy around, to your credit. Moreover, you suggest it is a positive thing to raise these criticisms, so why be harsh with other critics of those same positions?

  5. That was fast. My point is that you don’t say something like “I concur with their criticisms, and that is why I advocate an account that avoids those very points.” Bottom line: if one were to raise the criticisms you raise of Bayesian methods (e.g., in your RMM article), it would be considered a fairly radical criticism of Bayesian principles, certainly in philosophy.

    • I completely disagree with Feller’s statement that Bayesian methods cannot be applied, and I don’t think the statement was true even in 1950.

  6. Is the position at issue not akin to Senn’s: “perfect in theory but not in application”? Recall his claim that he finds the “subjective Bayes theory extremely beautiful and seductive … The only problem with it is that it seems impossible to apply”. http://www.rmm-journal.de/htdocs/st01.html But I take you to also question the very idea that Bayesian updating is perfect in theory.
