Why I don’t like so-called Bayesian hypothesis testing

I received the following email:

As a psychologist teaching and using Bayesian statistics, I’ve been pleased to see some of my colleagues endorsing Bayesian data analysis. But I’ve been very chagrined to see them champion Bayes factors for null-hypothesis testing, instead of parameter estimation. My question is simple: Are there any articles that head-on challenge the Bayes-factor approach to null-hypothesis testing, and instead favor parameter estimation?

Perhaps the most straightforward example against Bayes factors for null hypotheses was given by Stone (1997), Statistics and Computing, 7, 263-264. He showed a simple case in which the BF prefers the null but the estimated posterior excludes the null value. I realize that the two approaches are asking different questions, but I’ve just never really been convinced that the answer provided by a null/alternative comparison tells us anything we want to know, because no matter what it says, I always want to do the estimation anyway.
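The kind of disagreement described in the email can be sketched numerically. What follows is not Stone's example; it is a minimal illustration using a textbook setup (a normal mean with known standard deviation, a point null at zero, and a normal prior on the mean under the alternative). The function name `bf01_and_interval` and all the numbers (`xbar`, `n`, `sigma`, `tau`) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def bf01_and_interval(xbar, n, sigma=1.0, tau=1.0, level=0.95):
    """Bayes factor for H0: mu = 0 against H1: mu ~ N(0, tau^2),
    plus the central posterior interval for mu under H1.

    Assumes n observations from N(mu, sigma^2) with sigma known,
    summarized by their sample mean xbar (hypothetical setup).
    """
    se2 = sigma**2 / n  # sampling variance of xbar

    # Marginal density of xbar under each hypothesis.
    m0 = NormalDist(0.0, math.sqrt(se2)).pdf(xbar)
    m1 = NormalDist(0.0, math.sqrt(tau**2 + se2)).pdf(xbar)
    bf01 = m0 / m1  # values above 1 favor the null

    # Conjugate normal posterior for mu under H1.
    post_var = 1.0 / (1.0 / tau**2 + 1.0 / se2)
    post_mean = post_var * xbar / se2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(post_var)
    return bf01, (post_mean - half_width, post_mean + half_width)


# Illustrative numbers: a large sample with a small but clearly nonzero mean.
bf01, (lo, hi) = bf01_and_interval(xbar=0.025, n=10_000)
print(f"BF01 = {bf01:.2f}")
print(f"95% posterior interval for mu: ({lo:.4f}, {hi:.4f})")
```

With these made-up inputs the Bayes factor comes out above 1 (favoring the null) while the 95% posterior interval lies entirely above zero, so the two summaries point in different directions for the same data.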

My reply: You won’t be surprised that I agree with the above perspective. Here’s my article with Rubin (from Sociological Methodology 1995) where we bang on Bayes factors for 8 straight pages. I really like this article.

11 thoughts on “Why I don’t like so-called Bayesian hypothesis testing”

  1. In response to the question of comparing Bayes factors to parameter estimation, I'd like to mention a paper Valen Johnson and I wrote: http://www.bepress.com/mdandersonbiostat/paper47/

    The clinical trial monitoring method presented in that paper uses Bayesian hypothesis testing and has better operating characteristics than the estimation-based methods we compared it to.

    Clinical trials really are hypothesis tests: at the end of the trial, you must make a yes/no decision whether to investigate the agent further. Maybe your objection concerns situations where this is not the case and parameter estimation is more appropriate.

  2. Interesting. I have to admit that I am not a Bayesian by training, but I remember when I was first exposed to a book on Bayesian model selection by Herbert Hoijtink and Irene Klugkist from the University of Utrecht, and I remember thinking that this was really neat stuff: they showed how you can evaluate hypotheses that are much more elaborate than the classic null that nothing or something is going on, and how you calculate Bayes factors for each of these competing hypotheses that are consistent with alternative theories. I guess I just need to educate myself more on Bayesian analysis.

  3. The article you posted looks interesting, but sadly it is rather hard to read for someone outside the field without the original paper by Raftery, which does not seem to be on the internet.

    It is shocking that in the 21st century one almost *expects* everything to be accessible in a few clicks, but the point stands all the same. If anyone has a link to the original, I would much appreciate it.

  4. I'm not sure the article you put up really answers the original email. The original email asked about Bayes factors (not BIC) and why you would want to do model comparison rather than just parameter estimation. I suggest taking a look at this paper:

    http://www.stat.duke.edu/~berger/papers/ockham.ht

    where the authors use Bayes factors to (historically) compare two hypotheses: 1) the general theory of relativity and 2) a planet Vulcan orbiting closer to the sun than Mercury. In this example, hypothesis 1 has no parameters to estimate, and therefore parameter estimation would not work.

    Of course, this does not give a full answer to the original email, but at least provides an example where you cannot simply do parameter estimation.

  5. Does "So-called" modify "Bayesian" or "hypothesis testing"?

    As another psychologist, my point of view is that frequentist hypothesis testing is inferior to Bayesian hypothesis testing. There is no way to keep psychologists from using hypothesis testing (although I hope to very soon submit a manuscript without a single hypothesis test), so we may as well pick the better poison.

    Bayesian hypothesis testing is the better poison. On this, I love Sellke, Bayarri, and Berger's (2001, AmStat) demonstration.

  6. "The original email asked about Bayes factors (not BIC)"

    If I am remembering correctly, the BIC is a (non-linear) transformation of (an approximation to) the Bayes factor.

    "Does 'So-called' modify 'Bayesian' or 'hypothesis testing'?"

    It's [ so-called [ Bayesian [ hypothesis testing ]]], i.e., 'so-called' modifies 'Bayesian hypothesis testing'.
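On the BIC point in the comment above, the usual statement of the relationship is that a difference in BIC values between two models approximates minus twice the log Bayes factor, so BF01 ≈ exp((BIC1 − BIC0)/2). Here is a rough sketch of that conversion for a hypothetical normal-mean problem (known standard deviation, null model with the mean fixed at zero, alternative with the mean free). The helper names and the numbers are assumptions for illustration only.

```python
import math


def bic(max_log_lik, k, n):
    """Schwarz criterion: -2 * (maximized log-likelihood) + k * log(n)."""
    return -2.0 * max_log_lik + k * math.log(n)


def normal_log_lik(mu, xbar, n, sigma=1.0):
    """Normal log-likelihood in mu, dropping terms that do not depend on mu.

    The dropped terms are identical for both models, so they cancel
    in the BIC difference.
    """
    return -0.5 * n * (xbar - mu) ** 2 / sigma**2


# Illustrative data summary (hypothetical numbers).
xbar, n = 0.025, 10_000

bic0 = bic(normal_log_lik(0.0, xbar, n), k=0, n=n)   # H0: mu fixed at 0
bic1 = bic(normal_log_lik(xbar, xbar, n), k=1, n=n)  # H1: mu free, MLE is xbar

# Nonlinear transformation from the BIC difference to an approximate Bayes factor.
bf01_approx = math.exp((bic1 - bic0) / 2.0)
print(f"BIC0 = {bic0:.3f}, BIC1 = {bic1:.3f}, approximate BF01 = {bf01_approx:.2f}")
```

In this particular setup the expression reduces to sqrt(n) · exp(−z²/2), where z is the usual z-statistic, which makes the dependence on sample size explicit.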

  7. Bayesian hypothesis tests are the work of Harold Jeffreys, who realised that you could not proceed using vague priors for parameters unless you have a means of choosing between simpler and more complex models. He was also keen to find ways of proving that scientific laws are true. If you think you can do this and want to do it, you need Bayesian hypothesis tests.
    I, personally, don't like Jeffreys's approach. However, I think that it is a tribute to his genius that he realised that such a system had to be part and parcel of any attempt to be semi-objective in the use of Bayes. Unfortunately we now have many so-called Bayesians who think they can use uninformative priors without a system of deciding between simpler and more complex hypotheses. This is not possible.
    The issue is discussed in chapter 4 of my book Dicing with Death.

  8. Andrew,

    A very nice article indeed, particularly the theoretical criticisms of BIC. Your conclusion reads like a rather decisive referee report :)

    (Thanks Ahuri for posting the original)

  9. Below are two "fresh off the press" articles promoting Bayesian null-hypothesis testing in psychological research. These are interesting approaches, but in all the cases discussed by these authors, I'm not convinced that either the null prior or any of the alternative priors has much prior credibility. Hence the Bayes factor only indicates which unbelievable prior is most unbelievable. I'd be more comfortable with, and more informed by, a parameter estimation. Anybody have comments?

    The importance of proving the null.
    C. R. Gallistel.
    Psychological Review, 2009, 116 (2), 439-453.
    DOI: 10.1037/a0015251

    Bayesian t tests for accepting and rejecting the null hypothesis.
    Jeffrey N. Rouder, Paul L. Speckman, Dongchu Sun, Richard D. Morey, and Geoffrey Iverson.
    Psychonomic Bulletin & Review, 2009, 16 (2), 225-237
    DOI: 10.3758/PBR.16.2.225

  10. In our recent paper, Murray Aitkin and I discuss the increasingly popular and uncritical use of Bayes factors in psychology (we also quote Gelman & Rubin!)

    Liu, C. C., & Aitkin, M. (2008). Bayes factors: prior sensitivity and model generalizability. Journal of Mathematical Psychology, 52, 362-375.
    doi:10.1016/j.jmp.2008.03.002

    Personally, I am in favor of hypothesis testing/model selection methods that are consistent with parameter estimation.
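The prior-sensitivity concern raised in the last two comments can be illustrated with a small sketch. It is not taken from any of the cited papers; it assumes a hypothetical normal-mean problem with known standard deviation, a point null at zero, and a N(0, tau²) prior on the mean under the alternative, and it simply recomputes the Bayes factor for the same data as `tau` changes. The function name `bf01` and all numbers are illustrative.

```python
import math
from statistics import NormalDist


def bf01(xbar, n, tau, sigma=1.0):
    """Bayes factor for H0: mu = 0 against H1: mu ~ N(0, tau^2),
    given the mean xbar of n observations from N(mu, sigma^2), sigma known."""
    se2 = sigma**2 / n
    m0 = NormalDist(0.0, math.sqrt(se2)).pdf(xbar)           # marginal under H0
    m1 = NormalDist(0.0, math.sqrt(tau**2 + se2)).pdf(xbar)  # marginal under H1
    return m0 / m1


# Same (hypothetical) data, four different prior scales under the alternative.
taus = [0.01, 0.1, 1.0, 10.0]
bfs = [bf01(xbar=0.025, n=10_000, tau=t) for t in taus]
for t, b in zip(taus, bfs):
    print(f"tau = {t:>5}: BF01 = {b:.2f}")
```

With these made-up numbers the Bayes factor moves from favoring the alternative at the narrowest prior to favoring the null at the widest, even though the data are unchanged. The posterior interval for the mean is far less affected once `tau` is large relative to the standard error.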
