David Hogg pointed me to this post by Larry Wasserman:
1. The Horvitz-Thompson estimator satisfies the following condition: for every ,
where — the parameter space — is the set of all functions . (There are practical improvements to the Horvitz-Thompson estimator that we discussed in our earlier posts but we won’t revisit those here.)
2. A Bayes estimator requires a prior for . In general, if is not a function of then (1) will not hold. . . .
3. If you let be a function of , (1) still, in general, does not hold.
4. If you make a function of in just the right way, then (1) will hold. . . . There is nothing wrong with doing this, but in our opinion this is not in the spirit of Bayesian inference. . . .
7. This example is only meant to show that Bayesian estimators do not necessarily have good frequentist properties. This should not be surprising. There is no reason why we should in general expect a Bayesian method to have a frequentist property like (1).
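For concreteness, here is a small simulation sketch of the Horvitz-Thompson estimator in a setup of the Robins-Ritov flavor. The particular regression function theta(x), sampling probability pi(x), and target psi are my own illustrative stand-ins for the quantities elided in the quote above, not the example's actual specification:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical setup: theta(x) is the regression function, pi(x) the
# known sampling probability; the target is psi = integral of theta(x) dx.
theta = lambda x: x                # assumed theta(x); here psi = 0.5
pi = lambda x: 0.1 + 0.8 * x       # known sampling probability, bounded away from 0

x = rng.uniform(size=n)
r = rng.binomial(1, pi(x))         # R_i = 1 means Y_i is observed
y = rng.binomial(1, theta(x))      # outcomes (only usable when r == 1)

# Horvitz-Thompson estimator: inverse-probability-weighted mean.
psi_hat = np.mean(r * y / pi(x))
print(psi_hat)                     # close to the true psi = 0.5
```

The point of the weighting by 1/pi(x) is that each observed outcome stands in for the unobserved ones sampled at the same rate, which is what delivers the design-based frequentist guarantee regardless of how wild theta(x) is.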
Larry follows up with a sociological comment:
We are surprised by how defensive Bayesians are when we present this example. Consider the following (true) story.
One day, Professor X showed LW an example where maximum likelihood does not do well. LW’s response was to shrug his shoulders and say: “that’s interesting. I won’t use maximum likelihood for that example.”
Professor X was surprised. He felt that by showing one example where maximum likelihood fails, he had discredited maximum likelihood. This is absurd. We use maximum likelihood when it works well and we don’t use maximum likelihood when it doesn’t work well.
When Bayesians see the Robins-Ritov example (or other similar examples) why don’t they just shrug their shoulders and say: “that’s interesting. I won’t use Bayesian inference for that example.” Some do. But some feel that if Bayes fails in one example then their whole world comes crashing down. This seems to us to be an over-reaction.
Here are my reactions to this story:
1. I don’t understand the mystique of the Horvitz-Thompson estimator. Like all statistical procedures, sometimes it works well and sometimes it doesn’t. I agree with Larry that not every method works on every problem.
2. It’s fine that Larry’s favorite methods are used in biostatistics and at Google and Yahoo. I’ve heard that biostatisticians and software companies also use Bayesian methods, maximum likelihood, chi-squared tests, etc. Lots of methods are useful. The fact that somebody somewhere uses a method doesn’t mean it’s optimal or even a good thing to do in general, but it provides some positive evidence.
3. I agree that there are cases where existing Bayesian methods have problems. Larry writes, “But some feel that if Bayes fails in one example then their whole world comes crashing down. This seems to us to be an over-reaction.” I would rephrase this to say: “Some feel that if Bayes fails in one example then it would be good to understand what aspects of the model are causing problems.” Sometimes the problem is that the frequentist criterion being used is not of applied relevance. Consider a simple problem such as estimating a proportion p, given y successes out of n trials, where n=100 and y=0. The best estimate of p will be different if I tell you that p is the probability of a rare disease, compared to if I tell you that p is the proportion of African Americans who plan to vote for Mitt Romney.
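To make that last point concrete, here is a quick sketch with conjugate Beta priors. The specific priors are my own illustrative choices: a flat Beta(1, 1) for a context-free analysis, and a Beta(50, 50) concentrated near 0.5, as might be reasonable for a major-party vote share:

```python
from fractions import Fraction

# Data from the post: y = 0 successes out of n = 100 trials.
y, n = 0, 100

def posterior_mean(a, b):
    # Beta(a, b) prior + binomial data -> Beta(a + y, b + n - y) posterior,
    # whose mean is (a + y) / (a + b + n).
    return Fraction(a + y, a + b + n)

flat = posterior_mean(1, 1)     # flat prior: 1/102, about 0.0098
vote = posterior_mean(50, 50)   # vote-share prior: 50/200 = 0.25
print(float(flat), float(vote))
```

Same data, very different estimates: 0 out of 100 pulls a rare-disease probability toward zero, while prior knowledge that a major-party candidate gets a substantial vote share keeps that estimate far from zero. Which answer is "best" depends on context the frequentist criterion doesn't see.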
4. For some problems, Bayesians will give up. For example, Bayesians don’t want unbiased estimates, and they don’t care about various minimax properties, etc. And for some problems, classical statisticians give up (for example, giving a confidence interval but not a point estimate in the y/n case). Some problems are essentially ill-posed (for example, the problem of estimating a ratio whose denominator could be either positive or negative, sometimes called the Fieller-Creasey problem).
To say this again: Bayesians give up on some things but not others. We’ll give up on theoretical principles (such as Larry’s (1) above) but we don’t like to give up on getting inferences for any quantity of interest. In contrast, non-Bayesians often feel strongly about principles such as unbiasedness and confidence coverage but are willing to give up on producing an estimate if a parameter is nonidentified.
So, I think Larry has identified a real difference in attitudes, but I don’t agree with his characterization that Bayesians think “their whole world comes crashing down.” We’re just more bothered by not being able to come up with an estimate of a probability than by not being able to satisfy a minimax property.