The Battle of the Bayes

Larry Wasserman read my response to his comments on my article on Bayesian statistics and had some responses of his own. I’ll post Larry’s thoughts and then my response to his response etc. Here’s Larry:

1. You correctly point out that if there are systematic errors, then confidence intervals will not have their advertised coverage. But this has nothing to do with the point under discussion: Bayes versus frequentist. All methods fail if there are systematic errors. That’s important but it is beside the point I was making. Assume the model is correct and there is no systematic error. Then what I said is correct. The frequentist method will cover as advertised (by definition) and the Bayesian method, in general, will not. More importantly, and this is what I was really getting at, the hardcore subjectivist Bayesian will say that coverage is irrelevant. The people who think that coverage is completely irrelevant are being scientifically irresponsible in my opinion.

2. “I can dispose of the first two with a reference to Agresti and Coull (1998).” Frequentist methods have correct coverage or they aren’t frequentist methods. That’s the definition. And when I said “Bayesian methods don’t” I meant that there is nothing in Bayesian inference that automatically guarantees, in general, correct coverage. I did not mean that there don’t exist Bayesian methods with correct coverage. The fact that (y+1)/(n+2) has good frequentist properties is fine, but in this case we’re thinking of it as a frequentist estimator. The fact that it happens to be Bayes for some prior isn’t what I mean by a Bayesian procedure.

3. You’re right that scientists want it both ways: they want coverage AND they would like to interpret a confidence interval as a posterior probability. So what. I’d like to measure an electron’s position and its velocity. But I can’t. Physics tells us you can measure one or the other but not both. Tough luck for me. If you explain to a scientist that they can have the comfort of a Bayesian interpretation OR coverage but not both, I’ll bet most would pick coverage.

4. Estimating the upper .01 quantile. You say there is no frequentist method for this. First of all, there is (as I discuss below). Second, here is a challenge. Find me one example where:

(i) there is no frequentist method to solve a problem

(ii) there is a Bayesian method

(iii) we can trust the Bayesian method.

I claim you cannot do this. Suppose you do find a Bayes procedure. If it has coverage then you have found a valid frequentist method so (i) is not true. If it does not have coverage then you have failed (iii).

But let’s look closer at estimating theta, the .01 quantile. We can always find the order statistics X_r and X_s so that:

P(X_r < theta < X_s) >= .95

This is an exact, nonparametric statement. So [X_r, X_s] is a valid 95 percent confidence interval. When you say there is no frequentist method, I suspect you mean that this interval is going to be very wide, unless the sample size n is very large. My reply is: great! There is very little information in the data about theta unless n is large. So the interval should be wide. This is a correct representation of the uncertainty. The Bayesian interval will be narrower, but this reflects the prior, not the data. Of course, if the prior is reliable that’s fine. But many Bayesians would simply crank out an interval, see that it is narrower than the frequentist interval, and declare victory. They’re deluding themselves because they’re sweeping the uncertainty under the carpet. I’d rather they use the frequentist interval so that they are aware of the difficulty of the problem.

The same applies to calibration problems etc where one gets huge intervals (sometimes even the whole real line). This is a virtue not a problem. The Bayesian answer hides the problem.
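
For concreteness, here is a minimal sketch of how the order-statistic interval above can be found. The key fact is that the number of observations falling below the q-th quantile is Binomial(n, q), so valid ranks r and s can be read off the binomial distribution; the sample sizes, the quantile level (the upper 1% point), and the simple symmetric search below are illustrative choices only, and a continuous distribution F is assumed.

# Exact nonparametric confidence interval for a population quantile via order statistics.
# Sketch only: the values of n and q are illustrative, not taken from the discussion above.
from scipy.stats import binom

def quantile_ci_indices(n, q, level=0.95):
    """Find ranks (r, s) with P(X_(r) <= theta_q < X_(s)) >= level, assuming a continuous F.

    The count of observations below the q-quantile is Binomial(n, q), so the interval's
    coverage is P(r <= K <= s - 1) for K ~ Binomial(n, q).
    """
    center = int(round(n * q))
    for half in range(1, n + 1):
        r = max(1, center - half)
        s = min(n, center + half)
        cover = binom.cdf(s - 1, n, q) - binom.cdf(r - 1, n, q)
        if cover >= level:
            return r, s, cover
    return None  # no two-sided order-statistic interval reaches the requested level

print(quantile_ci_indices(n=1000, q=0.99))  # ranks bracketing the upper 1% point when n = 1000
print(quantile_ci_indices(n=100, q=0.99))   # returns None: with n = 100 no such pair exists

With n = 100 the search comes back empty: no pair of order statistics brackets the 0.99 quantile with 95 percent confidence, which is exactly the point that the data alone say very little about the extreme tail unless n is large.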

And here’s my reply, which I’ll divide into two parts: Points of Disagreement and Points of Agreement.

Points of Disagreement

1. In Larry’s article, he wrote:

The particle physicists have left a trail of such confidence intervals in their wake. Many of these parameters will eventually be known (that is, measured to great precision). Someday we can count how many of their intervals trapped the true parameter values and assess the coverage. The 95 percent frequentist intervals will live up to their advertised coverage claims.

My point was that, based on the historical record, physicists’ intervals have not lived up to their advertised coverage. I think Larry’s chasing a dream here. I don’t think I’m “scientifically irresponsible” for not caring about frequentist coverage, and I don’t think the physicists of the past were irresponsible either. They understood, as do I, that these intervals are conditional on the models being used.

2. We seem to be in agreement on the method of Agresti and Coull (1998). I call it a Bayesian procedure, Larry calls it a frequentist estimator. I’m happy with him calling it anything he likes as long as he uses it!
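
For readers who want to see what “coverage as advertised” means in this simple case, here is a minimal simulation sketch. The interval centered at (y+1)/(n+2) is in the spirit of Agresti and Coull (1998), whose recommended interval actually adds two successes and two failures; the particular n, p, and number of replications are illustrative choices only.

# Monte Carlo check of the actual coverage of two nominal 95% intervals for a binomial
# proportion. Illustrative sketch: n, p, and the shrunken interval are assumptions, not
# anything prescribed in the articles above.
import numpy as np

def wald_interval(y, n, z=1.96):
    p_hat = y / n
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def shrunk_interval(y, n, z=1.96):
    # Centered at (y + 1) / (n + 2), the posterior mean under a uniform prior.
    p_tilde = (y + 1) / (n + 2)
    se = np.sqrt(p_tilde * (1 - p_tilde) / (n + 2))
    return p_tilde - z * se, p_tilde + z * se

def coverage(interval, p=0.05, n=30, reps=100_000, seed=0):
    # Fraction of simulated datasets whose interval contains the true p.
    rng = np.random.default_rng(seed)
    y = rng.binomial(n, p, size=reps)
    lo, hi = interval(y, n)
    return np.mean((lo <= p) & (p <= hi))

print("Wald interval coverage:    ", coverage(wald_interval))
print("Shrunken interval coverage:", coverage(shrunk_interval))

For small n with p near 0 or 1, the Wald interval typically falls well short of its nominal 95 percent while the shrunken interval stays much closer, which is the Agresti and Coull point.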

3. Scientists like a good statistician. I have no doubt that Larry is useful to his scientific colleagues and that, in working with him, they will have justified faith in his methods and principles. I also have no doubt that the scientists with whom Don Rubin works have faith in his methods and would prefer the Bayesian interpretation. There are many roads to Rome, and I don’t think that the statement “Larry’s colleagues prefer confidence coverage” should warn me off of Bayes, any more than the statement “Don’s colleagues prefer probability inference” should warn me off of frequentist ideas.

4. I do not say that there is no frequentist method for estimating the upper .01 quantile. What I wrote was, “the empirical distribution might not work so well for estimating the mean or the upper 1% of the distribution.” I was commenting on the discussion of the empirical distribution in Section 3 of Larry’s article. There are lots of ways of estimating the upper .01 quantile, but if I’m setting insurance rates and I don’t have hundreds or thousands of replications, I’d prefer to use a probability model.

I don’t think a Bayesian method is “sweeping the uncertainty under the carpet.” Quite the opposite! I’m stating my assumptions explicitly as probability statements, and scientists are free to challenge and improve upon them.

Larry’s methods make huge assumptions too. For example, what is he saying when he says, “Suppose we observe X1,…,Xn from a distribution F”? This is saying that you have no information to distinguish these n historical cases from the future cases for which you’d like to apply the model. In the problems I’ve worked on, it’s rare to have that sort of perfect replication.
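
To put the insurance-rates example in concrete terms, here is a minimal sketch contrasting the empirical estimate of an upper 1% point with a model-based estimate from a small sample. The lognormal model, the sample size of 50, and the simulated “claims” are purely illustrative assumptions, not anything from Larry’s article or the rejoinder.

# Two ways to estimate the upper 1% point from a small sample: the empirical quantile
# versus a parametric (lognormal) model. Sketch only; the model and the data are made up.
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(1)
claims = rng.lognormal(mean=8.0, sigma=1.2, size=50)   # fake "claims" data

# Empirical estimate: with n = 50 this interpolates between the two largest observations.
empirical_q99 = np.quantile(claims, 0.99)

# Model-based estimate: fit a lognormal by maximum likelihood, read off its 0.99 quantile.
shape, loc, scale = lognorm.fit(claims, floc=0)
model_q99 = lognorm.ppf(0.99, shape, loc=loc, scale=scale)

print(f"empirical 99th percentile: {empirical_q99:,.0f}")
print(f"lognormal-model 99th percentile: {model_q99:,.0f}")

Which number to trust is exactly what is at issue: with only 50 observations the model-based estimate is far more stable, but it leans on the lognormal assumption, which would itself have to be defended.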

Points of Agreement

As noted in the last section of my rejoinder, I see the appeal of methods that avoid distributional assumptions–I’ve used such methods in my own work and can see how they can be useful to others.

In his notes above, Larry didn’t comment on what I said about randomization, so I assume that he would retract the statement in his article that randomized experiments “don’t really have a place in Bayesian inference.”

8 thoughts on “The Battle of the Bayes”

  1. "You correctly point out that if there are systematic errors, then confidence intervals will not have their advertised coverage. But this has nothing to do with the point under discussion: Bayes versus frequentist. All methods fail if there are systematic error."

    In practice there are _always_ systematic errors. In the soft sciences and fields like psychology and medicine, there can be _very large_ systematic errors. For example, one of the most common errors: it is not the same people who administer tests to different groups. I think Bayesian techniques are much better at taking those errors into account. By giving us an interval estimate of a value, we are able to see if this interval is outside the bounds of what we can expect from systematic errors.

    IMO the systematic errors are responsible for many of the scientific fads we see in the health sciences. Diets and medications are adopted because they pass null hypothesis tests. The other day I was reading a study on the effect of Claritin or Allero (I don't remember which) allergy medication. At first glance, when you look at the graph, it seems like the pills are effective. Looking closely at the study, you realize that the graph is all zoomed in and that the effect is only compared to that of a placebo. When you look at it closely you see that the medication is less than twice as effective as the placebo and only reduces symptoms by one or two points on a scale which I think goes up to 28 (this was never stated in the paper, but if I understand correctly the scale was the sum of 7 sub-symptom scales with 4 points each). So on average a person taking allergy medication would reduce their allergy symptoms by one point on only one of the 7 symptoms, compared to half a point for the placebo. But they make billions out of these pills and, hey, they pass the null hypothesis tests with great p-values!

  2. I've never liked the phrase "frequentist procedure". For a given problem, how does one find a frequentist procedure? Is a CI procedure frequentist based on its sampling properties? I didn't read the paper but it seems that by Larry's criteria maximum likelihood is no more frequentist than Bayes (outside of the normal model and other simple problems). For example, how do we obtain a 95% frequentist confidence interval for a logistic regression parameter with a continuous predictor?

    95% CI based on score equations:

    Correct frequentist coverage? No

    Correct Bayesian coverage? No

    Correct asymptotic frequentist coverage? Yes

    95% CI based on a posterior distribution:

    Correct frequentist coverage? No

    Correct Bayesian coverage? Yes

    Correct asymptotic frequentist coverage? Yes
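
    A minimal sketch of the two constructions being contrasted, for the slope in a logistic regression with a continuous predictor: a Wald-type interval from the MLE (not literally the score interval, but with the same asymptotic behavior) and a flat-prior posterior interval from a simple random-walk Metropolis sampler. The simulated data, the flat prior, and the sampler tuning are all illustrative assumptions.

    # Wald-type asymptotic interval vs. flat-prior posterior interval for a logistic slope.
    # Sketch only: data, prior, and Metropolis tuning are made-up illustrative choices.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)
    p = 1 / (1 + np.exp(-(-0.5 + 1.0 * x)))   # true intercept -0.5, true slope 1.0
    y = rng.binomial(1, p)
    X = sm.add_constant(x)

    # (a) Asymptotic frequentist interval from the MLE.
    fit = sm.Logit(y, X).fit(disp=0)
    print("Wald 95% CI for the slope:", fit.conf_int()[1])

    # (b) Posterior interval under a flat prior, via random-walk Metropolis.
    def log_post(beta):
        eta = X @ beta
        return np.sum(y * eta - np.logaddexp(0.0, eta))   # log-likelihood; flat prior adds a constant

    beta = np.array(fit.params)                            # start the chain at the MLE
    lp = log_post(beta)
    draws = []
    for _ in range(20000):
        prop = beta + rng.normal(scale=0.15, size=2)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        draws.append(beta[1])
    print("Posterior 95% interval for the slope:", np.percentile(draws[5000:], [2.5, 97.5]))

    With n = 200 the two intervals should come out very close, which is the asymptotic agreement noted above; in small samples they can differ, and neither has exact finite-sample frequentist coverage.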

  3. I don't know of a good reason to adopt confidence coverage as my criterion for statistical inference, but I know several reasons to adopt Bayesian probability. In fact, confidence coverage can't be the only principle: there are many examples of wacky pathological confidence intervals which no one would actually use. This implies that there must be some other principle that frequentists use to rule them out. Does that principle have a formal statement, or is it just the application of common sense? Until that principle is spelled out, I'd have a hard time placing confidence coverage above Bayesian methods, although the two are comparably useful.

  4. Not to belabor this but:

    1. Yes the confidence intervals depend on the model.

    But I am asking whether coverage is or is not a desirable goal. I think it is. I suspect you agree with this. Not all Bayesians do.

    2. Great we agree.

    3. Not sure what this means.

    4. Yes I was assuming iid (or exchangeability). That's much weaker (in my opinion) than assuming exchangeability PLUS adding a subjective prior. But I agree if you make the prior assumption clear that's good. As I said, if I did have a prior I trusted I'd be happy to use it too.

    Overall we agree on many points. But I think I put a much higher premium on frequentist guarantees than you do. (I think).

    Ciao

    –Larry

  5. "Making no distributional assumption" is itself a distributional assumption: it states that there is much more randomness than there really is in the world.

  6. I was a bit disappointed there were so few posts on this topic earlier.

    For Larry’s comment “The Bayesian answer hides the problem” I would suggest rather “The Bayesian answer as presented in most publications hides the problem,” and perhaps the problem is even intentionally hidden from scientists by enthusiastic statistical consultants.

    I raised this issue in an old letter to the editor entitled “Two Cheers for Bayes,” to which Joe Kadane agreed that the tentativeness of priors should not be hidden, but responded very thoughtfully that both the likelihood and the prior (as well as the utility or loss) must be argued for, and that this is a matter of rhetoric (qualitative argument): explaining why the reader (the community of scientists) might be interested in the analysis.

    Some possible whys were given here:

    “Confidence” in the authority of the statistical consultant (i.e. Rubin or Wasserman)

    Fulfilling the objective of a repeated sampling performance criterion under a model that can’t be true (there is always some systematic error)

    Perceptive appreciation of the modeling risks in both the likelihood and prior – by the statistical consultants and the scientists

    (Historically, apparently Laplace was most concerned about the likelihood, Fisher the prior?)

    My guess is many published accounts of statistical analyses using Bayesian methods are not very specific about the roles of the priors and whether the resulting intervals are essentially “rhetorically credible” posterior intervals or just rather approximate confidence intervals (obtained by a not so terrible but unsubstantiated choice of a default prior).

    Perhaps we need a better sense of what statistical consultants should inform scientists about and ensure gets into publications…

    I like Larry’s choice of the word trust in “we (Scientists) can trust,” as it’s the most that can be achieved in Science (as C.S. Peirce would say, “momentarily set aside doubts”).

    Cheers

    Keith
