## Clearing up some misconceptions about Bayesian statistics

I was checking out the comments on my Bloggingheads conversation with Eliezer Yudkowsky, and I noticed the following from commenter bbbeard:

My sense is that there is a fundamental sickness at the heart of Bayesianism. Bayes’ theorem is an uncontroversial proposition in both frequentist and Bayesian camps, since it can be formulated precisely in terms of event ensembles. However, the fundamental belief of the Bayesian interpretation, that all probabilities are subjective, is problematic — for its lack of rigor. . . .

One of the features of frequentist statistics is the ease of testability. Consider a binomial variable, like the flip of a fair coin. I can calculate that the probability of getting seven heads in ten flips is 11.71875%. I can check this, first of all, with a computer program that generates random numbers uniformly in [0,1) in groups of ten, and keeping tabs on what fraction of samples have exactly seven numbers less than 0.5. Obviously I can do this for any (m,n). I can also take a coin and flip it many times and get an empirical approximation to 11.71875%. At some point a departure from the predicted value may appear, and frequentist statistics give objective confidence intervals that can precisely quantify the degree to which the coin departs from fairness. . . . What is unclear to me is how a Bayesian would map out an experiment, either numerical or empirical, to demonstrate the posterior distribution in the unknown unfair coin experiment. That’s why I ask, “what does the posterior distribution mean”? . . . The Bayesian interpretation is certainly not what we use in physics. Suppose we lived at a time before the speed of light was measured accurately. You could poll a bunch of people, even “experts”, and get a range of guesses about the value of the speed of light. A Bayesian would construct a prior from this information. But what happens when you go do the experiment? . . .
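The commenter's number checks out. Seven heads can occur in C(10,7) = 120 of the 2^10 = 1024 equally likely sequences, and 120/1024 = 0.1171875. Here is a minimal Python sketch of the exact calculation and of the simulation the comment describes. The trial count and random seed are arbitrary choices.

```python
import math
import random

def exact_prob(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of exactly k heads in n independent flips."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def simulated_prob(k: int, n: int, trials: int = 100_000, seed: int = 1) -> float:
    """The check described in the comment: draw n uniforms in [0, 1) per trial
    and record the fraction of trials with exactly k values below 0.5."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        below_half = sum(rng.random() < 0.5 for _ in range(n))
        hits += below_half == k
    return hits / trials

exact = exact_prob(7, 10)       # 120 / 1024 = 0.1171875
approx = simulated_prob(7, 10)
print(f"exact {exact:.7%}, simulated {approx:.3%}")
```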

I don’t know that any readers of this blog will need an answer to these questions, but just quickly:

1. No, Bayesian probabilities don’t have to be subjective. See chapter 1 of Bayesian Data Analysis for discussion and examples.

2. Bayesian models can indeed be tested. See chapter 6 of Bayesian Data Analysis.

3. Probability distributions in physics are not so clear as you might think. See the bottom half of page 7 in my Bayesian Analysis discussion here.
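bbbeard asks how a Bayesian would design a numerical experiment that demonstrates the posterior distribution for an unknown coin. Here is one answer, sketched under an assumed uniform prior on the probability of heads p and using the same data (7 heads in 10 flips):

- Repeatedly draw p from the prior.
- Simulate ten flips with that p.
- Keep p only when the simulation reproduces the observed data.

The kept values are draws from the posterior, and their frequencies can be compared with the analytic Beta(8, 4) result.

```python
import math
import random

def simulate_posterior(heads: int, n: int, draws: int = 100_000, seed: int = 2) -> list[float]:
    """Rejection sampling: draw p from the prior, simulate n flips with that p,
    and keep p only when the simulated head count equals the observed one."""
    rng = random.Random(seed)
    kept = []
    for _ in range(draws):
        p = rng.random()  # assumed prior: p ~ Uniform(0, 1)
        if sum(rng.random() < p for _ in range(n)) == heads:
            kept.append(p)
    return kept

# Analytic answer: uniform prior + 7 heads in 10 flips gives a Beta(8, 4) posterior.
a, b = 7 + 1, 3 + 1
exact_mean = a / (a + b)
# For integer a, b, the Beta(a, b) CDF at x equals P(Binomial(a + b - 1, x) >= a).
exact_below_half = sum(math.comb(a + b - 1, j) for j in range(a, a + b)) / 2 ** (a + b - 1)

kept = simulate_posterior(7, 10)
sim_mean = sum(kept) / len(kept)
sim_below_half = sum(p < 0.5 for p in kept) / len(kept)
print(f"posterior mean:    exact {exact_mean:.4f}, simulated {sim_mean:.4f}")
print(f"P(p < 0.5 | data): exact {exact_below_half:.4f}, simulated {sim_below_half:.4f}")
```

With a different prior the analytic posterior changes, but the simulation recipe stays the same. The posterior is the distribution of p among those simulated worlds that produced the data actually seen.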

OK, I think that just about covers it.

P.S. These definitions (from pages 1-2 of this article) may also be of help:

“Bayesian inference” represents statistical estimation as the conditional distribution of parameters and unobserved data, given observed data. “Bayesian statisticians” are those who would apply Bayesian methods to all problems. (Everyone would apply Bayesian inference in situations where prior distributions have a physical basis or a plausible scientific model, as in genetics.) “Anti-Bayesians” are those who avoid Bayesian methods themselves and object to their use by others.
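The notation here is a conventional choice, not taken from the article:

- θ stands for parameters.
- y stands for observed data.
- ỹ stands for unobserved data, such as future observations.

In this notation, the conditional distributions in the first definition follow from Bayes' rule. The second formula assumes ỹ and y are conditionally independent given θ.

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'}

p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta
```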

1. The people in the physical sciences whom I know talk in Bayesian terms all the time, as in, "the probability that the Hubble constant is between 67 and 82 km/s/Mpc is 95%". It may be that they do their calculations in a frequentist way, but their language betrays them: they interpret a confidence interval as a Bayesian credible interval.
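Part of why this slip goes unnoticed is that the numbers agree in the simplest textbook case. For a normal measurement model with known noise and a flat prior on the unknown mean, the 95% confidence interval and the 95% posterior interval are identical. A small sketch with made-up numbers checks two things: that equality, and the interval's frequentist property (its coverage across repeated experiments).

```python
import random
from statistics import NormalDist, fmean

SIGMA = 5.0        # assumed known measurement noise (hypothetical)
N = 20             # measurements per experiment (hypothetical)
TRUE_VALUE = 70.0  # hypothetical true value of the quantity being measured
Z = NormalDist().inv_cdf(0.975)  # about 1.96

def intervals(data: list[float]) -> tuple[tuple[float, float], tuple[float, float]]:
    """Return (classical 95% confidence interval, 95% posterior interval under a flat prior)."""
    ybar = fmean(data)
    se = SIGMA / len(data) ** 0.5
    ci = (ybar - Z * se, ybar + Z * se)
    posterior = NormalDist(ybar, se)  # flat prior: posterior for the mean is N(ybar, se^2)
    credible = (posterior.inv_cdf(0.025), posterior.inv_cdf(0.975))
    return ci, credible

rng = random.Random(3)
reps, covered, max_mismatch = 2000, 0, 0.0
for _ in range(reps):
    data = [rng.gauss(TRUE_VALUE, SIGMA) for _ in range(N)]
    ci, credible = intervals(data)
    max_mismatch = max(max_mismatch, *(abs(c - d) for c, d in zip(ci, credible)))
    covered += ci[0] <= TRUE_VALUE <= ci[1]
coverage = covered / reps
print(f"largest CI/credible mismatch: {max_mismatch:.2e}; coverage: {coverage:.3f}")
```

With informative priors, or in less tidy models, the two intervals generally differ, and then the interpretation matters.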

2. mike says:

We often (rather cynically) ask those in the physical sciences to define the p-value. When they insist that it's the probability of the null hypothesis, we insist that they have Bayesian tendencies and should embrace them.
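mike's distinction can be made concrete with the coin from the quoted comment (7 heads in 10 flips). The p-value is a probability computed assuming the null hypothesis of a fair coin. The probability of the null given the data cannot be computed without a prior. The prior below is an assumption chosen only for illustration.

```python
import math

N_FLIPS, HEADS = 10, 7  # the coin example quoted at the top of the post

def binom_pmf(j: int, n: int, p: float) -> float:
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)

# p-value: computed *assuming* the null (fair coin) is true. Two-sided: total
# probability of head counts at least as far from n/2 as the observed count.
p_value = sum(
    binom_pmf(j, N_FLIPS, 0.5)
    for j in range(N_FLIPS + 1)
    if abs(j - N_FLIPS / 2) >= abs(HEADS - N_FLIPS / 2)
)

# P(null | data) needs a prior. Assumed for illustration: P(fair) = 1/2, and
# under the alternative p ~ Uniform(0, 1), whose marginal likelihood is 1 / (n + 1).
like_null = binom_pmf(HEADS, N_FLIPS, 0.5)
like_alt = 1 / (N_FLIPS + 1)
post_null = like_null / (like_null + like_alt)  # the equal prior weights cancel

print(f"two-sided p-value:             {p_value:.4f}")
print(f"P(fair | data), assumed prior: {post_null:.4f}")
```

Here the p-value is about 0.34, while the posterior probability of a fair coin is about 0.56. The two numbers answer different questions, and with a different prior the second one would change.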

3. Chuck says:

Some physicists are Bayesians.

The Feynman Lectures on Physics contain a number of Bayesian-like statements on probability.

For example, Vol. I, p. 6-2:

Probabilities need not, however, be "absolute" numbers. Since they depend on our ignorance, they may become different if our knowledge changes.

Chuck

4. Perhaps some of the confusion comes about because in the subjectivist variety of Bayesian statistics (as practiced, for example, by de Finetti and, perhaps a bit less strongly, by Lindley), all probabilities are subjective, in the sense that the subjective view of probability is the most general view.

However, regarding all probabilities as subjective (in the sense that they represent the beliefs of a particular person) doesn't rule out the possibility that some of these "subjective" probabilities are based on so much empirical data that any other reasonable person who is aware of the same data would have an almost identical subjective probability. Such probabilities are really no more subjective than the probabilities used by frequentists.

Perhaps another source of confusion is that the "objective" probabilities used in real applications by frequentists are also subjective, in the sense that they are the result of assessments of data by actual people, who are of course fallible. Frequentists just don't admit that these probabilities are subjective. Or, more precisely, sensible frequentists realize that they are subjective. However, rather than building that into their mathematical formalism, they use informal means to make whatever adjustments to their conclusions are required by the uncertainty in these "objective" probabilities. (For example, a sensible frequentist who gets a p-value of 0.00000001 will probably not behave any differently than if they got a p-value of 0.00001, since the difference is likely meaningless given that the model is probably at least slightly flawed.) Of course, actual Bayesians have to make some adjustments informally as well, since formalizing prior beliefs is an inexact art.
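The point about subjective probabilities backed by large amounts of data can be illustrated with a Beta-binomial model. This is an assumed model with made-up priors and data. Two people start with very different priors on a coin's probability of heads, then see the same flips. After 10 flips their posterior means differ by about 0.28; after 10,000 flips, by less than 0.001.

```python
# Two hypothetical people with quite different Beta priors on a coin's P(heads).
PRIORS = {"flat": (1.0, 1.0), "expects heads": (20.0, 2.0)}

def posterior_mean(a: float, b: float, heads: int, flips: int) -> float:
    """Mean of the Beta(a + heads, b + flips - heads) posterior."""
    return (a + heads) / (a + b + flips)

def disagreement(heads: int, flips: int) -> float:
    """Spread between the two people's posterior means after the same data."""
    means = [posterior_mean(a, b, heads, flips) for a, b in PRIORS.values()]
    return max(means) - min(means)

for heads, flips in [(5, 10), (5030, 10_000)]:
    print(f"{heads} heads in {flips} flips: posterior means differ by {disagreement(heads, flips):.4f}")
```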

5. Phil says:

I think it's true that many physics experiments don't need, and wouldn't benefit substantially from, Bayesian statistics. If you're trying to measure the charge on an electron, or the speed of light, or the mass of a neutrino, or whatever, you haven't really contributed much if your result is (or would be) substantially influenced by your prior distribution.

This is true even in many cases of experiments that are subject to a lot of uncertainty. For instance, I've been doing some work recently with measurements of viscosities of gas mixtures. Any given experimental group gets fairly replicable results, differing perhaps by 0.1%, but different experimental groups get values that differ by 0.5% or more because of systematic errors. (For instance, each group uses a different experimental apparatus, thus requiring a different equation for converting the quantities that are actually measured, such as the rate at which the oscillations of a rotating cylinder are damped, into a viscosity; the equations are themselves subject to uncertainty.) Suppose I design, build, and use a new apparatus, perhaps using a new design for which I believe I have an accurate method of converting an observed quantity into a viscosity measurement. I use this apparatus to measure the viscosity of a well-studied mixture (like O2-H2)…and my value is 1.5% lower than the median measurement in the literature, and 0.5% lower than any other measurement in the literature. What should I do here? It's really not even a statistical question: if I think that all of the other published values are systematically wrong, I should simply ignore them. It's hard to see what Bayesian analysis has to offer here. For that matter, it's hard to see what frequentist analysis has to offer, either!

6. Andrew Gelman says:

Phil: In my experience, physicists start needing statistics when they start fitting curves, especially functions such as A*exp(-ax) + B*exp(-bx) that are hard to fit from data.
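One way to see why such fits are hard is that quite different parameter values can give nearly the same curve. Data with even modest noise then cannot tell them apart. Here is a minimal sketch with made-up parameters, comparing a curve with two decay rates against one with a single decay rate.

```python
import math

def two_exp(x: float, A: float, a: float, B: float, b: float) -> float:
    """The model A*exp(-a*x) + B*exp(-b*x)."""
    return A * math.exp(-a * x) + B * math.exp(-b * x)

# Hypothetical parameter sets: a genuine two-exponential curve, and a single
# exponential (B = 0) with an intermediate decay rate.
params_two = (1.0, 1.0, 1.0, 1.2)
params_one = (2.0, 1.1, 0.0, 1.2)

xs = [i / 100 for i in range(301)]  # grid on [0, 3]
max_gap = max(abs(two_exp(x, *params_two) - two_exp(x, *params_one)) for x in xs)
print(f"start value {two_exp(0.0, *params_two):.2f}, largest gap on [0, 3]: {max_gap:.4f}")
```

Both curves start at 2, and on [0, 3] they never differ by more than about 0.0045. That is well under a quarter of a percent of the starting value, even though one curve has two decay rates and the other has one.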

7. Echoing Radford: The speed of light example in the comment may actually vindicate the Bayesian viewpoint, at least when thinking about information on unknown but fixed quantities.

Consider that if the prior distribution elicited from experts is diffuse, this indicates a lack of previous (experimental) agreement, even if the prior is essentially a collection of wild guesses (as in the example of the length of the Emperor's nose).

When running the experiment, the inherent precision of the apparatus and the result will then tell you something relative to the prior information, whether you have virtually zero new information in the posterior (suggesting that the prior contains sufficient information on the truth), relatively positive information (the new apparatus adds something) or even negative information (considerable disagreement between the current apparatus and previous experts, or a real chance for some new science.)
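The updating described here can be sketched with the simplest conjugate model, assumed for illustration. A normal prior stands in for the pooled expert guesses, and there is a single normal measurement with known uncertainty. All numbers are hypothetical. A diffuse prior leaves the posterior essentially at the measurement. A confident prior far from the measurement shows up as a large standardized discrepancy, which is the "considerable disagreement" case.

```python
import math

def normal_update(prior_mean: float, prior_sd: float, meas: float, meas_sd: float) -> tuple[float, float]:
    """Normal prior + one normal measurement with known sd -> normal posterior.
    Precisions (1 / variance) add; the posterior mean is precision-weighted."""
    prior_prec, meas_prec = 1 / prior_sd**2, 1 / meas_sd**2
    post_var = 1 / (prior_prec + meas_prec)
    post_mean = post_var * (prior_prec * prior_mean + meas_prec * meas)
    return post_mean, math.sqrt(post_var)

def conflict(prior_mean: float, prior_sd: float, meas: float, meas_sd: float) -> float:
    """Standardized gap between measurement and prior; large values flag disagreement."""
    return abs(meas - prior_mean) / math.hypot(prior_sd, meas_sd)

MEAS, MEAS_SD = 299_800.0, 100.0  # hypothetical measurement of c, in km/s

# Diffuse prior from scattered expert guesses (hypothetical numbers).
diffuse_mean, diffuse_sd = normal_update(250_000.0, 50_000.0, MEAS, MEAS_SD)
# Confident expert prior that happens to be far off (hypothetical numbers).
z_confident = conflict(298_000.0, 50.0, MEAS, MEAS_SD)

print(f"diffuse prior -> posterior {diffuse_mean:,.1f} +/- {diffuse_sd:.1f} km/s")
print(f"confident prior vs measurement: {z_confident:.1f} standard deviations apart")
```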

8. Speed of light is a bad example, because in the SI system of units, the speed of light is a defined constant equal to 299792458 meters/second. The second is defined by atomic clocks. The meter is a derived quantity. This definition has been in effect since 1983. It is a huge advantage to do it this way, because the old meter was defined, first in terms of an artifact in Paris (go to Paris every time you need to know how long something is?), then in terms of a difficult-to-measure wavelength available only to a few laboratories (but at least you only had to spend the money, not go to Paris). Accurate clocks now make it easy to reproduce the second to the required precision, so anyone with a suitable clock can define a meter to very high precision.

Only the kilogram still depends on an artifact. They are working on this.

Thus, the cop on the highway can bust you for speeding with great precision!

9. Anne says:

There's a great quote by Ernest Rutherford: "If your experiment needs statistics, you should have done a better experiment."

Unfortunately in astrophysics that is often infeasible, since we are stuck with looking at what's in the sky. (In the most frustrating example I know of, there is some debate about the distribution of the low-order multipole moments in the cosmic microwave background, but we only have – in fact can only ever have – one sample.) So we end up using statistics in some form or other constantly. All too often we're hopelessly uninformed about the tools of modern statistics; I'm hoping to effect some improvement in my own knowledge at least by being a reader here.

I would say, though, that in many situations we have a preference for methods that are (perceived as) as model-independent as possible. Thus, for example, rather than try to come up with a model detailed enough to describe pulsar timing noise, we will fit a simple polynomial or spline through pulse arrival-time measurements. The idea, at least when I do it, is that the process we're trying to describe is far too complicated for the data to adequately constrain a physical model. All I'm hoping to do is, effectively, summarize the data in a way that can be used for further computation (extracting derivatives from the spline, say, to estimate torques involved). The most important issue is that I can clearly understand and explain the distortions I introduce.
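Here is a minimal sketch of the "summarize, then differentiate" approach, under deliberately simplified assumptions. Instead of real arrival times, it uses made-up pulse phases that follow a quadratic spin-down model plus small noise. It fits a quadratic by least squares and reads the spin frequency ν and its derivative ν̇ off the fitted coefficients. For a rigid rotator with moment of inertia I, the torque would then be 2πIν̇. A spline or higher-order polynomial would be handled in the same spirit, and real timing noise is far messier.

```python
import random

def polyfit_quadratic(ts: list[float], ys: list[float]) -> list[float]:
    """Least-squares fit of y = c0 + c1*t + c2*t**2 by solving the 3x3 normal equations."""
    s = [sum(t**k for t in ts) for k in range(5)]                  # sums of t^0 .. t^4
    r = [sum(y * t**k for t, y in zip(ts, ys)) for k in range(3)]  # sums of y*t^k
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]  # augmented matrix
    for col in range(3):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for j in range(col, 4):
                m[row][j] -= f * m[col][j]
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        c[i] = (m[i][3] - sum(m[i][j] * c[j] for j in range(i + 1, 3))) / m[i][i]
    return c

# Hypothetical spin-down: phase(t) = phi0 + nu*t + nudot*t^2/2, in arbitrary units.
PHI0, NU, NUDOT = 0.2, 5.0, -0.06
rng = random.Random(4)
ts = [10 * i / 49 for i in range(50)]
ys = [PHI0 + NU * t + 0.5 * NUDOT * t**2 + rng.gauss(0, 0.001) for t in ts]

c0, c1, c2 = polyfit_quadratic(ts, ys)
nu_hat, nudot_hat = c1, 2 * c2  # first and second derivatives of the fit at t = 0
print(f"nu: {nu_hat:.4f} (true {NU}), nudot: {nudot_hat:.5f} (true {NUDOT})")
```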

10. Keith O'Rourke says:

Anne: The roughly equivalent quote from Fisher (an anti-Bayesian) went something like this: the really important goal of experimental design is to lessen dependence on assumptions while best ensuring that those that are needed are least wrong (e.g., randomize if you need to assume comparable groups). But perhaps we should distinguish between interesting assumptions that can be refuted and nuisance assumptions (e.g., do make strong assumptions to test, like gender differences in treatment response, but stratify the randomization on gender).

But more generally, as is perhaps implicit in Radford’s post, we need more critical scholarship on Bayesianism or, better still, on the practical epistemology of applying statistics.

Some quick asides for a blog post. On “Frequentists just don't admit that these probabilities are subjective”: Keynes very definitely did admit it, in the early 1900s (the probability of the data model being correct was usually assumed to be one). However, this was missed and is almost never quoted (I am sure Fisher would have read it, but there is likely no evidence of that).

“non-Bayesian statistics is focused on developing methods that give reasonable answers using minimal assumptions” – this is consistent with Laplace becoming an anti-Bayesian later in his career (according to Anders Hald). At that point he rejected the use of joint distributions (Bayes), given his concern that the data model (likelihood) was too hard to get even approximately right. This messes up my favourite definition of non-Bayesian approaches as simply trying to get by as well as possible without a formal prior; maybe Laplace became a non-Likelihoodlum.

Given the current widespread popularity and coolness of being a Bayesian, the anti-Bayesian label may unduly distract from the needed critical scholarship, and an alternative label should be found (or maybe there just aren’t many, if any, anti-Bayesians left).

Keith
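Keith's nuisance-assumption example, stratifying the randomization on gender, can be sketched as follows. Each gender group is shuffled and split separately, so the arms are balanced on gender by construction while assignment within each group remains random. The function and the subjects are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_randomize(subjects, stratum_of, seed: int = 0) -> dict:
    """Randomize to 'treatment'/'control' separately within each stratum, so the
    arms are balanced on the stratifying variable by design."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in subjects:
        strata[stratum_of(s)].append(s)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        half = len(members) // 2
        for i, s in enumerate(members):
            if i < half:
                assignment[s] = "treatment"
            elif i < 2 * half:
                assignment[s] = "control"
            else:  # odd-sized stratum: the leftover subject gets a coin flip
                assignment[s] = rng.choice(["treatment", "control"])
    return assignment

# Hypothetical trial: 30 subjects, 10 men and 20 women.
people = {f"p{i}": ("M" if i % 3 == 0 else "F") for i in range(30)}
arms = stratified_randomize(people, people.get, seed=5)
balance = Counter((people[p], arm) for p, arm in arms.items())
print(dict(balance))
```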

11. I've further replied to yon commenter at Frequentist statistics are frequently subjective.