Mike Grosskopf writes:

I came across your blog the other day and noticed your paper about “The Boxer, the Wrestler, and the Coin Flip” . . . I do not understand the objection to the robust Bayesian inference for conditioning on X=Y in the problem as you describe it in the paper. The paper talks about how using robust Bayes when conditioning on X=Y “degrades our inference about the coin flip” and “has led us to the claim that we can say nothing at all about the coin flip”. Does that have to be the case, however? While conditioning on X=Y does mean that p({X=1}|{X=Y},I) = p({Y=1}|{X=Y},I), I don’t see why it has to mean that both have the same π-distribution, where Pr(Y = 1) = π.

Which type of inference is being done about Y in the problem?

If you are trying to make an inference about the result of the fight between the boxer and the wrestler that has already happened, in which your friend tells you that either the boxer won and he flipped heads with a coin or the boxer lost and he flipped tails, then it makes sense that the two outcomes would be given the same inferential status. The distribution of π’, defined by Pr(Y=1|{X=Y},I) = Pr(X=1|{X=Y},I), is approximately a delta function at 0.5. I am not sure why this is objectionable.

π’ ≠ p(π|{X=Y},I), however, because p(π|{X=Y},I) = p(p(Y=1|I)|{X=Y},I), which is basically asking how conditioning on X=Y affects the distribution of possible probabilities for the prior of Y.

If you are trying to better understand what chance the boxer had going into the fight after conditioning on the information that X=Y, then p(π|{X=Y},I) is the relevant inference rather than π’. This is where you would expect no change in the uncertainty about π when conditioning on the coin flip. As I laid out in the first email (though in a particularly messy and illegible format), this inference is not changed by conditioning on the coin flip in a Bayesian analysis. We are still just as uncertain about what the boxer’s chances were (π). I would think that Bayesian analysis gives the correct answer in both cases.
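The two inferences can be checked with a quick Monte Carlo sketch (my own illustration, not from the paper or the emails), assuming a uniform prior on π: conditioning on X=Y pins Pr(Y=1|{X=Y},I) at 1/2, yet the surviving draws of π remain uniform.

```python
# Monte Carlo sketch of the two inferences above (illustrative only):
# uniform prior on pi, fight outcome Y ~ Bernoulli(pi), fair coin X,
# then condition on the reported event X = Y.
import random

random.seed(1)
N = 200_000

kept_pi = []   # draws of pi that survive conditioning on X = Y
kept_y = []    # corresponding fight outcomes

for _ in range(N):
    pi = random.random()                    # uniform prior on the boxer's chance
    y = 1 if random.random() < pi else 0    # fight outcome
    x = 1 if random.random() < 0.5 else 0   # fair coin flip
    if x == y:                              # condition on X = Y
        kept_pi.append(pi)
        kept_y.append(y)

# Pr(Y=1 | X=Y): very close to 0.5
print(sum(kept_y) / len(kept_y))

# p(pi | X=Y): still looks uniform -- mean near 0.5, and about a quarter
# of the kept draws fall in each quartile of [0, 1]
print(sum(kept_pi) / len(kept_pi))
print(sum(1 for p in kept_pi if p < 0.25) / len(kept_pi))
```

Because Pr(X=Y|π) = π·(1/2) + (1−π)·(1/2) = 1/2 for every π, the conditioning carries no information about π, which is exactly the "no change in uncertainty" point.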

A better way to clarify what I was thinking is to consider conditioning, instead of on the result of the coin flip (X=Y), on the result of something essentially certain, like the sun rising tomorrow (Y=A). If someone presented you with the information that “the boxer won just as sure as the sun will rise tomorrow,” you would give both the same inferential status as certain (probability 1 for each), and the distribution of π” = Pr(Y=1|{Y=A},I) would be basically a delta function at 1. However, if you were doing inference on what the boxer’s chances were going into the fight, p(π|{Y=A},I), you would not be certain whether the boxer was unlikely to win and pulled off an upset or was a heavy favorite and easily followed through. Your distribution for π would be updated to p(π|{Y=A},I) = 2π (from the first email). All you would really be certain of is that the boxer had some chance, and you would now feel it was more likely that he had a good chance to win rather than being the underdog. This again seems to work out fine using Bayesian analysis with an uninformative prior.
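The 2π update can be verified the same way; this simulation (my sketch, assuming the uniform prior from the email) conditions on the boxer actually having won:

```python
# Conditioning on the boxer winning (the "sun will rise" case) tilts the
# uniform prior to p(pi | Y=1) = 2*pi, whose mean is 2/3. Illustrative only.
import random

random.seed(2)
N = 200_000

kept = []
for _ in range(N):
    pi = random.random()                    # uniform prior on pi
    y = 1 if random.random() < pi else 0    # fight outcome
    if y == 1:                              # condition on the boxer winning
        kept.append(pi)

# Posterior mean of pi: shifts from 1/2 toward 2/3
print(sum(kept) / len(kept))
```

The check: under p(π|Y=1) = 2π the posterior mean is ∫ π·2π dπ = 2/3, so the boxer now looks more likely to have been the favorite, without being certain of it.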

My reply: I’m too tired to think about this, but I’ll post it and then maybe others will have some thoughts. The one thing I can tell you is that it’s an old example–I came up with it in discussions with Augustine Kong back around 1988.

Sounds like you want a probability distribution of probability distributions. If you take a coin and flip it a million times it'll probably end up 50/50 heads vs. tails. But it's possible there's some minor manufacturing defect, such that heads is very slightly preferred and you end up with 51/49. Thus rather than stating we expect a normal curve centered at 50%, we expect a normal curve where the mean itself is drawn from a tight normal distribution with a small s.d.

If we set a boxer and a wrestler fighting over and over again (a tricky experiment to design, since you phrased it as to the death), we may find that the boxer wins 10% of the time, or 50%, or 99%. Thus we expect a normal curve where the mean is drawn from a uniform distribution on [0,1]. In both cases the most likely result remains a 50/50 split, but in the boxer case we're now acknowledging that repeated experiments are unlikely to actually result in half wins for the boxer; rather, they'll collapse to some other win distribution.
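The two hierarchies described above could be sketched as follows (my own illustration; the 0.005 standard deviation for the coin's wobble is an assumed number, not from the comment):

```python
# "Probability of a probability": the coin's heads rate is drawn from a
# tight distribution around 0.5, the boxer's win rate from Uniform(0, 1).
import random

random.seed(3)

def coin_rate():
    # tiny manufacturing wobble around 0.5; the 0.005 s.d. is an assumption
    return min(1.0, max(0.0, random.gauss(0.5, 0.005)))

def boxer_rate():
    # total uncertainty about the boxer's chance
    return random.random()

coin_draws = [coin_rate() for _ in range(100_000)]
boxer_draws = [boxer_rate() for _ in range(100_000)]

# Both hierarchies are centered at 0.5 ...
print(sum(coin_draws) / len(coin_draws), sum(boxer_draws) / len(boxer_draws))

# ... but the spread of the underlying rate is what separates the problems
coin_sd = (sum((r - 0.5) ** 2 for r in coin_draws) / len(coin_draws)) ** 0.5
boxer_sd = (sum((r - 0.5) ** 2 for r in boxer_draws) / len(boxer_draws)) ** 0.5
print(coin_sd, boxer_sd)   # roughly 0.005 vs roughly 0.29
```

The uniform draw has standard deviation 1/√12 ≈ 0.289, so the boxer's rate is vastly more dispersed than the coin's even though both are centered at 1/2.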

I just read your boxer/wrestler example, and it is excellent.

I disagree with this from the robust example: "has led us to the claim that we can say nothing at all about the coin flip". I think of pi as a 3rd axis on your plots, and I want to marginalize along that direction if you want to calculate Posterior(heads).
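The marginalization suggested above can be done with a small numerical sketch (mine, under a uniform prior on pi): given X = Y and a fair coin, Pr(heads | X=Y, pi) = pi and p(pi | X=Y) stays uniform, so integrating pi out along the third axis gives Posterior(heads) = 1/2.

```python
# Marginalize over the pi axis to get Posterior(heads).
# Under a uniform prior, p(pi | X=Y) = 1 on [0, 1] and
# Pr(heads | X=Y, pi) = pi, so we integrate pi * 1 over [0, 1].
n = 10_000
grid = [(i + 0.5) / n for i in range(n)]           # midpoint grid on [0, 1]
posterior_heads = sum(pi * (1.0 / n) for pi in grid)
print(posterior_heads)   # 0.5
```

So marginalizing along the pi direction recovers the sharp statement about the coin flip rather than "nothing at all."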

I think this is also what Mike Grosskopf is trying to illustrate.

Thanks a bunch for posting this.

Aslak: That is pretty much what I was going for with that. The difference between the fight and the flip is that we have little uncertainty about the probability of heads, but total uncertainty about the probability of a boxer victory. I tried to derive some points about it in the first email I sent him (referenced in the last paragraph), but it turned out really illegible.

As Paul said, the distribution for pi is really the probability of a probability. The uncertainty in the problem comes from the spread in pi. Knowing the result of the boxer-wrestler fight gives some information that helps tighten that distribution for future fights, and conditioning the fight on the result of something that is not 50/50 (replacing flipping heads on the coin with rolling 4 or less on a 6-sided die in the B,W,C problem) would change the pi distribution for the fight and give less uncertainty. After writing the earlier emails and thinking about it, it dawned on me that I had basically regurgitated Jaynes' Ap distribution.
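The die-roll variant can be simulated directly (my illustration, again assuming a uniform prior on pi): with Pr(X=1) = 2/3 instead of 1/2, conditioning on X = Y does tilt the distribution of pi.

```python
# Replace the fair coin with X = 1 when a 6-sided die shows 4 or less
# (Pr(X=1) = 2/3), then condition on X = Y. Illustrative only.
import random

random.seed(4)
N = 300_000

kept = []
for _ in range(N):
    pi = random.random()                        # uniform prior on pi
    y = 1 if random.random() < pi else 0        # fight outcome
    x = 1 if random.randint(1, 6) <= 4 else 0   # die replaces the coin
    if x == y:                                  # condition on X = Y
        kept.append(pi)

# Now Pr(X=Y | pi) = (1 + pi)/3 depends on pi, so conditioning is
# informative: p(pi | X=Y) is proportional to 1 + pi, with mean 5/9.
print(sum(kept) / len(kept))
```

The posterior mean moves from 1/2 to 5/9 ≈ 0.556: unlike the fair coin, a non-50/50 event breaks the symmetry and the pi distribution genuinely updates.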