Worst-case inference vs Bayesian inference

We often like to say 50:50 when in doubt. Why this tendency? There are two different ways of arriving at such a conclusion: worst-case analysis and Bayesian inference.

1. A worst-case decision theorist (often also found in machine learning, statistical learning or computational learning circles) would assume some sort of loss function L(p_prediction, p_truth). Examples of loss functions would be the Brier score (a squared, L2-like loss) or the Kullback-Leibler divergence. We would then try to find the value of p_prediction that is least far from any possible p_truth, that is, the one whose largest loss over all values of p_truth is smallest. Essentially, we try to bound the loss.

2. A Bayesian would assume a uniform (or at least symmetric) prior on the truth, P(p_truth). With no data, the posterior is the same as the prior. We then try to summarize the prior with the expected value of p_truth: the expected value is essentially the ‘center’, a single value that replaces the variety of possibilities.
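
The two recipes above can be checked numerically for a single binary event. The following is only a rough sketch under stated assumptions: the function names (brier_loss, kl_loss, minimax_prediction, uniform_prior_mean) are made up for illustration, the Brier score is taken to be the squared difference between the predicted and the true probability, the Kullback-Leibler divergence is taken in the direction KL(truth || prediction), and the search runs over a coarse grid rather than being done analytically.

```python
import math


def brier_loss(q, p):
    """Assumed Brier-style loss: squared gap between prediction q and truth p."""
    return (q - p) ** 2


def kl_loss(q, p):
    """Assumed KL(p || q) for a binary event, taking 0 * log(0) as 0."""
    total = 0.0
    for p_i, q_i in ((p, q), (1.0 - p, 1.0 - q)):
        if p_i > 0.0:
            total += p_i * math.log(p_i / q_i)
    return total


def minimax_prediction(loss, n=101):
    """Worst-case recipe: pick the q whose largest loss over all truths is smallest."""
    truths = [i / (n - 1) for i in range(n)]
    # Predictions of exactly 0 or 1 are left out so that the KL loss stays finite.
    candidates = truths[1:-1]
    return min(candidates, key=lambda q: max(loss(q, p) for p in truths))


def uniform_prior_mean(n=101):
    """Bayesian recipe: expected value of the truth under a (discretized) uniform prior."""
    truths = [i / (n - 1) for i in range(n)]
    return sum(truths) / len(truths)


if __name__ == "__main__":
    print("minimax, Brier:", minimax_prediction(brier_loss))
    print("minimax, KL:   ", minimax_prediction(kl_loss))
    print("prior mean:    ", uniform_prior_mean())
```

Under these assumptions, both minimax searches and the prior mean land on 0.5 (the mean up to floating-point rounding). Note that the worst-case search never uses a prior, and the prior mean never uses a loss.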

It happens that both of these approaches result in the same 50:50. But the derivations are very different. As Bayesians we summarize, as decision theorists we optimize. As Bayesians we assume the probabilities for various events, but remain agnostic about the loss: decision theory is easy once you have true probabilities. As decision theorists we assume the costs, but remain agnostic about the truth: we start as a blank slate.

One Comment

  1. Andrew says:

    Aleks,

    L. J. Savage's famous book develops a minimax theory of Bayes. I don't think it makes much sense, though (for the same reason that minimax never makes sense in decision theory: you wouldn't want your decision determined by an event of arbitrarily low probability).