Example of a conditional probability calculation

Bill Harris has a fun little calculation of a conditional probability using three different data sources. Could be a good example for teaching intro probability or basic Bayesian inference.


  1. Aleks says:

    I was quite impressed with Eliezer Yudkowsky's mini-course at

  2. Andrew says:

    I'd prefer if Yudkowsky's page were called "Bayesian inference for binary inference." It's fine, but it doesn't capture most of what I see as applied Bayesian inference; see here.

  3. Keith O'Rourke says:

    Posted something about this earlier re: Validation of Software by Cook et al.

    But if you “encode” the joint distribution from Yudkowsky's first example as a “data set”

    “R” code

    > datajoint

  4. Bill says:

    Could you expand on that "Bayes theorem (a.k.a. Nearest Neighbors)" comment? That's a spin on Bayes' theorem I've not seen before.


  5. Keith O'Rourke says:

    It’s just a “rose by another name does not smell as sweet” comment.

    Some would argue that Bayesian analysis simply involves a joint distribution of parameters θ (the unknowns) and observations x (random variables drawn from a probability distribution given the unknown value of θ). With the joint distribution of (x, θ), a basic “axiom of inference” then says that probability statements about θ should be based on the conditional distribution of θ given the observed data x_O, otherwise known as the posterior distribution of θ and denoted here by π(θ | x_O).

    This is a particular application of conditional probability in a two-stage system, where we observe the outcome of the second stage (the data x_O) and want to make statements about the concealed outcome of the first stage (the unknown parameters θ). This application is commonly referred to as Bayes theorem, and it could be viewed as the key step that distinguishes a Bayesian analysis. For instance, see M. Evans, I. Guttman and T. Swartz, “Optimality and computations for relative surprise inferences,” Canadian Journal of Statistics, Vol. 34, No. 1, 2006, pp. 113-129.

    Now, if the joint probability is specified exactly or approximately as a data set, this conditioning step is the same as doing Nearest Neighbors. Will this renaming make Bayes Theorem less mysterious for some? Would it be worthwhile to introduce students to Bayesian statistics with some minimalistic examples where joint distributions could be coded as data sets? (As Andrew pointed out, this is only one step of an applied Bayesian analysis.)

    Keith
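
The idea of coding a joint distribution as a data set and then conditioning on it can be sketched in a few lines. This is only a minimal illustration in Python, not the R code referred to earlier in the thread. The counts are made-up numbers for a diagnostic-test setting, and the names `JOINT_COUNTS`, `encode_as_dataset` and `posterior` are hypothetical. Conditioning is done by keeping the rows whose x exactly matches the observed value (the zero-distance "nearest neighbors") and tabulating θ among them.

```python
from collections import Counter

# Hypothetical joint distribution of (theta, x), given as counts per 100,000.
# These numbers are illustrative assumptions, not taken from the thread.
JOINT_COUNTS = {
    ("cancer", "positive"): 800,
    ("cancer", "negative"): 200,
    ("no cancer", "positive"): 9504,
    ("no cancer", "negative"): 89496,
}


def encode_as_dataset(joint_counts):
    """Expand the counts into a data set with one (theta, x) row per count."""
    rows = []
    for (theta, x), n in joint_counts.items():
        rows.extend([(theta, x)] * n)
    return rows


def posterior(rows, x_observed):
    """Condition on the data: keep rows whose x equals x_observed,
    then return the relative frequency of each theta among them."""
    matches = [theta for theta, x in rows if x == x_observed]
    if not matches:
        raise ValueError("no rows match the observed value")
    counts = Counter(matches)
    return {theta: n / len(matches) for theta, n in counts.items()}


rows = encode_as_dataset(JOINT_COUNTS)
post = posterior(rows, "positive")
for theta, p in post.items():
    print(f"P(theta = {theta} | x = positive) = {p:.4f}")
```

With these assumed counts, the posterior probability of "cancer" given a positive test is 800 / (800 + 9504), the same number Bayes theorem gives, obtained here purely by filtering and counting rows. When x is continuous or the data set only approximates the joint distribution, exact matching would presumably be replaced by taking rows close to the observed value.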