Everything begins with “p”

John Cook puts it well:

There’s only one symbol in statistics, “p”. The same variable represents everything. You just get used to it and figure out which p is which from context. It reminds me of George Foreman naming all five of his sons George. Here’s an example I [Cook] ran across recently where p represents four different functions in one equation:

p(θ | x) = p(x | θ) p(θ) / p(x)

Usually this is done with no explanation, but in the example above the author explains that he’s denoting entirely different functions with the same symbol in order to avoid the “clumsy notation” that being explicit would require.

Sometimes the overloading of the 16th letter of the English alphabet becomes just too much and statisticians break down and use the Greek counterpart, π (pi). So then to make matters even more confusing to the uninitiated, π can be a variable or a function.

He’s right, and I say this as someone who’s done my part to spread this notation. We talk in Bayesian Data Analysis about how to use this notation and why it’s more rigorous than it might seem at first. I really, really don’t like the notation where people use f for sampling distributions, pi for priors, and L for likelihoods. To me, that really misses the point. The notation shouldn’t depend on the order in which the distributions are specified. They’re all probability distributions; that’s the point.
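To make the overloading concrete, here is a minimal sketch (mine, not Cook’s and not from BDA) of a beta-binomial model in which each of the four functions written as p in the equation above gets its own name; the prior parameters and data below are just placeholders:

    from scipy import stats, integrate

    a, b = 2.0, 2.0   # hypothetical Beta prior parameters
    n, y = 10, 7      # hypothetical data: 7 successes in 10 trials

    def prior(theta):
        # p(theta): the prior density
        return stats.beta.pdf(theta, a, b)

    def likelihood(y, theta):
        # p(y | theta): the sampling distribution, viewed as a function of theta
        return stats.binom.pmf(y, n, theta)

    def evidence(y):
        # p(y): the marginal likelihood, with theta integrated out
        value, _ = integrate.quad(lambda t: likelihood(y, t) * prior(t), 0.0, 1.0)
        return value

    def posterior(theta, y):
        # p(theta | y): Bayes' rule, with all four "p"s spelled out
        return likelihood(y, theta) * prior(theta) / evidence(y)

    # Sanity check: the conjugate answer is Beta(a + y, b + n - y),
    # so these two lines should print the same number.
    print(posterior(0.6, y))
    print(stats.beta.pdf(0.6, a + y, b + n - y))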

4 thoughts on “Everything begins with ‘p’”

  1. But we have different *words* for priors, likelihoods, and sampling distributions, and we frequently find it convenient to differentiate them (otherwise, different words would be unnecessary; they'd all be "distributions", as you say). Since notation is simply formalized language, what is the harm in differentiating them in notation too? As someone who has gotten used to the notation, it doesn't matter much, but it seems too strong to say that notation shouldn't depend on the role of the distribution, when the language you used clearly does.

  2. Stats notation, like most language, is highly context-dependent. You typically need to know the context to interpret the four different functions called p in p(a|b)p(b)=p(b|a)p(a).

    As a consequence, formulas can't stand on their own. Typically some parameters are simply elided; that is, p(b) is really shorthand for p(b|theta), where theta is some implicit model parameter.

    Vector notation is typically underspecified, with the same symbol (x or theta) being used for scalars and vectors (and sometimes whole matrices).

    Expectation notation gets confusing because it almost never indicates which variables are being integrated over which distributions.

    The problem is much more acute for novices. Once you know what you're doing, it's pretty obvious which variables are of which type and which distribution p refers to. They're the ones that make the formulas work out to be true.

    I find it much easier to understand models presented in something like BUGS format, because it makes all the notation really explicit (see the sketch below the comments).

  3. I agree with you that all of these quantities are probability distributions, and find this unifying way of thinking of them helpful. But not all seem to agree. I've recently run across a new book in Bayesian reliability where the author insists on referring to P(T>t|lambda) as a chance instead of a probability, reserving probability for the marginal probability, P(T>t). I've also recently read an essay by a contemporary philosopher who insists that the likelihood is somehow different from a probability, yet uses Pr(O|H) for the likelihood of observation, O, given hypothesis, H. This is part of the reason I can't seem to extract much of value from modern philosophical writing.
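Along the lines of comment 2, here is a minimal sketch of the kind of fully explicit model specification the commenter means. PyMC (v4+) is used here as a stand-in for BUGS, and the model and data are hypothetical, but the point is the same: every distribution statement is written out by name, so there is no ambiguity about which p is which.

    import numpy as np
    import pymc as pm  # assumption: PyMC v4 or later

    y_obs = np.array([1.2, 0.7, 1.9, 1.4])   # hypothetical data

    with pm.Model():
        # Each stochastic statement names its distribution explicitly:
        mu = pm.Normal("mu", mu=0.0, sigma=10.0)             # prior on the mean
        sigma = pm.HalfNormal("sigma", sigma=5.0)            # prior on the scale
        pm.Normal("y", mu=mu, sigma=sigma, observed=y_obs)   # sampling distribution
        idata = pm.sample(1000, tune=1000, chains=2)         # draws from the posterior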
