What is probability?

This came up in a discussion a few years ago, where people were arguing about the meaning of probability: is it long-run frequency, is it subjective belief, is it betting odds, etc? I wrote:

Probability is a mathematical concept. I think Martha Smith’s analogy to points, lines, and arithmetic is a good one. Probabilities are probabilities to the extent that they follow the Kolmogorov axioms. (Let me set aside quantum probability for the moment.) The different definitions of probabilities (betting, long-run frequency, etc.) can be usefully thought of as models rather than definitions. They are different examples of paradigmatic real-world scenarios in which the Kolmogorov axioms (thus, probability).

Probability is a mathematical concept. To define it based on any imperfect real-world counterpart (such as betting or long-run frequency) makes about as much sense as defining a line in Euclidean space as the edge of a perfectly straight piece of metal, or as the space occupied by a very thin thread that is pulled taut. Ultimately, a line is a line, and probabilities are mathematical objects that follow Kolmogorov’s laws. Real-world models are important for the application of probability, and it makes a lot of sense to me that such an important concept has many different real-world analogies, none of which are perfect.

We discuss some of these different models in chapter 1 of BDA.

P.S. There’s been some discussion and I’d like to clarify my key point, why I wrote this post. My concern is that I’ve read lots of articles and books that claim to give the single correct foundation of probability, which might be uncertainty, betting, or relative frequency, or coherent decision making, or whatever. My point is that none of these frameworks is the foundation of probability; rather, probability is a mathematical concept which applies to various problems, including long-run frequencies, betting, uncertainty, decision making, statistical inference, etc. In practice, probability is not a perfect model for any of these scenarios: long-run frequencies are in practice not stationary, betting depends on your knowledge of the counterparty, uncertainty includes both known and unknown unknowns, decision making is open-ended, and statistical inference is conditional on assumptions that in practice will be false. That said, probability can be a useful tool for all these problems.

118 thoughts on “What is probability?”

  1. Probability might be defined as the mathematical/linguistic dimension, while empiricism is defined as observation-inference and/or prediction-inference. Probability for me is the equation part but *not* [necessarily] the explicit observed factors we are speaking of.

  2. Of course. Do people disagree with you?

    When you fire a gun, the bullet leaves the barrel at a given velocity and angle relative to the ground. You can plot where it lands. Except the bullet traverses air, unless you’re firing in a vacuum and can isolate to gravitational effects. You fire based on what you think will happen, except of course there may be wind along the bullet’s path or the barrel may be too hot or too cold – which really matters when you’re firing automatic weapons or artillery – and so on. You have beliefs about what will happen or you wouldn’t fire: even if you’re spraying rounds into the darkness, you’re doing that for a reason, though that may be you thought you heard something. You measure what happens and adjust your weapon, the amount of powder used, etc. A GPS-guided mortar shell uses fins controlled by a ‘computer’ to keep it aimed at where it was aimed (though people seem to believe that means you can fire it in any direction and it will adjust). You can see frequentist and Bayesian ideas and it’s all math and it’s all drawn out over geometries. As these project to planes that …

  3. Speaking as a frequentist, I fully agree that anything fulfilling the Kolmogorov axioms is a probability, and that both Bayesian and frequentist probabilities (typically) meet this standard. We may disagree about which interpretation of probability is relevant to particular probability statements that we make in the course of statistical inference, but that doesn’t make either interpretation invalid in some general sense.

    That said, I do think it’s useful to be clear which interpretation we intend by our probability statements. For example, Bayesians will sometimes describe p-values as probabilities conditional on the null parameter value. Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter, and p-values are probabilities computed using the null distribution. Conditioning on parameter values makes no sense when parameters are not random variables.

    This type of distinction may or may not lead to much confusion in practice, but a statement about p-values construed as optimal betting terms after learning that the null parameter obtains is very different from a statement about p-values construed as limiting relative frequencies in a hypothetical world in which the null parameter value obtains. The purist in me generally wants to be clear which of these statements (or which other statement, or both statements) is being made, and such loose talk tends to muddy the waters.

    • +1

      There was another post re p-values and conditioning where Andrew seemed to take almost the opposite view, arguing a p-value is a conditional probability on informal grounds. That doesn’t seem consistent with the view in this post.

      Tho it is perhaps consistent with a desire for alternative axiomatic treatments of probability eg those taking conditional probability as basic…

    • I think you make -en passant- an important distinction here: Kolmogorov’s axioms define probability mathematically, but the frequentist or Bayesian _interpretations_ try to connect them to real world observations. Gillies in his 2000 book summarises this neatly: “The theory of probability has a mathematical aspect and a foundational or philosophical aspect. There is remarkable contrast between the two. While an almost complete consensus and agreement exists about the mathematics, there is a wide divergence of opinions about the philosophy.”

    • Ran said, “Bayesians will sometimes describe p-values as probabilities conditional on the null parameter value. Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter, and p-values are probabilities computed using the null distribution. Conditioning on parameter values makes no sense when parameters are not random variables.”

      The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).

      • Martha:

        I agree, but there has been some dispute about this point. There are people who define p(a|b) = p(a,b)/p(b) so that this conditional probability is only defined if b is a random variable. They would say that a p-value is _not_ a conditional probability because you can’t say p(y|H) unless you’re willing to put a probability on H, which they don’t want to do. So they write p(y;H). I, however, am happy to write p(y|H) even if I don’t give a probability on H, because, to me, p(y|H) is defined as the probability of y given H, and then I define p(y,H) = p(y|H)*p(H). That is, to me, the conditional probability is the fundamental or atomic concept, and the joint distribution is the derived quantity.
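
        To make the two directions concrete, here is a minimal sketch in Python (my own toy numbers, nothing from the thread): starting from p(y|H) and p(H) you can always build the joint, while the ratio definition breaks down wherever p(H) = 0.

          p_H = {"H1": 0.8, "H2": 0.2, "H3": 0.0}          # prior over hypotheses; H3 gets zero mass
          p_y_given_H = {"H1": 0.5, "H2": 0.9, "H3": 0.1}  # conditionals taken as the primitive objects

          # derived quantity: joint p(y, H) = p(y|H) * p(H)
          p_joint = {H: p_y_given_H[H] * p_H[H] for H in p_H}

          # the ratio form recovers the conditional only where p(H) > 0
          for H in p_H:
              if p_H[H] > 0:
                  print(H, p_joint[H] / p_H[H], "vs p(y|H) =", p_y_given_H[H])
              else:
                  print(H, "p(y,H)/p(H) undefined; p(y|H) still specified:", p_y_given_H[H])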

        • Where does that leave

          > Probabilities are probabilities to the extent that they follow the Kolmogorov axioms

          then? You seem to be choosing an alternative axiomatic treatment (which exist!) on the basis of an intuitive notion, rather than sticking to the Kolmogorov axioms. Or not?

        • The problem then is that conditional probability is undefined based purely on those axioms. Within the Kolmogorov approach it then needs to be defined in terms of those axioms and primitives, giving the ratio form.

          Other axiomatic treatments can derive the ratio form *by including conditional probability in the axioms and primitives*. I think de Finetti and Popper etc do this.

          But in order to reject the idea that (elementary) conditional probability is *defined* by the ratio form I think you need to work from an alternative axiomatic system where this concept is defined.

          So I do think people are prioritising intuitions about eg conditional probability over axiomatics, somewhat contrary to the theme of the post.

          Or not? See also below – how do you define conditional probability within the Kolmogorov system?

        • I think “Kolmogorov axioms” is really a stand-in in this discussion for “some acceptable axiomatic treatment”. Everyone knows Kolmogorov axiomatized probability, but few people return to the axioms on a regular basis; it’s more like they work with the calculations they’re used to, satisfied that it has a formal structure they don’t quite exactly remember. This is like set theory. I have a Bachelor’s degree in Mathematics, and just about every undergrad math text has some stupid unsatisfying chapter on set theory and notation in the beginning, but I never actually read a book on formal set theory until last year (thanks to a certain trouble maker ;-))

          As Martha said, cringing at defining p(a|b) = p(a,b)/p(b) is something some professional mathematicians do. It’s possible to axiomatize the system where p(a|b) is primary, and then define p(a,b) = p(a|b)p(b) = p(b|a)p(a), and this has good properties; for example, there’s no division by zero when p(b) or p(a) = 0.

        • Do you mean that you prefer to derive this formula as a result, that you prefer a different definition, or that you accept this definition but you dislike it for some reason?

        • I prefer the definition “The probability of A conditional on B is the probability of event A given that event B occurs”, because this captures the idea that I think is intended by “conditional probability”. Then, if P(B) isn’t 0, one can derive the formula p(A|B) = p(A,B)/p(B).

        • This seems to require a formal definition of ‘given’…can you elaborate on your preference in terms of a mathematical definition?

        • “The probability of event A conditional on event B is the probability of event A given that event B occurs”

          I agree with ojm, what is the definition of “the probability of event A given that event B occurs” then?

          At some point a more substantial definition is needed which is not just swapping synonyms like “conditional on”, “given that”, “contingent upon”, “subject to”, etc.

          I wouldn’t say that the formula is “derived” from that “definition”, I would say that the formula is the definition.

          In the continuous case there is a problem conditioning on zero measure events, but it can be handled by taking limits adequately.
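
          To see the limiting construction in action, here is a minimal sketch (my own, with made-up numbers): for a correlated bivariate normal, P(Y = 1) is zero, but conditioning on shrinking windows |Y − 1| < eps gives window averages that settle down to the exact conditional mean E[X | Y = 1] = rho.

            import numpy as np

            rng = np.random.default_rng(0)
            rho = 0.7
            y = rng.standard_normal(2_000_000)
            x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(2_000_000)  # E[X | Y = y] = rho * y

            # P(Y = 1) = 0, so condition on shrinking windows around y = 1 instead
            for eps in [0.5, 0.1, 0.02]:
                sel = np.abs(y - 1.0) < eps
                print(eps, x[sel].mean())   # approaches rho * 1 = 0.7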

        • I think this is one situation where the Bayesian notion of conditioning on a “state of information” makes good sense. Then all probabilities are conditioned on something, and conditioning is primitive/axiomatic, p(A,B|K) = p(A|B,K)p(B|K)

          and we can say p(A|B,K) is the probability when the state of knowledge is {K} union {“B is True”}, since the probabilities are describing plausibility of boolean propositions in Cox’s view, and union of sets of propositions is a well defined thing.

        • It’s not satisfactory though when dealing with long-run-frequency notions and abstract sequences of numerical outcomes (RNGs) because we can’t lean on the well defined notion of union of propositions.

        • Carlos and ojm,

          I tried twice last night to respond to ojm’s comment, “This seems to require a formal definition of ‘given’…can you elaborate on your preference in terms of a mathematical definition?”, but neither attempt got posted. So here’s another try:

          1) I am using “given” in the sense of Definition 2 of The American Heritage Dictionary, 1985, which reads, “Granted as a supposition.” (This is also the usage that is often used in mathematics to describe the hypothesis of a theorem or conjecture.)

          2) I’m not sure I can elaborate on my preference in terms of a mathematical definition – because I’m not sure what you are asking for. Perhaps if you rephrase your question, I can respond to it.

          3) I think my response (1) at least in part replies to Carlos’ question, “what is the definition of “the probability of event A given that event B occurs” then?” To elaborate: I don’t claim that there is any isolated definition of “the probability of event A given B”, just as there is no single definition of “the probability of event A”. However, I can elaborate by saying that I would not talk about “the probability of event A given B” unless the system of “probabilities given B” satisfies the first three of Kolmogorov’s axioms. From these, one can derive the formula for conditional probability, provided that (or, as I might say, given that) the system of probabilities given B does satisfy the three axioms.

        • So you mean you take P(A|B) to be short hand for something like:

          If B then P(A),

          where P(A) satisfies the Kolmogorov axioms and the above is a logical statement?

          Can I multiply the logical statement ‘if B then P(A)’ by a number?

          Can you then *always* ‘invert’ the conditioning to give P(B|A)? Is it possible for P(A|B) to make sense while P(B|A) doesn’t?

          Maybe it would be helpful to explicitly give the derivation you mention?

        • I don’t think you can derive the formula for conditional probability from Kolmogorov’s axioms alone. You need to give some additional meaning to the word “probability”, then you can derive the meaning of “conditional probability”.

          Given a pre-existent definition of “probability” you can “derive” the axioms as well. Kolmogorov’s book has a section titled “The Empirical Deduction of the Axioms.”

        • Fortunately, I think Cox-Bayes gives a solution to all of this at least for the case where we’re discussing boolean statements. Accept that p(A|K) is an assignment of a probability based on a state of information which is a set of true propositions, and that p(A) is a shorthand notation for this where K is implicit.

          Accept that p(A|B) is shorthand for p(A|Union(B,K))

          From Cox’s axiom about conjunction we have p(A,B) = p(A|B)p(B) essentially axiomatically.

          All the stuff about adding up to 1 and negation and so forth comes out of the other Cox axioms…

          Now to handle frequentism we restrict the whole thing to sequences of numbers, and K to knowledge of the properties of the sequences.

          p(SomeEvent | SomeEvent is a logical statement about certain numbers subsetted in a known way from an infinite sequence of real numbers that passes Per Martin-Lof’s most powerful test for randomness)

          and you have Kolmogorov’s probability theory as a special case of Cox’s theorems *waves hands wildly* ;-)

        • There are certainly axiomatisations of probability where conditional probability is taken as basic and the Kolmogorov definition is a theorem – see also de Finetti and Popper, I think.

          But like Carlos, I don’t see how you can take the Kolmogorov axioms and get conditional probability without *defining* it eg via the ratio form.

          Which I think comes back to the point about axiomatics – if people have different intuitions about which are the basic concepts then they may prefer different axiom systems.

        • Daniel, the discussion here is (I think) about how conditional probabilities need to be defined in Kolmogorov’s theory of probability. Kolmogorov’s axioms really means Kolmogorov’s axioms here, not just any axiomatic definition of probability. Martha even sent a link to remove any ambiguity.

        • Carlos, OJM, yes I agree this particular subthread (as opposed to say the broader discussion in this post, like Andrew’s original statement about axioms) really is about Kolmogorov rather than axiom systems more generally, so my most recent post was intended to, as OJM mentioned, show a preference for an alternative axiom system that I think can be considered more general than Kolmogorov in the sense that you can get Kolmogorov’s system when you limit it to certain kinds of propositions. But, as with so much on blogs, I don’t deny that my treatment is handwavy and could use some careful thought. Perhaps I’ll write my own blog post about it.

          I agree that conditioning seems to need more axioms than the three Kolmogorov ones Martha linked to at Wiki: https://en.wikipedia.org/wiki/Probability_axioms

          In a system where we take conditioning as axiomatic and based on a set of propositions, then we can use axioms of set theory to handle unions of additional propositions that we “condition on”. I think this could put Martha’s statements which OJM interpreted as “if B then p(A)” onto a strong footing.

          All that is to say that perhaps to Andrew and Martha and others (myself perhaps) the Kolmogorov axioms aren’t the most intuitive and shouldn’t be taken as primary. Both Martha and Andrew have expressed a preference for conditioning to have a primary role. I don’t think either of them is a logician / set theorist / etc (particularly Andrew), so I think their informal intuitions are at least an interesting piece of information to inform us as to how successful axiom systems are for axiomatizing intuitive ideas.

        • In particular, Andrew’s comment from http://statmodeling.stat.columbia.edu/2018/12/26/what-is-probability/#comment-936894

          seems to point to Andrew actually preferring a different axiom system than Kolmogorov (though he may not have enough formal logic background to be able to choose a particular one for example), which melds with my theory that in fact Kolmogorov’s axioms aren’t well suited to the use that Bayesians put probability to.

          I should probably go get de Finetti’s book, but I balk a bit at spending $75 for the Kindle edition and I don’t have space to put the hardcover :-(

        • OK, I’ve had a chance to look at this more carefully. On January 1 at 6:52 pm, I wrote:

          “However, I can elaborate by saying that I would not talk about “the probability of event A given B” unless the system of “probabilities given B” satisfies the first three of Kolmogorov’s axioms. From these, one can derive the formula for conditional probability, provided that (or, as I might say, given that) the system of probabilities given B does satisfy the three axioms.”

          I realize now that my last sentence in the quote is wrong. I think I was remembering something else related that I had worked through carefully a few years ago. So I’m going back to what started this discussion:

          At December 26, 2018 at 12:30 pm, Ran said

          “… Bayesians will sometimes describe p-values as probabilities conditional on the null parameter value. Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter, and p-values are probabilities computed using the null distribution. Conditioning on parameter values makes no sense when parameters are not random variables.”

          At December 30, 2018 at 10:50 pm, I replied: “The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

          In this statement, I was not using the phrase “conditional probability” in the sense that we have been discussing – but in a broader meaning than that: namely, a probability defined with some restriction (assumption) involved. To (attempt to) describe the situation a little more clearly: In calculating a p-value, one assumes a particular type of distribution and particular values of parameters for that type of distribution, and uses these in calculating the particular probability that is called the p-value. Thus the calculation depends on the type of distribution and the particular values of the parameters of that type of distribution. This is what I meant by a “conditional probability” here: a probability calculated under certain specified conditions.

        • For the record, in his axiomatic formulation de Finetti defines conditional probability in the same way as Kolmogorov:

          “Conditional probabilities P(E|H), or conditional previsions, P(X|H), are expressible, in cases where H has nonzero probability, in terms of the unconditional probabilities by means of a formula which, in an abstract, axiomatic treatment, can be taken as a definition: P(E|H)=P(EH)/P(H), P(X|H)=P(XH)/P(H).”

          To ensure that when P(H)=0 the conditional probability (undefined) is coherent he adds a third axiom:

          “Axiom 3 The conditions of coherence (Axioms 1 and 2) must be satisfied, also, by the P_H conditional on a possible H, where P_H(E)=P(E|H), P_H(E|A)=P(E|AH) is to be understood.”

          to the first two axioms:

          “Axiom 1 Non‐negativity: if we certainly have X ⩾ 0, we must have P(X) ⩾ 0.

          Axiom 2 Additivity (finite): P(X+Y)=P(X)+P(Y).”

          In de Finetti’s (non-axiomatic) theory, there is a theorem stating that

          “A necessary and sufficient condition for coherence in the evaluation of P(X|H), P(H) and P(HX), is compliance with the relation P(HX)=P(H)*P(X|H), in addition to the inequalities inf(X|H) ≤ P(X|H) ≤ sup(X|H), and 0 ≤ P(H) ≤ 1; in the case of an event, X = E, P(HE)= P(H)*P(E|H), is called the theorem of compound probabilities, and the inequality for P(X|H) reduces to 0 ≤ P(E|H) ≤ 1 (being = 0, or = 1, in the case where EH, or E͂H, respectively, is impossible).”

        • > Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter

          Just means there is a *function* theta -> P(A). The parameter space need not have any of the additional structure required to make it a probability space, ie subsets etc of the parameter space need not satisfy the Kolmogorov axioms.

          For example, in an ‘all models are wrong’ context there is no need for theta (models) to come with mutually exclusive, sum to one structure. Bayesians seem to like to call this sort of case ‘M-open’.

          To a frequentist, (or likelihoodist or other) a pvalue and related quantities are a *function* of the ‘hypothesis’, yes, evaluated at one or more values, but they reserve the word ‘conditioning’ for formal statistical conditioning.

          One reason to be pedantic about this is that the formal structure of an additive, normalised measure implies, in all interpretations as far as I can see, that only one model can be ‘true’.

          To me this structure makes reasonable sense over observables, but not over theoretical constructs. But sure a Bayesian can call it a ‘conditional’ probability, it just won’t necessarily make sense to anyone outside of the Bayesian realm.
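
          To make the ‘function of the hypothesis’ reading concrete, a minimal sketch (my own, with made-up data): a two-sided z-test p-value is literally a curve theta0 -> p(theta0) over the parameter space, evaluated pointwise, with no measure ever placed on theta.

            from math import sqrt, erf

            def normal_cdf(z):
                return 0.5 * (1 + erf(z / sqrt(2)))

            # two-sided z-test p-value as a *function* of the hypothesized theta0
            def p_value(theta0, ybar=1.3, sigma=2.0, n=25):
                z = (ybar - theta0) / (sigma / sqrt(n))
                return 2 * (1 - normal_cdf(abs(z)))

            for theta0 in [0.0, 0.5, 1.0, 1.3, 2.0]:
                print(theta0, round(p_value(theta0), 4))   # no probabilities attached to theta0 itself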

        • ojm said:

          > One reason to be pedantic about this is that the formal structure of an additive, normalised measure implies, in all interpretations as far as I can see, that only one model can be ‘true’.

          I don’t think it’s the additive normalized measure here but rather the model of probability as a measure over plausibility of the truth values of propositions. It’s not possible for the propositions X=2 and X=3 to both be true.

          This is why I started working on a view of probability as a measure over “accordance” where I mean by accordance something like “compatibility with what’s assumed for the model”

          It’s possible for “the length of my pencil is 14.0cm” and “the length of my pencil is 14.02cm” and “the length of my pencil is 13.98cm” to all have equally good accordance with the particular theory of measurement that I’m using (for example if I know that there is a near uniform spread of errors in manufacturing the ruler I’m using over this range of errors and I know how good my optical discernment of the marks on the ruler are and I have a certain view on what it means for a pencil to “have a length” such that it isn’t relevant to my measurement whether or not a speck of dust lands on the tip and sticks out a bit).

          Did you ever skim that document I sent you? It needs a lot of work but at least starts to develop this notion.

        • At January 3, 2019 at 4:36 am, OJM said:

          “> Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter
          Just means there is a *function* theta -> P(A). The parameter space need not have any of the additional structure required to make it a probability space, ie subsets etc of the parameter space need not satisfy the Kolmogorov axioms.”

          This is more or less what I was trying to say by
          “The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

          In this statement, I was not using the phrase “conditional probability” in the “traditional” sense – but in a broader meaning than that: namely, a probability defined with some restriction (assumption) involved.

          To (attempt to) describe the situation a little more clearly: In calculating a p-value, one assumes a particular type of distribution and particular values of parameters for that type of distribution, and uses these in calculating the particular probability that is called the p-value. Thus the calculation depends on the type of distribution and the particular values of the parameters of that type of distribution. This is what I meant by a “conditional probability” here: a probability calculated under certain specified conditions.”

          Having thought about it more, I am inclined to suggest terminology that is not quite what OJM said, but more like this: I use the word “conditional” in two senses.

          One is the traditional probability usage of “conditional on an event A”, where the event is in the sigma-algebra on which the probability function P is defined.

          The other is to express that the probability function P itself is defined in terms of (i.e., a function of) other types of “objects”: In the case of calculating the p-value, one uses a posited probability model, so the p-value calculation depends on the type of model and the parameters of the model.
          The discussions we’ve had suggest that it’s a good idea to use a different notation for this – perhaps P(A||model, parameters) would help avoid confusion.

          (In calculating p-values, there is an added complication: the p-value would be defined in the above terminology as something like P(A|| normal model with mean mu-naught and standard deviation sigma) where sigma is unknown, so the p-value is estimated using an estimate of sigma from the data.)
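
          In that notation, the usual one-sample t calculation reads as something like P(|T| ≥ |t_obs| || normal model, mean mu-naught), with sigma replaced by its estimate. A minimal sketch (toy data of my own, using scipy):

            import numpy as np
            from scipy import stats

            y = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4])   # made-up data
            mu0 = 5.0                                      # hypothesized mean

            n = len(y)
            s = y.std(ddof=1)                              # sigma unknown: plug in its estimate
            t_obs = (y.mean() - mu0) / (s / np.sqrt(n))

            # probability computed under the specified conditions: normal model, mean mu0
            p = 2 * stats.t.sf(abs(t_obs), df=n - 1)
            print(t_obs, p)
            print(stats.ttest_1samp(y, mu0))               # cross-check against the built-in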

  4. Hard to argue with this.

    On a related note, suppose you carry out a Bayesian data analysis (that is, building a generative probability model, fitting it, checking it, continuously expanding it, and so on until you converge on a single fitted model), and obtain a posterior distribution (or a sample from it) for a parameter of interest. My question is what is *your* interpretation of this object?

    Presumably there is a sense in which two people could carry out this exercise, with only one of them “getting it right”, and the other “getting it wrong”. My question is what does “getting it right” mean to you? I’m not asking about particular inferential outputs, like point/interval estimates, but about the entire posterior. If two people come up with conflicting posteriors, how would you decide which (or both) haven’t got it right? Or *how* wrong they’ve each got it? Is it simply a matter of how well they’ve followed the steps correctly? How would you measure that?

    Hopefully it’s clear how this connects with the question in your title.

    • > If two people come up with conflicting posteriors, how would you decide which (or both) haven’t got it right?

      If they are both aware that they have the same priors and the same information, then we know that at least one of them isn’t Bayesian, no? ;-).

      If it were me–i.e. someone who doesn’t do this–I’d go meta.

      I’d be interested in Andrew’s response.

  5. I disagree that probability is a mathematical concept. The use of mathematical concepts to represent phenomena does not make such phenomena mathematical, as happens in Physics, for example. I think probability is a measurement of the degree of uncertainty associated with phenomena, following Kelvin: it is only by associating numbers with any scientific concept that the concept can be properly understood.

    • “The use of mathematical concepts to represent phenomena does not make such phenomena mathematical”

      I would agree with this — but it is consistent with saying probability is a mathematical concept, and with using probability to represent the degree of uncertainty associated with phenomena.

      Analogy: A mathematical equation can be used to model/describe/measure/represent the trajectory of a falling object. That does not make the falling object a mathematical phenomenon.

  6. > They are different examples of paradigmatic real-world scenarios in which the Kolmogorov axioms (thus, probability).

    I think you accidentally a verb.

    Probability is not just a mathematical concept. Fields of probabilities on a field of sets (subsets of the set E of elementary events) are mathematical concepts. But the interesting part of the theory of probability is how “events” are defined empirically, and which one of all the possible probability fields is the correct one.

    As far as Kolmogorov’s axioms are concerned the probability of any event can be any number between zero and one. That’s not a very useful definition.

  7. “Probabilities are probabilities to the extent that they follow the Kolmogorov axioms.” This seems wrong. Why are Kolmogorov’s axioms correct? If your answer is that they rigorously summarize our prior concept of probability, then you don’t really believe the statement in quotes. Someone could find another set of axioms that would lead to all of the same theorems that Kolmogorov’s axioms lead to as far as we know. Some of the theorems in Kolmogorov’s system would now become axioms, and some of Kolmogorov’s axioms would be theorems, but there would be no differences between the two systems in terms of what valid inferences they permit as far as we can tell. The alternative set of axioms would be less intuitive, but all of the truths about probability that we derive from Kolmogorov would be derivable from the alternative. You might respond that that is fine because the two systems are equivalent. But, I did not say the two systems were equivalent. I said that they were equivalent as far as we know. What we would need is a completeness proof for Kolmogorov’s axioms. Maybe someone will tell me that there is one (but I doubt it). If truth outstrips provability for Kolmogorov probability as it does for arithmetic (see Godel), then probability is not reducible to the axioms. Since Godel, we cannot assume that our axiom systems fully embrace our mathematical concepts. This reductionist view of mathematical truth has been dead for around 90 years.

    • > This seems wrong. Why are Kolmogorov’s axioms correct?

      They’re not “correct” they’re just axioms. Why did he choose those axioms? Because they matched with some kind of intuitive concept that people have about fundamentally unpredictable repeatable events, but in the end, you have a mathematical structure: axioms, definitions, and theorems that follow from the axioms. The question isn’t “why are they correct?” but rather “are they consistent?” In other words can you prove a theorem about probabilities and also prove the opposite of the theorem as well?

      The answer comes from model theory: an axiom system is consistent if and only if you can exhibit a model of the axioms. The axioms are so simple that all you need is the exponential distribution on the positive reals to exhibit an object that defines a sample space and a probability measure. Doesn’t even require you to do tricky methods to calculate the normalization constant, as it’s easily directly integrable. That the exponential distribution satisfies the axioms required to define a probability measure is pretty trivial to prove. Therefore the axioms are consistent.
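
      A quick numerical spot-check of that claim (my own sketch): the exponential density is nonnegative, assigns measure 1 to the whole sample space, and is additive over a disjoint partition, which is all the axioms ask of it.

        from math import exp, inf
        from scipy.integrate import quad

        lam = 2.0
        pdf = lambda x: lam * exp(-lam * x)   # exponential density on [0, inf)
        P = lambda a, b: quad(pdf, a, b)[0]   # probability assigned to the interval [a, b]

        print(P(0, inf))                      # normalization: P(whole space) = 1.0
        print(P(0.5, 4.0) >= 0)               # nonnegativity: True
        print(P(0, 3), P(0, 1) + P(1, 2) + P(2, 3))   # additivity over a disjoint partition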

      However, it turns out that Cox’s axioms also lead to the conclusion that a particular method of quantifying information about the truth or falsity of a boolean statement *also* leads to a mathematical system with an equivalence to the Kolmogorov axioms. The existence of the exponential distribution as a model of the Kolmogorov axioms proves their consistency as axioms. Cox’s theorem proves that a certain set of other axioms leads to equivalence with the Kolmogorov axioms… therefore also consistency of Cox’s axioms. So, the probability algebra can be used as the mathematical basis of *two different kinds of calculations*: one about pure sequences of repeated “random” events, another about the plausibility of certain boolean statements under lack of complete knowledge… They both lead to the same math.

        • The view that axiom systems can be devoid of content is a dead end. We know this from the work of Tarski, Godel, Quine, et al. Godel’s incompleteness theorems were the end of Hilbert’s formalism. Tarski showed that truth cannot be defined in the object language, but only in a metalanguage. This result also applies to any attempt to define “analytic.” The programs that have attempted to reduce mathematical truth to analytical truth have failed. I know this is philosophy of mathematics and not statistics, but my point is those results hold outside of arithmetic as well. There is no longer any reason to suppose that an axiom system just presents a formalism that can be applied in certain cases. (Also, a system proving its own consistency establishes nothing: an inconsistent system can prove its own consistency, because everything follows from a contradiction by the principle of explosion.) I will point out that you said “because they match with some kind of intuitive concept.” But, that view is inconsistent with the statement I took issue with, namely “Probabilities are probabilities to the extent that they follow the Kolmogorov axioms.” This statement reflects formalism in some, at least, latent form, and formalism is false. Mathematical truths are not analytic, devoid of existential content or merely a matter of convention. Such was what Hilbert’s program tried to establish, and such was what Godel proved false.

        • I disagree entirely with your interpretation of Godel. Ultimately what we learn from Godel is that meaning comes from how we use math, not from the axioms. Kolmogorov’s axioms define what a probability IS but they have ZERO content when we ask what probabilities MEAN. In fact there are multiple distinct meanings that we can assign the numbers that come from probability calculations, and this is why there are arguments about Bayes vs Frequentist statistics. The numbers are the same and all follow the axioms but the mappings to real world concepts, which occur ENTIRELY outside the axioms, leave us with philosophical arguments. The axioms themselves define formal manipulations that are allowed. They are basically computing rules.

        • I am at a complete loss as to what distinction is being drawn here between “how” and “meaning”. I understand that there can be a distinction between syntax and semantic content. However, as I have said, all attempts to reduce mathematical truth to syntax (or to put it another way, to prove mathematical truth is analytic) have failed. Here is, I think, a relevant quote from Godel:

          “I come now to the second part of our problem, namely, the problem of giving a justification for our axioms and rules of inference, and as to this question it must be said that the situation is extremely unsatisfactory. Our formalism works perfectly well and is perfectly unobjectionable as long as we consider it as a mere game with symbols, but as soon as we come to attach a meaning to our symbols serious difficulties arise. . . . The result of our previous discussion is that our axioms, if interpreted as meaningful statements, necessarily presuppose a kind of Platonism, which cannot satisfy any critical mind and which does not even produce the conviction that they are consistent.”

          The point is that Godel at least thought his results were a problem for formalism. I am not even sure what you are disagreeing with. The statement “Probabilities are probabilities to the extent that they follow Kolmogorov” is a statement about what it means to be a probability. If that statement isn’t about meaning or semantic content, then I don’t understand it. Of course, Kolmogorov’s axioms define probability. The symbols have to be interpreted. Once they are, there can be questions about what probability is.

        • I’m on my phone so hard to get deep into this topic, but I think you are failing to distinguish between probabilities: numbers that obey certain rules, and the uses that probabilities are put to. Since probabilities, the numbers, obey rules that are necessary for *more than one* kind of use it’s impossible to discuss what probabilities “really mean” because it’s not unique.

        • The question is why the numbers that obey certain rules should be called “probabilities”. The “number P(A) is called the probability of the event A”, because Kolmogorov says so.

          Does that mean that “a set of numbers following Kolmogorov’s axioms” is a satisfactory answer to the question “what is probability”, without further justification?

        • Let’s substitute the word “gronslag” for “probability”. What is a gronslag?… It’s a set of numbers that has certain properties that kolmogorov defined.

          Now, why would I care? This is a separate question. If I tell you that Cox set up a system for doing calculations on how much credence we should give to certain Boolean statements and proved that his reasonable requirements require you to use gronslags then this is a reason to care about gronslags.

          If on the other hand I tell you that almost all of the possible sequences of binary digits of length 2^1000 would pass a certain test for a property called slagginess and that if you know that a sequence is slaggy then you can answer certain questions about subsequences using the mathematics of gronslags, then this is another separate reason to care about gronslags.

          There may be other reasons as well.

          It often turns out that a formal system has many uses; for example, using the formal system known as the amd64 instruction set you can not only create blogging systems to discuss statistics, but also encode video pictures of European football championships… Which is the true meaning of the amd64 instruction set is left as an exercise for the reader.

        • > Let’s substitute the word “gronslag” for “probability”. What is a gronslag?… It’s a set of numbers that has certain properties that kolmogorov defined.

          I don’t disagree (I don’t understand if there is a disagreement between you and steve, by the way).

          But I think that when people “argue about the meaning of probability” the question is not “what is a gronslag” (the mathematical object defined by Kolmogorov to be the basis of his theory of probability), the question is “what is probability”. There is a relationship, but if the answer doesn’t elaborate on that it’s at best incomplete.

          I also find misleading Andrew’s characterization of long-run frequency as an imperfect real-world counterpart of probability. Random variables and different notions of convergence and asymptotic behaviour are mathematical concepts and they are an essential part of Kolmogorov’s mathematical theory of probability.

        • I think the issue is some people assume “probability” in some kind of common usage, has some relatively specific common intuitive meaning, and that Kolmogorov should have gotten very close to that. But I disagree, I think probability had several historical meanings, some of which were more nebulous than others, and Kolmogorov built his axioms around trying to formalize one particular meaning: the behavior of idealized unpredictable repeatable games of chance.

          But Laplace’s notion of probability was always different: the ratio of the favorable cases to the total possible cases. This is perfectly useful for one-off events where you fail to know which of the logically possible cases is in fact occurring. So I think my disagreement with Steve is really about whether the question of “did Kolmogorov do a good job of creating formal probabilities that match the ‘real notion’ that everyone has in their head” is even a question to be considered. In my opinion intuition is at best an individualized guide, at worst an impediment to understanding other possibilities.

          In fact, we know there are *at least two* legitimate models of probability (ie. real world concepts that map 1-1 to the numbers) and which is more ‘mathematically’ legit than the other is not a question I find interesting, though I do find interesting the question of *which is more relevant to actual scientific inference* and have strong opinions about that ;-)

          It’s a little like non-euclidean geometry. We can create models of non-euclidean geometry using spherical or hyperbolic surfaces etc, and we wind up with different geometric properties, but the question “which is *real* geometry” is not well posed. Sure, most people have planar geometry intuitions, but also lots of people live on and navigate around the globe, and we could easily imagine people living on toroidal spaceships or working out the geometric properties of HMC trajectories in 22000 dimensions. They’re all legitimate.

        • I don’t think anybody here is disputing that there are different interpretations of probability. And I would say that’s the right answer to the question “What is probability?”. Saying only that probability is a normalized measure does not seem very satisfying. (A measure over what? What does it measure?)

          There are also different mathematical concepts of probability. The probability of events is conceptually different from the plausibility of statements. The fact that there is an isomorphism between them is of course very convenient: it makes the intuitive notions of probability “legitimate”, as you called them.

        • Just because the word probability has been in use for a long time and has inspired quite a bit of useful science, among other things, doesn’t imply that there is any “true” and consistent answer to the question “What is probability?”

        • In fact, I’d argue the exact opposite, Hilbert’s program was to show that axiom systems themselves were enough to define meaning, Godel’s contribution was to show that that was a dead program, and that Axiom systems themselves couldn’t contain the meaning.

        • Correct, and meaning is the semantic content, a model that interprets the symbols of the formalism. To use an analogy, I can draw a map of my apartment that helps me find my keys. It is just a collection of lines and jottings on a piece of paper. I can reinterpret the map with a new key to be a map of Central Park to find the Balto statue. I can keep reinterpreting the map. If it helps me find what I am looking for, the interpretation is “correct”. Now, should I say that a map of my apartment is any correct interpretation of the map? Likewise, we start with a concept of probability, and create axioms that correctly fit that concept. Those axioms can be given various interpretations. It is a perfectly sensible question to ask whether all of those interpretations are models of our concept of probability. Under the view that probabilities are probabilities to the extent they follow the Kolmogorov axioms, I don’t see how the question can be made sense of.

        • Cox’s theorem shows that a model of rational reasoning about true/false statements which obeys certain relatively intuitive rules requires the mathematics of probability for computation.

          On the other hand, Per Martin-Lof showed that we could define “random” sequences in terms of their passing a certain kind of extremely rigorous computational test for randomness based in essence on intuitive ideas about what truly random “coin flips” should act like (it’s a non-constructive proof, but practically speaking, it’d be something like the die-harder tests: https://webhome.phy.duke.edu/~rgb/General/dieharder.php ). Events defined by these random sequences would have numbers associated with them which obey the rules of probabilities.

          This shows that probabilities correspond to at least two concepts/meanings.

        • It shows that the axioms correspond to at least two sets of meanings, i.e., have at least two interpretations. But those interpretations, what are they? I would say they are real (in the sense of non-imaginary) things or concepts (whatever word you are most comfortable with). And, per Carlos’ question above, it makes sense to ask further questions about whether those concepts are probabilities or not. We might say we have several different conceptions of probabilities and we have other concepts that satisfy the axioms, but which we do not believe are probabilities. We had an intuitive concept that led to the axioms, but not every interpretation of the axioms will fit the intuitive concept. Godel certainly thought those intuitive concepts were real (see his Gibbs lecture). I am not saying that is a consequence of his incompleteness theorems (he had arguments from his incompleteness theorems to that conclusion). All I am saying is that it is a perfectly reasonable view of mathematical truth and meaning.

        • The problem as I see it is that you give primacy to some nebulous concept, one I can’t pin down precisely, which you’ve named with the English word probability.

          You are in essence asking if any of the perfectly good concepts I refer to are good enough to get access to the label probability, in other words arguing over the meaning of words. I just don’t care about arguments over who gets to use words, or how good is the match between carefully thought out concepts and some other unexplained concept. What I do care about is which of the well defined concepts I mentioned are we using when we do calculations, and which one is appropriate for the real world, and are there other well defined concepts we might map to probability as well.

          I don’t buy the idea that there is a “real” probability defined by intuition and historical primacy, and we need to check to see if kolmogorov axioms map correctly to that intuition. For one thing I doubt very very strongly that the intuition is unique.

        • Steve wrote,
          ““Probabilities are probabilities to the extent that they follow the Kolmogorov axioms.” This statement reflects formalism in some, at least, latent form, and formalism is false. Mathematical truths are not analytic, devoid of existential content or merely a matter of convention. Such was what Hilbert’s program tried to establish, and such was what Godel proved false.”

          Speaking as a mathematician, I’ve got some issues with this:

          1. The statement “Probabilities are probabilities to the extent that they follow the Kolmogorov axioms.” is in essence a *definition* of what is meant *mathematically* by “probabilities”. This definition is intended to describe properties of the informal/intuitive idea of “probabilities”. To the extent that it does, it is a useful definition for making mathematical deductions about probabilities. But, like any model, it can be applied poorly or misleadingly.

          2. The phrase “mathematical truth” is one I don’t think I’ve ever used or encountered in all my years of teaching and doing research in mathematics, except perhaps to refer to a theorem (i.e., a statement of the form “If A, B, C, … then X” that has been rigorously proved).

  8. The view of probability as an abstract mathematical concept is apt. The question I find of special interest concerns the role(s) of probability in qualifying inference (which generally differs from its role in modeling variable phenomena).

  9. Ok, but is p(aliens exist on Neptune that can rap battle) = .137 a valid “probability” just because it satisfies mathematical axioms? I do not think so. I believe they have to correspond to the real world to be valid and restricted to be studied by science, similar to von Mises’ view, or at least to be a probability that interests me.

    Justin
    http://www.statisticool.com

    • “p(aliens exist on Neptune that can rap battle) = .137” in itself isn’t something that can satisfy the axioms of probability. The axioms of probability refer to a “system” of probabilities that are “coherent” in the sense of satisfying the axioms. So, for example, the two statements

      “p(aliens exist on Neptune that can rap battle) = .137″ and p(aliens exist on Neptune) = .001”

      are incompatible according to the axioms of probability, because the event “aliens exist on Neptune that can rap battle” is a sub-event of “aliens exist on Neptune”, so the larger event must (as a consequence of the axioms) have probability at least as large as the probability of the smaller event.

      • > “p(aliens exist on Neptune that can rap battle) = .137” in itself isn’t something that can satisfy the axioms of probability.

        p(aliens exist on Neptune that can rap battle) = 1.137 does violate the axioms of probability in itself.

        The implied complete set of elementary events is A = “aliens exist on Neptune that can rap battle” and notA = “aliens do not exist on Neptune that can rap battle”. Then the following is a valid field of probabilities:

        P(A) = 0.137
        P(notA) = 0.863

        The following is of course also a valid field of probabilities, for any x in [0, 1]:

        P(A) = x
        P(notA) = 1-x

        The numbers don’t mean anything at all unless we go beyond the axioms.

  10. OK, I’ll take the bait: I think this is too easy. The interpretation of probability has repercussions for how we do statistics, and how we draw conclusions from data. Harold Jeffreys developed a comprehensive Bayesian framework for both estimation and testing, and toward the end of his book he states, in italics: “The essence of the present theory is that no probability, direct, prior, or posterior, is simply a frequency” (Jeffreys, 1961, p. 401). Had Jeffreys believed probabilities to be frequencies, his entire framework would have lacked the proper foundation. Similarly, had Neyman & Pearson interpreted probabilities as degrees of belief, this would have stopped them from proposing their own framework. As a pragmatic user, of course, one may not particularly care about the philosophical foundation of the statistical framework one is using, much like someone who drives their car to work does not particularly care about the construction work that happened there 30 years earlier. Yet, ultimately, the construction work is what made the trip possible.

    • What’s interesting to me is that, in statistics (or probabilistic inference from data if you will), practice always seems to lead theory rather than vice versa. Actually I suspect this is true in most human endeavors :) In applied work we are often pushing the boundaries of our available methods, and only work out formal properties later…

      • “Actually I suspect this is true in most human endeavors :) In applied work we are often pushing the boundaries of our available methods, and only work out formal properties later…”

        +1

        But I’ll add that sometimes we find that the legitimacy of our original boundary pushing collapses like a house of cards.

    • > Similarly, had Neyman & Pearson interpreted probabilities as degrees of belief, this would have stopped them from proposing their own framework.

      I’m not sure about this. Here’s Neyman and Pearson in 1933:

      Yet if it is important to take into account probabilities a priori in drawing a final inference from the observations, the practical statistician is nevertheless forced to recognize that the values of φ_i can only rarely be expressed in precise numerical form. It is therefore inevitable from the practical point of view that he should consider in what sense, if any, tests can be employed which are independent of probabilities a priori. Further, the statistical aspect of the problem will appeal to him*.

      *This aspect of the error problem is very evident in a number of fields where tests must be used in a routine manner, and errors of judgment lead to waste of energy or financial loss. Such is the case in sampling inspection problems in mass-production industry.

      My gloss of this: N-P were happy to think about probability in a subjective Bayesian way. They were motivated by 1. the difficulty of precisely specifying priors, and 2. application areas in which long-run frequencies matter.

      I’m no expert though.

      • I looked at http://statmodeling.stat.columbia.edu/2009/09/12/the_laws_of_con/, but I don’t understand the point. Why should p3 have any relationship to p4? The experiments are different. You are mixing probabilities from different experiments and assuming a relationship between the experiments that does not exist. It may be surprising that putting detectors at the slits changes which slits the photons go through, but it does. And, you can write down the differential equations that show this and use ordinary probability with these equations.

        • The probabilities in experiment 3 are simple: If and only if the photon hits the screen in the top half, then it went through slit 1. (I’m assuming the equations for photons are similar to those for electrons, which will probably turn out to be true, but the details haven’t been worked out.) You can see a picture of the photon trajectories reconstructed from weak measurements of velocity in Figure 3 of

          “Observing the Average Trajectories of Single Photons in a Two-Slit Interferometer”,
          Sacha Kocsis et al.,
          Science 332, 1170 (2011),
          DOI: 10.1126/science.1202218,
          http://science.sciencemag.org/content/332/6034/1170

          The probabilities in experiment 4 are different because it is a different experiment.
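          As a toy sketch of that last point (my own stand-in amplitudes, not the actual photon calculation): with detectors at the slits you add probabilities; without them you add amplitudes first, and the interference cross term is exactly the difference between the two experiments.

            import numpy as np

            x = np.linspace(-10, 10, 2001)  # positions on the screen

            # Hypothetical Gaussian wavepacket amplitudes standing in for
            # "came through slit 1" and "came through slit 2".
            psi1 = np.exp(-0.5 * (x - 2) ** 2) * np.exp(1j * 3 * x)
            psi2 = np.exp(-0.5 * (x + 2) ** 2) * np.exp(-1j * 3 * x)

            # With which-path detectors: probabilities add.
            p_detectors = np.abs(psi1) ** 2 + np.abs(psi2) ** 2

            # Without detectors: amplitudes add first, then square.
            p_no_detectors = np.abs(psi1 + psi2) ** 2

            # The difference is the interference term 2*Re(psi1 * conj(psi2)).
            cross = 2 * np.real(psi1 * np.conj(psi2))
            print(np.allclose(p_no_detectors, p_detectors + cross))  # True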

          Physics is like statistics. In statistics we have Bayesians and frequentists. In physics, we have physicists who say that cats can be both alive and dead and particles don’t exist.

          I recommend Jean Bricmont’s recent book: “Making Sense of Quantum Mechanics”, https://www.springer.com/us/book/9783319258874

      • “p(x|y) is the distribution of x if you observe y, p(y|x) is the distribution of y if you observe x, and so forth”: You have to be careful with the word “observe”. Changing the experiment is not the same as someone doing the experiment and then only giving you part of the data.

      • On http://statmodeling.stat.columbia.edu/2013/09/25/classical-probability-does-not-apply-to-quantum-systems-causal-inference-edition/ Robins et al. say, “Assuming with Einstein that faster than light (supraluminal) communication is not possible, one can view the Neyman theory of counterfactuals as falsified by experiment.” I’m not sure what Robins et al. are assuming, but if they are assuming the world is local, then that is false, as the Bell inequality and experiment show. But, you can’t communicate faster than light.

        On http://statmodeling.stat.columbia.edu/2009/09/14/response_to_two/, it says, “we typically do treat the act of measurement as a direct application of conditional probability.” Some experiments change the things you are interested in measuring, some don’t. In physics, the theory tells us which are which. You can have the same problem when you analyze surveys: did the survey change what you are measuring?

    • I don’t think you need to invoke Bohmian mechanics. If you do the physics correctly, applying the Born rule to obtain probabilities when measurements are made, everything works as expected.

      That applies to Schroedinger’s wave function formulation. In the phase space formulation the physics is somewhat different, because the state of the system is described by a “quasi-probability distribution” which can be negative in small regions (but the uncertainty principle ensures that integrating we get positive probabilities). But I don’t think Andrew refers to that.
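      As a hedged numerical sketch of that phase-space point (using the textbook Wigner function of the first excited harmonic-oscillator state, in units with hbar = m = omega = 1): the quasi-probability dips below zero near the origin, yet integrating out momentum gives back the ordinary nonnegative position density.

        import numpy as np

        x = np.linspace(-5, 5, 401)
        p = np.linspace(-5, 5, 401)
        X, P = np.meshgrid(x, p, indexing="ij")

        # Wigner function of the n = 1 oscillator state:
        # W(x, p) = (1/pi) * exp(-(x^2 + p^2)) * (2*(x^2 + p^2) - 1)
        R2 = X ** 2 + P ** 2
        W = np.exp(-R2) * (2 * R2 - 1) / np.pi
        print(W.min())  # about -1/pi: negative, so not a true density

        # The p-marginal is |psi_1(x)|^2 = (2/sqrt(pi)) * x^2 * exp(-x^2) >= 0.
        marginal = W.sum(axis=1) * (p[1] - p[0])
        density = 2 * x ** 2 * np.exp(-x ** 2) / np.sqrt(np.pi)
        print(np.allclose(marginal, density, atol=1e-6))  # True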

      • Bell’s inequality says that locality implies a certain inequality. This inequality is violated by experiment. Unless the experiments are wrong, the world is not local. If the world is local, then quantum mechanics is also wrong, since quantum mechanics predicts the inequality is violated. Besides Jean Bricmont’s book, I also recommend Bell’s book.

        • I think it’s actually that Bell’s inequality says that the world is *either nonlocal, or has no hidden variables and is fundamentally nondeterministic*.

          Bell decided that this was strong evidence for nonlocality, the rest of the physics community decided things were fundamentally nondeterministic. Bohm showed that a nonlocal deterministic model could work for non-relativistic QM.
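          For concreteness, here’s a small numerical check of the CHSH form of the inequality (my own toy sketch, using the standard angles): every local deterministic assignment of outcomes keeps |S| <= 2, while the quantum singlet correlation E(a, b) = -cos(a - b) reaches 2*sqrt(2).

            import itertools
            import numpy as np

            a, a2, b, b2 = 0.0, np.pi / 2, np.pi / 4, 3 * np.pi / 4

            # Local deterministic models: each side pre-assigns +/-1 outcomes
            # to both of its settings. The CHSH combination never exceeds 2.
            best_local = max(
                abs(A1 * B1 - A1 * B2 + A2 * B1 + A2 * B2)
                for A1, A2, B1, B2 in itertools.product([-1, 1], repeat=4)
            )
            print(best_local)  # 2

            # Quantum prediction for the singlet state.
            E = lambda s, t: -np.cos(s - t)
            S = E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)
            print(abs(S), 2 * np.sqrt(2))  # both about 2.828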

        • Thanks, I do actually have Bell’s book and I have actually read most of it, but my knowledge of the mumbo jumbo is too limited to really say. Bell himself seems pretty clear-headed, but I don’t really know enough about the QM orthodoxy to understand what he argues against. One thing I find very convincing is that there can be no real theory which splits the world into classical and QM results. The only thing that makes sense is an enormous worldwide wave function or the like. I’ve asked a friend with a PhD in QM from UC Davis if he thinks nonlocality is an essential part of QM in standard interpretations, and he says yes it is. So I don’t know where that leaves me. I’ll spend some time on the scholarpedia article, but I have to admit I’d like to hear a rebuttal from a second party not involved in writing it before trying to make up my mind.

          I guess one thing that does seem to be the case, given my UCD friend’s claim, is that nonlocality reigns.

        • Maybe your understanding of “nonlocality” (if you have a precise definition in mind for that concept) is not the same as your friend’s understanding of “nonlocality”. Realism can also mean different things in discussions of the foundations of quantum mechanics.

        • Unfortunately, taking a survey of professional physicists is not a good approach. E.g., see

          http://arxiv.org/pdf/1306.4646

          There doesn’t seem to be any disagreement as to what the equations say. It is the logic that many physicists have trouble with. Bell is quite clear that the only assumption is locality, and you can verify this yourself. But, as Bell remarks in footnote 10 of his “Bertlmann’s Socks and the Nature of Reality” paper, “the commentators have almost universally reported that it [his original paper] begins with deterministic hidden variables”.

          If you have Murray Gell-Mann’s book “The Quark and the Jaguar”, you should compare what Gell-Mann says on pages 172-173 about Bertlmann’s socks. Misunderstanding could hardly be more complete. (You don’t need to know much physics to understand socks.)

        • I should also mention the book I’m talking about is “Speakable and Unspeakable in QM”; I guess there are some other books he wrote too.

          Carlos: it’s very possible you’re right about the different meanings.

        • Yes, by “Bell’s book”, I meant “Speakable and Unspeakable in Quantum Mechanics”. The second edition includes two additional papers. The second edition also has an introduction by Alain Aspect, but it appears Aspect didn’t read the book. Not sure what the editors were thinking when they included Aspect’s introduction; I hope they realized Aspect hadn’t read the book.

          Regarding “realism”, see

          http://www.scholarpedia.org/article/Bell%27s_theorem#Bell.27s_theorem_proves_the_impossibility_of_.22local_realism.22

          By the way, Schrödinger’s point with the cat was that it is absurd to claim that a cat is both alive and dead. Schrödinger was right. Einstein was basically right too, except he thought the world was local. Of course, he didn’t have Bell’s Theorem.

          In Bell’s book, the “Bertlmann’s socks and the nature of reality” essay is a good place to start, if you don’t want to read the whole book.

          Here is a quote from

          http://www.scholarpedia.org/article/Bell%27s_theorem#Classical_versus_quantum_probability_.28and_logic.29

          “The alleged need to abandon classical probability theory is sometimes also argued for on the basis of an incorrect analysis of the double slit experiment. However, as long as the usual meanings of words are kept, there is no need to get rid of classical probability theory (or classical logic).”

        • Bell does define “locality”. See the articles collected in his book or see http://www.scholarpedia.org/article/Bell's_theorem#Bell.27s_definition_of_locality . Here is a quote from the latter: “Bell explained the ‘principle of local causality’ as follows: The direct causes (and effects) of events are near by, and even the indirect causes (and effects) are no further away than permitted by the velocity of light. In relativistic terms, locality is the requirement that goings-on in one region of spacetime should not affect — should not influence — happenings in space-like separated regions.”

        • David Marcus,

          Yes, your interpretation of Bell’s Theorem is correct. Most people are not willing to give up locality *or* realism, yet Bell’s Theorem and the experimental violation of the inequality demand that one of those be wrong.

          I think you are a little quick to dismiss dropping local realism – it’s the most literal interpretation of quantum mechanics (or, better, it’s an interpretation of classical mechanics in terms of quantum mechanics, rather than the other way around). I don’t think there’s any reason classical states need be fundamental, so long as quantum mechanics predicts (as it does) that our observations are always of definite states.

        • As you said above, realism is the idea that it’s absurd for the cat to be both alive and dead. Norsen would call this ‘perceptual realism’, though I disagree with his conclusions.

        • Schrödinger’s point was that the cat must be either alive or dead before we look. If you agree that it is alive or dead after we look, then you are not denying perceptual realism. If you insist that it is both alive and dead after we look, then it isn’t clear what you mean by “locality”. And, we are probably both having and not having this discussion.

  12. The book by Diaconis and Skyrms has a number of very nice perspectives on this question.

    Possibly the simplest one-line summary of how to view probability is that it is the formalization of plausible inference (to use Polya’s phrase).

  13. I really enjoyed this thread!

    I am not really qualified to have an opinion of my own, but I lean towards Arturo’s statement. Certainly the “probability” of where a bullet ends up is not a mathematical concept, per se. If you have enough measurements, it is fully deterministic, so the math belongs in finite element analysis. It seems to me that the infamous two-slit experiment was intended to demonstrate something, somewhere, that cannot be reduced to determinism. If so, and there is really determinism hiding in there somewhere, can probability really be a mathematical concept, or is it just a method of quantifying uncertainty of outcome due to incomplete knowledge of events that haven’t happened yet?

    EJ Wagenmakers wrote:

    “OK, I’ll take the bait: I think this is too easy. The interpretation of probability…”

    I read the comment twice but I am not sure of the answer to the question “what is probability?”.

  14. David Marcus wrote:

    “The two-slit experiment does not demonstrate non-determinism.”

    So then is probability anything more sophisticated than the mathematics of our ignorance of the future? And if so, is it anything other than simply an alternative method of accounting for uncertainty? Because if it isn’t, it is so confusing to folks that we would be better off just getting rid of the language associated with it.

  15. Once more too late to the party but… in some discussions I’m fully with Andrew on this one, particularly whenever somebody tries to claim that probability is “really” one thing and not the other (as long as both fulfill Kolmogorov’s axioms or do whatever it takes to work as a probability mathematically).

    However, probability existed as a concept before Kolmogorov came up with the axioms, and although the axioms were meant to define probability in the realm of mathematics, they were also meant to connect the mathematical definition to a range of ideas about probabilities that were around already. Mathematicians can define concepts within mathematics but their definitions don’t come with any particular authority outside their own field. In model theory mathematicians have a formal definition for what a model is, but this doesn’t mean that everyone who discusses models outside this framework has lost their legitimacy.

    The history of probability before Kolmogorov is very instructive, particularly how people arrived at the explicit insight that when they talked about probability, and even published about it, they were using two or even more genuinely different interpretations of that concept, and how this was missed, or at least not acknowledged, by anyone before 1830 or so. I believe that many of the current interpretations of probability can be traced back to some historical views of what probability is that were not seen as contradictory for quite some time, but seem contradictory or at least incompatible to us.

    Generally I think that terms such as “probability” come with a complex bulk of roots and meanings; they have been in use in partly incompatible ways for a long time, and the implications of this don’t go away just because mathematicians try to condense this into a clear, unambiguous definition. I’m fine with the mathematicians doing their job there and I will play by their rules most of the time, but I will not grant them absolute definition power (and neither will I grant this to anybody else). Probability is all kinds of things and not just one of them “really”.

    • To an onlooker to the statistics and epidemiology fields, like myself, it has long been apparent that many terms are defined and contextualized differently. I guess what surprises me a little is that we ignore [often subtle or overt] discrepancies in meaning. Specifically, why has it taken experts so long to acknowledge them? Have there been efforts to standardize definitions? Have there been fruitful exercises in the process?

      Take the descriptive ‘reproducibility’. It is sometimes used interchangeably with ‘replication’. Goodman, Fanelli, and Ioannidis explore its definitions in the following article.

      http://stm.sciencemag.org/content/8/341/341ps12

        • Countable additivity seems to be a pure-math issue, in the sense that Hamming meant when he said that he’d never fly in an airplane whose design required Lebesgue integration rather than Riemann integrals.

          I’ve argued that all real-world measurements are discrete: they ultimately measure things like a count of atoms, electrons, etc. It’s just that those things are so small that we can treat them as continuous for most real-world purposes.

          I am interested in the mixture of finite additivity and IST/nonstandard analysis, and whether that gets us anything meaningfully different from Kolmogorov and his sigma algebras. Measure theory doesn’t seem to map to science particularly well, whereas extremely close discrete steps whose discreteness we wish to ignore *do*, and that’s exactly what you get from IST.
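          A throwaway sketch of that “discrete treated as continuous” point (toy numbers of my own): put the probability mass on an extremely fine grid and, for any practical question, it is indistinguishable from the continuous measure-theoretic answer.

            import numpy as np
            from math import erf, sqrt

            # A "measurement" that is secretly discrete: mass on a grid with
            # spacing h, weighted by a normal density and normalized.
            h = 1e-4
            grid = np.arange(-8, 8, h)
            mass = np.exp(-grid ** 2 / 2)
            mass /= mass.sum()  # finitely additive, sums to 1

            # P(0 < X < 1): discrete model vs. the continuous answer.
            p_discrete = mass[(grid > 0) & (grid < 1)].sum()
            p_continuous = 0.5 * (erf(1 / sqrt(2)) - erf(0.0))
            print(p_discrete, p_continuous)  # agree to several decimals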

    • Thoughtful comment – as if any concept (sign, representation, symbol, word etc.) could have a single meaning that could be fixed for all time.

      > define concepts within mathematics
      These mathematical definitions of the concept are only one aspect (see below) but they do try to make hard pauses in the unfolding process of interpretation.

      “The second grade of clarity is to have, or be capable of providing, a definition of the concept. This definition should also be abstracted from any particular experience, i.e., it should be general. So, my ability to provide a definition of gravity (as, say, a force which attracts objects to a point, like the center of the earth) represents a grade of clarity or understanding over and above my unreflective use of that concept in walking, remaining upright, etc.” https://www.iep.utm.edu/peircepr/

      Now, Christian, is it culture that primarily drives the multi-dimensional evolving interpretation of the concept or the reality that we have no direct access to but which impacts us in many ways when the concept is applied…

      • It doesn’t have to be *either* culture *or* reality, of course; surely both play a role that is hard to disentangle. Peirce, insightful as ever, tends to the cultural side in that quote, I’d think, although you may see this differently (actually the two may be hard to disentangle even in Peirce’s words).

        • > surely both play a role that is hard to disentangle
          Of course, that is why I prefaced with primarily.

          > Peirce, insightful as ever, tends to the cultural side in that quote
          More likely he tends to bounce around – he raises numerous interesting considerations on any question, deliberates these repeatedly (one might even say incessantly and endlessly) and seldom brings any closure that lasts (for him anyway).

          It seems that after 1908 Peirce does give a larger role to the object of the sign rather than the interpretant of the sign.

          If I am understanding him, this gives the reality to which we have no direct access more of a role in shaping how we represent and interpret it.

          OK a bit of moderation of the constructionist perspective, although you may see this differently ;-)

  16. I was thinking about the mathematical definition of probability (the Kolmogorov axioms) as a bridge between different concepts rather than as the foundations. You can define probability aleatorically as a long-run frequency or epistemically as a degree of belief, but since you can use the same mathematical framework for both of them, the definition does not matter so much.
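    A minimal sketch of that bridge (made-up disease-testing numbers of my own): the same Kolmogorov/Bayes arithmetic answers the epistemic question (degree of belief in disease given a positive test) and the aleatory one (long-run frequency of disease among positives in a simulation).

      import numpy as np

      rng = np.random.default_rng(1)
      prev, sens, spec = 0.01, 0.95, 0.90  # hypothetical prevalence, test accuracy

      # Epistemic reading: Bayes' theorem as a degree-of-belief update.
      p_pos = prev * sens + (1 - prev) * (1 - spec)
      print(prev * sens / p_pos)           # posterior, about 0.088

      # Aleatory reading: long-run frequency of disease among positive tests.
      n = 1_000_000
      disease = rng.random(n) < prev
      positive = np.where(disease, rng.random(n) < sens, rng.random(n) < 1 - spec)
      print(disease[positive].mean())      # also about 0.088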

  17. “Probability is a mathematical concept.”
    I’m struggling to understand how this could be both true *and* not trivial. Consider:

    “The concept of probability was in use long before Kolmogorov; we still use the term, both in science and ordinarily, in ways that do not correspond to a mere mathematical formalism; there are alternative axiomatizations of probability (e.g., Cox’s; some people reject countable additivity); and so on.”
    “Yes, but those are different meanings of probability. I’m specifically referring to the mathematical concept.”

    So, the mathematical concept of probability is a mathematical concept. Groundbreaking stuff.

    • Pedro:

      If you really want groundbreaking stuff, you shouldn’t be reading blogs . . .

      Seriously, though, see the P.S. above for the reason I wrote this post. I do think there’s a lot of confusion out there.

      • “… see the P.S. above for the reason I wrote this post.”

        Especially the last two sentences: “In practice, probability is not a perfect model for any of these scenarios: long-run frequencies are in practice not stationary, betting depends on your knowledge of the counterparty, uncertainty includes both known and unknown unknowns, decision making is open-ended, and statistical inference is conditional on assumptions that in practice will be false. That said, probability can be a useful tool for all these problems.”

        • Andrew, Martha:

          Thank you for the clarification. I suspect, however, that many people take “probability is a mathematical concept” to imply something stronger than “Kolmogorov’s axioms can be useful for many different purposes, and there are challenges to every interpretation of probability.” Perhaps the post would be less controversial if phrased differently? To give one example, Kolmogorov’s axioms may also be useful to calculate the normalized mass/length/area/volume/etc of physical objects; do you think, then, that physical mass is a probability? I mean, it trivially is in the boring sense that it follows the axioms! But don’t you also think that, say, degrees of belief and long-run frequencies are also “probabilities” in another useful sense? If not, then the dispute between Bayesians and frequentists would be very puzzling (akin to statisticians criticizing physicists for using Kolmogorov’s axioms to calculate physical masses; they are just talking about completely unrelated concepts that happen to use the same axiomatic tool).
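          To make the “trivially is” point concrete, here is a throwaway sketch (made-up part masses of my own): normalize the masses of an object’s disjoint parts and you get something that formally satisfies the axioms, with no chances or beliefs in sight.

            # Hypothetical masses (kg) of the disjoint parts of some object.
            mass = {"frame": 12.0, "wheels": 4.0, "motor": 9.0, "battery": 5.0}
            total = sum(mass.values())
            P = {part: m / total for part, m in mass.items()}

            assert all(p >= 0 for p in P.values())   # nonnegativity
            assert abs(sum(P.values()) - 1) < 1e-12  # P(whole object) = 1
            # Finite additivity over a disjoint union of parts:
            lhs = P["frame"] + P["wheels"]
            rhs = (mass["frame"] + mass["wheels"]) / total
            assert abs(lhs - rhs) < 1e-12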

        • Pedro said,
          “But don’t you also think that, say, degrees of belief and long-run frequencies are also “probabilities” in another useful sense? ”

          Indeed, I see degrees of belief and long-run frequencies as intuitive/conceptual ideas of probability that we try to *model* mathematically.

          For elaboration on my view, please see http://statmodeling.stat.columbia.edu/2014/01/16/22571/#comment-153299 (which Andrew gave a link to in his post), and also the further link (https://web.ma.utexas.edu/users/mks/statmistakes/probability.html) given in that comment.

  18. I very much like this focus on the mathematical formalism, but ultimately we have to decide what it means when we make various applications of probability: when we claim that the probability that a certain hypothesis is true is such and such, or when our scientific theory predicts that the probability of measuring an electron spin up is such and such. Yes, the fact that we call these things probabilities implies that we believe they obey the relevant mathematical axioms, but we also believe those claims have genuine empirical content (we can give the wrong answer to the question of what the probability is that the electron will be measured spin up, even if we offer a probability assignment obeying the axioms).

    So yah, I’m all for defining probability in general as anything that satisfies the axioms. It avoids the confusion stemming from the assumption that all probabilistic claims refer to the same content. However, it doesn’t relieve us of having to explain what those underlying claims mean.

    • Peter:

      I do not in general think it is a good idea to speak of “the probability that a certain hypothesis is true”; see this article for elaboration of this point; Figure 1 illustrates the sort of probabilistic thinking that I don’t generally like in science.
