
Classical probability does not apply to quantum systems (causal inference edition)

James Robins, Tyler VanderWeele, and Richard Gill write:

Neyman introduced a formal mathematical theory of counterfactual causation that now has become standard language in many quantitative disciplines, but not in physics. We use results on causal interaction and interference between treatments (derived under the Neyman theory) to give a simple new proof of a well-known result in quantum physics, namely, Bell’s inequality.

Now the predictions of quantum mechanics and the results of experiment both violate Bell’s inequality. In the remainder of the talk, we review the implications for a counterfactual theory of causation. Assuming with Einstein that faster than light (supraluminal) communication is not possible, one can view the Neyman theory of counterfactuals as falsified by experiment. . . .

Is it safe for a quantitative discipline to rely on a counterfactual approach to causation, when our best confirmed physical theory falsifies their existence?

I haven’t seen the talk, but based on the above abstract, I think Robins et al. are correct. The problem is not special to counterfactual analysis; it’s with conditional probability more generally. If you recall your college physics, you’ll realize that the results of the two-slit experiment violate the laws of joint probability, as we discussed a few years ago here and here.

Given that classical probability theory (that is, the equation P(A&B)=P(A|B)P(B)) does not fit quantum reality, it makes sense to me that the Neyman-Rubin model of causation, which in practice is always applied with probabilistic models, will not work in the quantum realm. If you try to imagine applying potential-outcomes notation to the two-slit experiment, you’ll see that it just won’t work.

Is this relevant for macroscopic statistics? I don’t know. Here are my thoughts (with Mike Betancourt) on the matter.

I think it’s a fascinating topic.


  1. Rahul says:

    “Is this relevant for macroscopic statistics?”

    In my opinion, the answer is a strong, emphatic, NO! All attempts to convince me otherwise, have so far verged on crack-pottery or a misreading of the nuances of QM itself.

    The especially wacky ideas (and flawed, IMHO) of applying QM to macroscopic phenomena come from people in the social sciences.

  2. Alexandre says:

    There is a quantum measure theory (an extension to the mathematical discipline called “measure theory”), that goes as follows:

    If M is a quantum measure and Omega is the universe set then:

    1. M(Empty) = 0
    2. M(Omega) = 1
    3. For any disjoint sets (measurable in the quantum sense) A, B and C: M(A U B U C) = M(A U B) + M(B U C) + M(A U C) – M(A) – M(B) – M(C)

    Notice that, if A and B are disjoint sets then, in some quantum experiments, (A U B) cannot be always measured from the measurements of each isolated piece A and B as is usually considered in the classical measure theory. In these cases, we must compute a specific measure for the set (A U B). Naturally, if M(A U B) = M(A) + M(B) for all disjoint measurable sets A and B, then the usual probability measure emerges, but it is not the case in quantum experiments. The axiom 3. is called grade-2 additivity
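A minimal sketch of the axioms above, assuming a toy measure in which each disjoint event carries a complex amplitude and M of a union is the squared modulus of the summed amplitudes. The amplitude values and the `measure` helper are hypothetical, and normalization (M(Omega) = 1) is not enforced here. Under these assumptions the grade-2 rule holds while ordinary additivity fails:

```python
import cmath

def measure(amplitudes):
    """Toy quantum measure: squared modulus of the summed amplitudes."""
    return abs(sum(amplitudes)) ** 2

# Hypothetical amplitudes for three disjoint events A, B, C.
a = cmath.rect(0.5, 0.0)            # magnitude 0.5, phase 0
b = cmath.rect(0.5, cmath.pi)       # magnitude 0.5, opposite phase to a
c = cmath.rect(0.7, cmath.pi / 3)

# Left- and right-hand sides of axiom 3 (grade-2 additivity).
m_abc = measure([a, b, c])
grade2_rhs = (measure([a, b]) + measure([b, c]) + measure([a, c])
              - measure([a]) - measure([b]) - measure([c]))

# Ordinary (classical) additivity for the pair A, B.
additive_gap = measure([a, b]) - (measure([a]) + measure([b]))
grade2_gap = m_abc - grade2_rhs

print(f"grade-2 gap:  {grade2_gap:.2e}")    # zero up to rounding
print(f"additive gap: {additive_gap:.3f}")  # clearly nonzero
```

Because a and b have opposite phases, M(A U B) is essentially zero even though M(A) and M(B) are each 0.25, which is the kind of case the paragraph above describes.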

    There is a connection between M and the wave function. For more on this, google it: “quantum measure theory”.

    Alexandre Patriota

  3. Entsophy says:

    […] brings out the silliness in smart people like Quantum Mechanics; a subject I always associate with … R. A. Fisher. I confess to liking Fisher more than […]

  4. konrad says:

    “the results of the two-slit experiment violate the laws of joint probability”

    This makes no sense at all. The result of a _physical_ experiment cannot violate a _mathematical_ law. It can only show that a particular _model_ provides a poor description of reality. Yet you repeatedly say that it is probability theory (rather than the model, e.g. by using an inappropriate choice of state space) that is at fault:

    “Given that classical probability theory (that is, the equation P(A&B)=P(A|B)P(B)) does not fit quantum reality”

    “If classical probability theory (which we use all the time in poli sci, econ, psychometrics, astronomy, etc) needs to be generalized to apply to quantum mechanics” (in one of the linked posts).

    There is a world of difference between needing to discard a poor model (something we do all the time) and needing to generalise probability theory itself (which is not on the cards here).

    • Andrew says:


      Sure, a physical experiment can violate a mathematical law. The classic example is, if in a universe with closed curvature, you construct a large enough triangle, its angles will not add up to 180 degrees. Another classic example is that, for various particles, Boltzmann statistics do not apply, instead you have to use Fermi-Dirac or Bose-Einstein statistics. Boltzmann statistics is a mathematical probability model that does not apply in these settings. Another example is, in the two-slit experiment, p(A) does not equal the sum over B of p(A|B)p(B). In all these cases, you have a mathematical model that works (or approximately works) in some areas of application but not others. The math is not wrong but it does not apply to all settings.

      • Cedric says:

        I agree with Konrad. You write that QM violates the laws of probabilities. But probability is extended logic. Would you be comfortable if a phenomenon were to “violate the laws of logic”?

        Furthermore, Bell’s inequality only rules out *local* hidden variable theories. Bell himself famously wrote Against Measurement:

        • Andrew says:


          Yup. Euclidean geometry is extended logic too, but that doesn’t mean that the angles inside a real, physical triangle have to add up to exactly 180 degrees. Similarly, in real life, it’s not necessarily true that p(A) equals the sum over B of p(A|B)p(B). You either have to abandon the superposition of probabilities (instead adding complex numbers that have phases, just as we learned in college physics) or you have to restrict the use of joint probabilities.

          • Tim Maudlin says:

            So here is a simple point that will help clear some things up. Euclidean geometry is not “extended logic” and the theorems of Euclidean geometry (i.e. the logical consequences of its postulates) are not theorems of logic or logical truths. The theorems of Euclidean geometry, such as that about the interior angles of a triangle, are just what follows from various postulates about the geometrical structure of a space, not implications of logical principles alone. If the interior angles of a physical triangle do not add up to two right angles, that does not and cannot show that there is anything wrong with logic: after all it is by logic that one derives this consequence from the postulates. It just shows that physical space does not obey the postulates, i.e. space is not Euclidean.

            You really ought also to stop a moment and reflect on the claim about “p(A) equals the sum over B of p(A|B)p(B)” that you keep repeating. That only holds if the set of B’s constitute a mutually exclusive and jointly exhaustive set of ways that A can occur. Try figuring out what the relevant set of B’s are supposed to be for the case in hand. In fact, this principle is not violated, just as non-Euclidean geometry does not violate any principle of logic.

          • Cédric says:

            What do you think of Cox’s theorem?

            I feel very strongly that the laws of logic and probabilities are true *and applicable* in all conceivable universes, even those without space-time. They will be found by any smart creature living therein, and used to reason under uncertainty. This is not necessarily true of geometry.

            A model is a set of assumptions. “p(A) equals the sum over B of p(A|B)p(B)” is not a model any more than “2 + 2 = 4” or “A => ¬¬A” is a model. They are true facts (tautologies) in all conceivable universes. Would you argue that “2 + 2 = 4” is not necessarily true?

            • Andrew says:


              Not so many years ago, people thought Euclidean geometry was mathematical truth.

              The expression “p(A) equals the sum over B of p(A|B)p(B)” applies in some settings but not in others. In classical mechanics with uncertainty (i.e., latent variables) it works just fine. In quantum mechanics, though, you can’t take “B” (the slit indicator in the 2-slit experiment) as a classical latent variable and average over it. You have to either expand your notation or use complex wave functions.

              Complex wave functions are a generalization of classical probability. And quantum mechanics is famously counterintuitive.

              • Cédric says:


                I disagree, but Michael is right that your viewpoint is that of >90% of quantum physicists out there. It’s probably not a coincidence that so many Bayesians disagree with it. Thank you for the discussion.

                Final question: suppose you’re back in time as a physics undergrad, and your advisor hands you a box with 1 gold atom. You’re going to measure its position, but before doing that, you want to make a prediction – i.e. compute its expected position, say. Unfortunately, your advisor forgot to tell you which of the three gold isotopes A, B, C it is, and that is important information. Do you think that the gold atom is in a quantum superposition of the three states? Would you use this (non-quantum) formula to compute your expectation of the position

                E[x] = E[x|A]*P(A) + E[x|B]*P(B) + E[x|C]*P(C)

                with a reasonable prior (taken from a table perhaps) for each of the three P(A), P(B), P(C)?
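The formula above is the law of total expectation. A minimal sketch with made-up numbers (the priors and conditional expectations below are hypothetical placeholders, not physical values):

```python
# Hypothetical prior probabilities for the three isotopes; they sum to 1.
prior = {"A": 0.2, "B": 0.5, "C": 0.3}

# Hypothetical conditional expectations E[x | isotope], in arbitrary units.
expected_x_given = {"A": 1.0, "B": 2.0, "C": 4.0}

# E[x] = E[x|A]*P(A) + E[x|B]*P(B) + E[x|C]*P(C)
expected_x = sum(expected_x_given[k] * prior[k] for k in prior)
print(f"E[x] = {expected_x:.2f}")
```

The weights here are ordinary real probabilities over a mutually exclusive, exhaustive set of alternatives, so no phases enter the calculation.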

              • konrad says:

                “Complex wave functions are a generalization of classical probability” – another bizarre claim. Complex wave functions represent system states; in what sense can a system state be thought of as a generalization of probability? (Feel free to use “probability distribution” to refer to either of its usual meanings – an information state or an empirical property (a frequency distribution) of a system – or some new meaning; just tell us which you are using.)

              • Andrew says:


                In the words of Wikipedia, “In quantum mechanics, a probability amplitude is a complex number whose modulus squared represents a probability or probability density.” Classical probabilities are real numbers. They superimpose, and when you add positive probabilities you can’t get zero. In contrast, quantum probability amplitudes have phases, and you can superimpose them and get zero probabilities, as in the two-slit experiment. In the macroscopic world (the positions and momenta of “billiard balls” and the like), the phase information can be ignored and one can work with classical probability theory, no need for complex amplitudes.
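To make the contrast concrete, here is a minimal sketch at a single point on the screen, assuming two equal-magnitude amplitudes (one per slit) that differ only by a phase. The `intensities` helper and the magnitude 1/sqrt(2) are illustrative assumptions, and the outputs are unnormalized relative intensities rather than a full probability distribution over the screen:

```python
import cmath
import math

def intensities(phase_difference):
    """Return (sum of squared moduli, squared modulus of the sum)."""
    a1 = cmath.rect(1 / math.sqrt(2), 0.0)               # amplitude via slit 1
    a2 = cmath.rect(1 / math.sqrt(2), phase_difference)  # amplitude via slit 2
    add_probabilities = abs(a1) ** 2 + abs(a2) ** 2      # phase-less rule
    add_amplitudes = abs(a1 + a2) ** 2                   # rule with phases
    return add_probabilities, add_amplitudes

for phi in (0.0, math.pi / 2, math.pi):
    classical, quantum = intensities(phi)
    print(f"phase={phi:.2f}  probabilities add: {classical:.3f}  "
          f"amplitudes add: {quantum:.3f}")
```

Adding the squared moduli gives the same value at every phase, while adding the amplitudes first gives 1 + cos(phase), which reaches zero when the two contributions are exactly out of phase.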

              • konrad says:

                Interesting. So your reasoning is that, if a mathematical structure can be constrained in such a way that the constrained version does not violate the Kolmogorov axioms, then the structure in question is a generalisation of probability. Regardless of what it actually denotes.

                Personally, I prefer to start with thinking about what the concept of probability denotes (for most Bayesians, an information state; for most frequentists, the frequencies of outcomes in a repeatable experiment) and looking for an extension that denotes the same thing. I wouldn’t call something an extension of probability just because I can calculate probabilities as a (many-to-one) function of it.

              • phayes says:

                Konrad, are you saying that probability theory should not be regarded as a special case of quantum theory just because it’s commutative? ;-)


              • konrad says:

                No, I’m saying that probability theory deals with how the conclusions we can draw change as a function of available information, whereas quantum theory deals with how a physical system evolves in space and time. Apples and oranges.

              • Bill Jefferys says:

                To make Konrad’s point a little more explicit, the probability amplitudes of quantum theory are just devices used to calculate what the probabilities of certain events are. But these amplitudes depend on the experimental setup, e.g., one slit or two, where the slits are, whether the fact of a particle going through one or another slit is observed. But you will notice that even in this case, the amplitudes depend on how the experiment is set up, so the probabilities that are computed from them also depend on how the experiment is set up.

                This is the point of my comments about how you have to condition on which experiment is being performed, just as Leslie Ballentine wrote in the paper I cited.

                I mentioned also Ed Jaynes’ comment that many “paradoxes” in probability theory can be resolved by conditioning on all relevant prior information. He was urging people to think about what prior information is relevant to the problem they are considering, and actually to condition on that information EXPLICITLY so as to make clear (that is, write E1, E2, …) on the right-hand side of the conditioning bar so that we see, explicitly, what we are talking about.

            • phayes says:


              If you don’t make a distinction between quantum theory and quantum mechanics then it’s apples and oranges, yes, but I don’t think that’s a good idea. As Streater says (in that paper I linked to):

              “It took some time before it was understood that quantum theory is a generalisation
              of probability, rather than a modification of the laws of mechanics. This was not
              helped by the term quantum mechanics; more, the Copenhagen interpretation is
              given in terms of probability, meaning as understood at the time. Bohr has said
              [35] that the interpretation of microscopic measurements must be done in classical
              terms, because the measuring instruments are large, and are therefore described by
              classical laws. It is true, that the springs and cogs making up a measuring instrument
              themselves obey classical laws; but this does not mean that the information held on
              the instrument, in the numbers indicated by the dials, obey classical statistics. If the
              instrument faithfully measures an atomic observable, then the numbers indicated by
              the dials should be analysed by quantum probability, however large the instrument

              Apples and Cox’s Orange Pippins.

              @Bill Jefferys

              You (and Jaynes and Ballentine) are clearly correct about the conditioning business and the interpretation of the double slit experiment and the most important thing about it as far as I’m concerned is that no “psi-ontology” is needed to see that it’s correct. [ ] ;-)

    • Konrad,

      Yes, we can make the argument that the use of classical probability to model quantum systems is a poor choice of model and should be discarded. Yes, that has no effect on classical probability as an axiomatic mathematical entity.

      But what it does say is that we need a better probability theory (perhaps, a generalized one) that relaxes the classical axioms and does model quantum systems well. And what’s interesting about this approach is that a theory relaxing certain axioms might have utility in modeling very complicated macroscopic systems (incorporating certain unknown biases or interactions, for example).

      • Walt says:

        We don’t need a new probability theory, because people have already invented about a billion different formalisms for this. Quantum mechanics isn’t exactly new.

        • Sure — I was referring to “new” as in “new to people considering just classical probability”. There are lots of extensions/generalizations out there and there may be some utility. It’s an interesting applied question — a very different theoretical question.

    • konrad says:


      There is no mathematical law stating that the angles of triangles add up to 180 degrees in general, only a law stating that this happens in a Euclidean geometry. A curved universe does not violate mathematical law, it only violates the (poor) modelling assumption that its geometry is Euclidean. Now in this example (if Euclidean geometry is all you have to start off with), the problematic modelling assumption is an axiom of the mathematical framework so relaxing it requires generalising the mathematical framework. To make the same argument in the case of QT, you need to point at an axiom of PT (or a consequence of its axioms and only its axioms) that is inconsistent with experiment. It is not enough to show inconsistency in the context of a whole bunch of modelling assumptions that are extraneous to PT.

      “Boltzmann statistics is a mathematical probability model that does not apply”. Exactly – all of these are cases where a poor model is falsified by experiment. Don’t blame probability theory.


      “But what it does say is that we need a better probability theory (perhaps, a generalized one) that relaxes the classical axioms”

      It does not say that, because there are assumptions besides the PT axioms in play. Such as the assumption (discarded by the Copenhagen interpretation) that the position of a photon is well-defined and unique at all times.

      • Andrew says:


        Consider your last sentence. It is fundamental to probability theory that events can be defined conditional on other events. Hence notation such as p(x,y), p(x|y), p(y|x). The core of classical probability is that the definitions of “x” and “y” don’t depend on how other variables in the system are measured. Hence the problem with applying probability theory to the two-slit experiment etc.

        This is not news. The mathematics of probability amplitudes (wave mechanics) is different from the mathematics of classical phase-less probability.

        • konrad says:

          Andrew, I assume you are defining your x and y as in the first of your linked posts. That is, in experiments 1 to 4 of that post y is the place on the screen that lights up (this is measured and hence definable in all four experiments), and in experiment 4 of that post x is the slit at which the photon is observed. Importantly, x is undefined (and not meaningfully definable) in experiment 3 where the setup does not observe the photon going through a slit. In a response to Tim Maudlin’s comment below you use the variable x in what appears to be a reference to experiment 3, where it is not defined.

          Please clarify: are you claiming that PT is violated in experiment 4, and if so, how? Are you claiming that PT is violated in experiment 3? If so, what is the second variable you have in mind and how is PT violated? Are you claiming that PT is not violated in either experiment separately but in the combination of the two experiments? If so, what is the link between the experiments and in what way can they be combined?

          • konrad says:

            ps Some potential confusion can be avoided if one avoids overloading notation. It may be helpful to use the symbols y3, y4 and x4 for the three relevant variables defined thus far, and x3 should you choose to define such a variable in experiment 3. This could help avoid pitfalls such as assuming that p(y3|x3) is known or estimable when in fact only p(y4|x4) is known.

  5. Tim Maudlin says:

    The two slit experiment does not violate any laws of probability. The phenomenon, in the first place, is accurately predicted by the deBroglie/Bohm theory, which uses a deterministic dynamics and fixed probability distribution over initial states, and nothing but classical probability theory. The argument that there is problem with classical probability, which can be found in Feynman, is just an error. Given the probability of some outcome with slit A open and slit B closed, and the probability of the same outcome with slit B open and slit A closed, probability theory alone has exactly zero implications about the probability for the outcome with both slits open. How could it?

    • Andrew says:


      I discuss this in my linked blog post. But, in brief, the intuitive application of probability theory to the 2-slit experiment is that, if y is the position of the photon and x is the slit that the photon goes through, that p(y) = p(y|x=1)p(x=1) + p(y|x=2)p(x=2). But this is not true. As we all know, the superposition works not with the probabilities but with the probability amplitudes. Classical probabilities don’t have phases, hence you can just superimpose them via the familiar law of total probability. Quantum probabilities work differently.

      • Tim Maudlin says:

        I have no idea what the “intuitive” application of probability theory is supposed to mean. Probability theory is a mathematical theory and, as I said, there are perfectly well-defined and exact physical theories that use “classical” probability to make statistical predictions and return the exact predictions of quantum mechanics. So this is a decisive counterexample to the claim that the 2-slit phenomena are somehow incompatible with classical probability theory.

        If by “intuitive” you mean the judgment that whether or not slit A is open can have no influence on what a photon that goes through slit B does, that is not a principle of probability theory! It is a bit of naive physics, I suppose. The 2-slit experiment refutes this naive physics, but does not, and cannot conflict with probability theory.

        • The point is that classical probability does not describe quantum statistics — entanglement (or equivalently its consequences on conditional probabilities) is inconsistent with the Kolmogorov axioms. There’s no way around that.

          • konrad says:

            No, it is inconsistent with a particular representation of the state space of a particle. Nothing to do with probability theory.

          • Tim Maudlin says:

            Once again, there exist well-defined theories (DeBroglie/Bohm and the GRW collapse theories, for example) that make all of these predictions and use classical probability theory in a perfectly normal way. No Kolmogorov axioms are violated. That is just a clear mathematical fact about these theories. To say “there is no way around that” is not true: several ways around that exist.

            In the deBroglie/Bohm picture, in addition, every particle goes through exactly one slit. The particle trajectory, however, is influenced by the state of the other slit via its dependence on the quantum state, which obeys the Schrödinger equation. In the GRW picture, it is not correct to say that a particle goes through exactly one slit: in a certain sense, it goes through both when both are open. But none of this is inconsistent with, or requires any modification to, classical probability theory. It is really not helpful to say that something is impossible when it has been done, and done in several different ways.

            • Firstly, all the ontologists should step out of the room.

              What Andrew was saying is that, if you maintain the standard assumptions of locality and unitarity in physics then quantum probabilities are inconsistent with classical probabilities. Yes, you can weaken the standard assumptions to restore the consistency, but now you’re changing the underlying system (incidentally, I have no problem with nonlocality but you’ll have to do better than BB — at least go with something where Poincare invariance is emergent).

              But we’re not talking about changing the system. The point is that in complex modeling circumstances you may be making poor assumptions, but they’re too difficult to manipulate and you’re stuck with them. If one understands how to generalize the probability theory to achieve results equivalent to changing the underlying assumptions then you can build a more robust modeling tool appropriate for certain situations. And if this (i.e. the standard assumptions about probabilistic systems being broken) is true, then it should manifest as some violation of the standard assumptions, a la Stern-Gerlach or Bell.

              • Tim Maudlin says:

                “All the ontologists should step out of the room”? This is the response to a straightforward counterexample to a mathematical claim? You say something is impossible, and it is pointed out that the supposedly impossible thing has been done and you ask the person who points this out to leave? Well, that’s one way to deal with a counterexample….

                Locality and unitarity are obviously, obviously, obviously not principles of classical probability theory, no matter how one understands that term. If all you mean to say is that no local theory can return the prediction of quantum theory: yes, that’s precisely what Bell proved. This has exactly nothing to do with probability theory.

                It really does not help anyone’s understanding of anything to make false claims then ask people pointing out they are false to leave.

              • I was referring to the fact that Andrew’s argument is an epistemological one. We’re not talking about which theory is the correct one, only which theories are consistent with the data and then what those theories might imply, especially relative to the standard assumptions of quantum mechanics. Any discussion past that is not appropriate for this forum as it has no relevance for applications, hence ontological arguments should not be considered further.

                As has been previously noted in the comments, no one is questioning the validity of classical probability given the axioms. The question is the physical validity of the axioms. Assuming locality, those axioms are not consistent with quantum mechanics (as a special case of a fully relativistic quantum field theory) and if locality is not abandoned then the axioms have to be modified. The relevance for applied statistics is whether or not complex systems with insufficient constraints might necessitate similar constructions, providing a possible path towards new tools in systems that have been notoriously hard to model. Likely? Probably not, but it’s not impossible and easy enough to check for with some well-designed experiments.

                Forgoing locality is fine, but doesn’t provide any potential for applied statistics and is consequently not relevant to this discussion. Not to mention that non-local theories have trouble becoming fully relativistic without providing an emergent basis for Poincare invariance and rapidly become unappealing as physical, read useful and predictive, theories. But, again, this is a physics discussion and not relevant to the current thread (or forum, for that matter).

      • Roger says:

        I agree with Tim that there is no contradiction with classical probability theory. In quantum mechanics, a photon is not a classical particle, but also has wave properties. The photon history is not just the sum of two particle possibilities. It can also be a wave that passes thru both slits at once.

        The double slit experiment does show that light has wave properties. Everyone has agreed to that since 1803. If you deny that light is a wave that can go thru both slits at once, then you can get a contradiction. That is another way of saying the same thing. But the contradiction is with the classical particle theory of light, and not with probability theory.

        • Andrew says:


          In the two-slit experiment, p(A) does not equal the sum over B of p(A|B)p(B). This violates probability theory, or at least the version where one can assign probabilities to measured outcomes, which is the version of probability that is used in applied statistics.

          • Walt says:

            Andrew, you’re trying to fix the state space used to explain the experiment to force this conclusion. Does there exist a state space with a classical probability distribution that describes the outcome of the experiment? Yes there does. It’s a much bigger state space where you have to include which experiments you actually did, etc., but it exists. For example, you can make the state space be probability amplitudes themselves.

            It’s probably less _useful_ to do it that way, but it’s not impossible.
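A minimal sketch of such an enlarged state space, assuming the sample space is the set of pairs (experimental setup, detector hit or miss) at one screen location. All numbers below are hypothetical. On this space the law of total probability holds exactly, and the value for the both-slits setup is a free entry that is not derived from the single-slit entries:

```python
# Hypothetical detection probabilities at one screen location, per setup.
p_hit_given_setup = {
    "slit_1_only": 0.30,
    "slit_2_only": 0.30,
    "both_open": 0.00,   # a dark fringe under the two-slit setup
}

# Hypothetical probabilities that each setup is the one being run.
p_setup = {"slit_1_only": 0.25, "slit_2_only": 0.25, "both_open": 0.50}

# Joint distribution over (setup, hit) on the enlarged sample space.
joint = {}
for setup, p_s in p_setup.items():
    joint[(setup, True)] = p_s * p_hit_given_setup[setup]
    joint[(setup, False)] = p_s * (1 - p_hit_given_setup[setup])

total = sum(joint.values())
p_hit = sum(p for (setup, hit), p in joint.items() if hit)
p_hit_total_law = sum(p_hit_given_setup[s] * p_setup[s] for s in p_setup)

print(f"joint sums to {total:.2f}")
print(f"p(hit) from joint = {p_hit:.3f}, from total-probability law = {p_hit_total_law:.3f}")
```

The cost is the one noted above: the setup has to be carried as an explicit variable, so the model says nothing about the two-slit pattern until that entry is supplied from data or from a physical theory.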

            • Andrew says:


              Exactly. You can model the 2-slit experiment using classical probability but only by expanding the sample space in a way that is awkward enough that we only do it because we have to. If the 2-slit data looked like what you would get from classical probability (superposing probabilities rather than complex densities), there would be no problem.

          • Roger says:

            When you do that sum over B, you are not summing over all possibilities, but only certain outcomes of measurements that are known to disrupt the system. In particular, you are assuming that the light is a photon that can be modeled as a classical particle that is confined to one slit. That assumption is false.

              • Tim Maudlin says:

                Nope. In the Bohm theory, electrons (for example) are particles and do go through exactly one slit. If you like, you can calculate the probabilities for outcomes conditional on going through slit A (with both open) and conditional on going through slit B (with both open), and the total probability for the outcome is just the sum, of course, because every particle does exactly one or the other. And you still get the right predictions. So this “diagnosis” is also demonstrably wrong.

  6. Tim Maudlin says:

    This is a really strange forum. The original post contains this sentence: “If you recall your college physics, you’ll realize that the results of the two-slit experiment violate the laws of joint probability”. That sentence is false. Its falsity is demonstrated by theories (whether you like them or not) that predict exactly these results using classical probability theory, in any sense of the term one might like to give. Then you are told not to mention this counterexample because “this is a physics discussion and not relevant to the current thread”. Well, the current thread started with a false claim about a physical phenomenon.

    If you don’t care that what is posted here is false (and, by some of the comments posted, some people reading the thread are very confused), why have a forum at all? If you want to dispute that the sentence is false, then show the proposed counterexample (which is, by the way, both local and unitary as well in this application, although that is not really relevant) isn’t really a counterexample.

    As for applied statistics: well, the deBroglie/Bohm theory makes exactly the same predictions for observations as standard non-relativistic quantum theory, so if you think quantum mechanics is useful for applied statistics, that theory is exactly as useful. But again, it would be best to either just acknowledge the falsity of the false claim and try to correct it, or explain why the proposed counterexample isn’t one.

    • Andrew says:


      The 2-slit data indeed violate the laws of joint probability. I learned about this in physics class in college. In quantum mechanics, it is the complex functions that superimpose, not the probabilities. It is the application of the mathematics of wave mechanics to particles. The open question is whether it might make sense to apply wave mechanics to macroscopic measurements. For example, when we model voting behavior or test scores, we use the classical probability model in which conditional probabilities add up using the formula p(a) = sum_b p(a|b)p(b). But maybe there are settings where it would make sense to model p(a), p(b) etc. as complex functions with phases, in which case we would be using quantum probability models.

      • Tim Maudlin says:


        The 2-slit data do not violate any laws of probability. Whatever you think you learned in physics class in college, it was not this. That claim is just wrong. One proof that it is wrong is the existence of a theory using standard probability that predicts the data. You might just focus on that fact and then try to figure out where you have gotten confused. But if it will help: there is nothing in classical probability theory that says that the data with the experimental condition with both slits open has any mathematical relation at all to the data with only one slit open. If I am reading your post correctly, you seem to think this: if some outcome happens a certain proportion of the time PA with only slit A open, and a certain proportion of the time PB with only slit B open, then probability theory says with both slits open the probability must be PA + PB. But neither classical probability theory nor anything else has any such implication. It is trivial to think up phenomena in classical physics that violate that principle, or in everyday life.

        Try filling in the ‘a’ and ‘b’ in your formulas with actual conditions, not just letters. Maybe that will make the point clear to you. What is it you think “a” and “b” stand for here?

        • Andrew says:


          See my response to Walt above. I agree that one can place the two-slit experiment within a classical probability model, but this model has an extra level of complication owing to the Uncertainty Principle. The two-slit results are counterintuitive, and they are counterintuitive because we expect probabilities, not complex numbers whose squares are probabilities, to superimpose.

          • Tim Maudlin says:

            This has nothing to do with the uncertainty principle! The theory gives probabilities for outcomes conditional on experimental set-ups. One experimental set-up has only slit A open, one has only slit B open, one has both slits open. Here is a direct question: please answer this. Do you think that given the data with only slit A open and the data with only slit B open, classical probability theory has any implications at all about the data with both open? If so, what are the implications? If not, how can the data “violate classical probability”?

    • Andrew was very clearly assuming the standard interpretation of quantum mechanics adopted by 99% of physicists and, more relevant to the exact details of his comment, just about every college syllabus. Within that context no false claims are being made.

      Given your knowledge of the subject you clearly understood the assumptions Andrew was making but, instead of pointing out the different approaches to formalizing theories of quantum mechanics given the Bell results, you attack the result as wrong based on pedantic arguments of the assumed context and, unfortunately, derailed a possibly productive conversation. Moreover, the existence of alternative theories of quantum mechanics (which NO ONE is arguing do not exist) is completely irrelevant to the original point, as the use of generalized probability theories is only MOTIVATED by their appearance in orthodox quantum theory, not in any way DEPENDENT on it. Hence the inappropriateness for this discussion.

      I do not believe there is anything else to say on the matter.

      • Tim Maudlin says:

        This is just ridiculous at this point. The 2-slit experiment has certain data. The claim I quoted is that the data is inconsistent with classical probability. That claim is just flatly false. The deBroglie/Bohm theory, in the non-relativistic domain, *makes precisely the same predictions for the data as “standard” quantum mechanics*, whatever you mean by that term. In fact, I have no idea at all what “assumptions Andrew was making”. If you want to make them clear, make them clear. That would be productive. At some point, you seemed to think these additional assumptions are locality and unitarity. Well, in the 2-slit experiment, in the theory I mentioned, the quantum state always evolves unitarily (if that is what you want) and there is no violation of locality.

        I have asked Andrew to fill in what he means by ‘a’ and ‘b’ in his post. That might be productive. If you want to spell out these supposed tacit “assumptions”, that might be productive, and might lead to a statement that even could be true. But you seem instead just to want to shut down any clarification.

      • Roger says:

        Tim’s argument does not depend on assuming some esoteric interpretation of quantum mechanics. It is a physical fact that a 2-slit experiment is not the sum of 2 1-slit experiments. Andrew has converted this statement into a statement about probability, and concluded that the probability theory is wrong. No, his physical assumption is wrong.

        If you are so sure that 99% of the physicists and textbooks are on your side, it might help if you cite them saying that the double-slit violates probability.

        • Bill Jefferys says:

          In my opinion, Andrew is wrong about this and Tim is correct. We discussed this earlier (Andrew’s first link, above: ). I pointed out there that Leslie Ballentine showed here: that the two-slit experiment is completely compatible with classical probability theory. In particular see my comments here:

          and here:

          The mistake is that the experiment with one slit open is simply not the same experiment as the one with two slits open. Why then would one analyze the different experiments as if they are the same? The answer is that you can’t. You must condition on all the information at your disposal (a point that the physicist Ed Jaynes has made repeatedly and which he blames for many of the apparent “paradoxes” that have been claimed in probability theory), and this includes whether one slit or two are open. Since you are conditioning on different things, you are no longer allowed to sum over everything since the rules of probability theory don’t allow you to sum over the conditions, only over the stuff to the left of the conditioning bar. In other words, for experiment 1 you have


          and for the second you have


          But you can’t sum over E1 and E2 in this experiment since it’s sitting over on the right of the conditioning bar and you aren’t allowed to sum over those. You can of course sum over B separately in each of these but that doesn’t lead to problems since the experiment is fixed. You could select E1 or E2 at random with probabilities P(E1) and P(E2), but that would just give you a mixture model and now summing will correctly tell you the result of making a measurement on a mixture model. But again, it’s all classical probability theory.
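A minimal sketch of the conditioning argument above, under stated assumptions: the toy amplitude model (`slit_amplitude`, with a Gaussian envelope and a path-length phase), the slit positions, and every numerical parameter are hypothetical choices made for illustration, not anything given in the thread. It builds one ordinary probability distribution per experimental setup, plus the mixture obtained by choosing a one-slit setup at random.

```python
import cmath
import math

def slit_amplitude(x, slit_pos, wavenumber=20.0, screen_dist=1.0, width=0.5):
    """Toy amplitude at screen position x from one slit (hypothetical model)."""
    path = math.hypot(screen_dist, x - slit_pos)   # slit-to-screen path length
    envelope = math.exp(-(x / width) ** 2 / 2)     # Gaussian envelope on the screen
    return envelope * cmath.exp(1j * wavenumber * path)

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

xs = [i / 50 - 2 for i in range(201)]              # screen positions from -2 to 2
amp_a = [slit_amplitude(x, -0.2) for x in xs]
amp_b = [slit_amplitude(x, +0.2) for x in xs]

# One distribution per setup; the setup sits to the right of the conditioning bar.
p_only_a = normalize([abs(a) ** 2 for a in amp_a])   # P(x | only slit A open)
p_only_b = normalize([abs(b) ** 2 for b in amp_b])   # P(x | only slit B open)
p_both = normalize([abs(a + b) ** 2 for a, b in zip(amp_a, amp_b)])  # P(x | both open)

# Picking a one-slit setup at random (probability 0.5 each) is a different
# experiment again: a mixture model, where summing over the setup is legitimate.
p_mixture = [0.5 * pa + 0.5 * pb for pa, pb in zip(p_only_a, p_only_b)]

# Largest pointwise difference between the mixture and the both-slits pattern.
max_gap = max(abs(m - b) for m, b in zip(p_mixture, p_both))
```

In this toy model each of the four lists is non-negative and sums to 1, so each is a valid classical distribution for its own setup. `max_gap` is nonzero because the both-slits setup is simply not the mixture setup.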

          I wasn’t able to convince Andrew at that time that the point of view he is taking here is not correct, and I probably won’t be able to convince him this time either. But I am in Tim’s camp here.

      • You guys are kidding me, right? Locality, causality, unitary, and consistency with the Bell results requires that the states (STATES not observables) of any entangled system do not obey classical probability theory (but can be modeled by various generalizations of axiomatic measure theory). 99% of physicists will take this at face value — pick up any quantum text. You’re welcome to cling to classical probability if you give up locality, causality, or unitarity, but those are not at all common choices in physics, especially given the difficulty they provide in formulating a proper quantum field theory.

        The relevance of the double slit is the inconsistency of conditioning a system on a measurement at one of the slits with the conditioning of any observable given the resulting state. Cycles of coherence/decoherence do not respect the rules of classical probabilities, unless you take another perspective and make measurement a completely different process.

        • Tim Maudlin says:

          So we start with a clear, and clearly false, claim: the 2-slit phenomena are incompatible (in some sense) with classical probability theory. That simple, clear, false claim was to be understood as this claim: “given locality, causality, unitarity, and consistency with the Bell results, the states of any system do not obey classical probability theory”. Well, this fancier claim is certainly not what was meant (2-slit obeys Bell’s inequalities in any case), and is either empty or false. Empty, because locality (as Bell defined it) is incompatible with violations of Bell’s inequality: that is just the content of his theorem. So on that reading, no theory can be local and be “consistent with Bell’s results” if that means violating his inequalities, as quantum theory does. Or in any case, it can’t give the predictions of quantum mechanics. If “locality” means no-signalling, then Bohm is again a counterexample. What 99% of physicists think is neither here nor there about anything.

          It ought to cause some pause that Feynman himself makes exactly this erroneous claim about the 2-slit experiment in the Lectures. Feynman does not mention locality, unitarity, or causality. He makes a straight claim about the data, based on a bad argument—exactly the argument I was attributing to Andrew. So if Feynman screwed this up, it would not be odd if many other physicists do too.

          So here’s another simple question for Andrew: is the argument you have in mind the same or different from Feynman’s? An answer will help. If it is the same, I am glad to work through that text line-by-line and point out the mistakes.

  7. bxg says:

    > Locality, causality, unitary

    I’m out of my depth here, but wonder – if we accept these – is it known to be sufficient to develop a new and consistent probability theory – or is that perhaps just the start of the fix-ups needed?

    Does boolean logic survive? (Are there X, Y such that left implies X and right implies Y, and we accept left \or right, while X \or Y is not thereby implied?) And if logic changes, where does it stop? Might we end up having to distort all of mathematics and reason to make the assumptions tenable?

    That would seem an outrageous situation (what would “true” and “false” even mean when the most fundamental rules are negotiable?) But maybe it’s actually known that we can stop the fix-ups to “classical X” at “X = probability theory” – is there any sense in which that is so?

    • konrad says:

      I hate to disappoint, but the structure of the argument is “Well-established axiomatic system X plus questionable assumption Y doesn’t work. We really like Y, so let’s toss out X.” The argument is equally sound (or unsound) regardless of whether X is Boolean logic or probability theory.

      • bxg says:

        You don’t disappoint, but you don’t answer my implicit question so I was unclear. I am trying to ask something which may not make sense, but if it does it’s specifically about quantum theory and not about argument structure.

        It’s fairly obvious IMO that “classical probability theory is violated” is false. But let’s take the charitable interpretation; someone says they would like to work as though “Y” is so (since “99% of physicists” do) and is willing to accept other large compromises in order to work that way. This is just a mode of thought; we aren’t saying Y is ‘true’ (what a weird argument to authority that would be!) but rather asking: can we work “as though” Y, given some clear and limited modifications from X to X’. Even if so it wouldn’t be a refutation of X but just “if you really insist on thinking and working as though Y, here’s something else (X) you must treat differently, and how (X’)”.

        But if there is no coherent and bounded X’, if the attempt at consistency just spirals out and out to take everything with it, there’s not even a utility argument. Then there’s no useful sense in which we can say “Ok, if you really want to reason as if Y (for whatever personal reason), you need to change your beliefs on [what goes here?]”.

        That’s what I am asking. Whether you think it’s useful or not, is there a bounded “X” we can toss out and replace by some “X'” if someone for their own idiosyncratic reasons thinks “Y” has primacy? Or instead does accepting Y inevitably bring down all of human reason once you follow it to its conclusion? This is a question about quantum theory, or the two-slit experiment specifically.

        • Tim Maudlin says:

          This is a question that would have to be answered on a case-by-case basis, and in this case one would have to determine whether it even makes sense to “revise” X and if so, whether the revision really accomplishes anything. (Since one has to use logic to answer these very questions, the whole idea of “revising logic” can obviously be tricky.) But I appreciate that you understand the situation, namely that none of this is forced by any data or phenomena. Once more Bell: “Why is the pilot wave picture ignored in textbooks? Should it not be taught, not as the only way, but as an antidote to the prevailing complacency? To show that vagueness, subjectivity, and indeterminism [one here adds: supposed revisions of probability theory or logic] are not forced on us by experimental facts, but by deliberate theoretical choice?”—On the impossible pilot wave.

  8. Tim Maudlin says:

    “When one forgets the role of the apparatus, as the word ‘measurement’ makes all too likely, one despairs of ordinary logic—hence ‘quantum logic’. When one remembers the role of the apparatus, ordinary logic is just fine.” John Bell, Against ‘measurement’. Exactly the same can be said for ‘probability theory’. Thinking that the phenomena predicted by quantum theory either require or even suggest a need to revise logic or probability theory is a mistake. No changes are required, and no proposed changes have ever helped with any problem. Bell is the best place to start to understand this. See also “On the Impossible pilot wave”.

  9. For a discussion of the hydrodynamic pilot wave analogs see this discussion in PNAS:

  10. Tim Maudlin says:

    Bell is the first thing to read on foundational issues in quantum theory. If you don’t have it, get Speakable and Unspeakable in Quantum Mechanics: all the papers are there.

    Yes, the “bouncing oil drop” experiments provide a (manifestly classical) analog system for the two-slit experiment, and make it easy to visualize what is going on. The key is (of course) that the quantum state with both slits open is different from what it is with only one slit open, even when the “particle” goes through one particular slit. When more than one particle is involved, though, you can no longer think of the quantum state as defined on physical space, since it is defined on configuration space. It is only then that violations of Bell’s inequality can arise.

    • If you can, would you care to enlighten me and others on more of the Bohm theory? I have heard and read a little about it before, but would like to get a better picture under a few simple experimental examples. For example, suppose that we have one particle of interest. It starts at point X(0) which we can not determine with absolute precision (in other words, it starts as best as we can determine very near X(0)).

      We can define a wave function psi(x,t) based on the electric fields and things that we set up in our apparatus. And from that we get a path X(t) and a velocity V(t) which comes directly out of the “quantum field” defined by psi(x,t), in other words, the wave psi pilots the particle along the path X(t). Of course since we don’t know X(0) exactly, the actual path will not be known exactly either and if we repeat the experiments, we will get a distribution over the locations X(T_i) where T_i are the times at which we detect a particle in each experiment and X(T_i) is the location… I think I understand this more or less fine…

      Now you say the pilot wave has to be defined over the quantum state and not over physical space if you have more than one particle. Is that equivalent to saying that there are N pilot waves over space where N is the number of particles? Do these pilot waves interact with each other? Do they interact with particles other than their paired particle? For a simple example let’s try a Stern-Gerlach experiment…

      We generate two particles which start very near X1(0) and X2(0) which are themselves very near each other in the center of the apparatus, and these particles have entangled spins. We send them along their way left and right in opposite directions through the apparatus. The one that goes left interacts with a screen and we see an upwards deflection. Suppose that we set up the experiment so that when this occurs we will ALWAYS see a downwards deflection on the screen in the other direction.

      We’ll set up two pilot waves psi_1(s1,x,t) and psi_2(s2,x,t). Entanglement means that s1 = -s2 but we don’t know which one is +1 and which one is -1. To determine what’s going to happen in the experiment, we need to do the math of propagating the psi_1 and psi_2 waves and piloting the particles. Particle 1 is piloted by psi_1 and particle 2 by psi_2, is that right?

      Finally, because we don’t know the spins of the particles, only the fact that they’re opposite, we need to also pilot the waves via psi_1′ and psi_2′ which are the pilot waves for the case where the spins are interchanged. Once we observe the outcome on the left, we’ll immediately know which of the pilot waves psi_1 vs psi_1′ actually occurred, and hence we’ll know psi_2 vs psi_2′ and will be able to predict the result on the right.

      If my interpretation is right, I think your point about “configuration space” has to do with the fact that at any given location, there is not *one* field defined by *one* pilot wave, but rather N fields defined by N pilot waves (in my example N=2) is that correct?

      • Tim Maudlin says:


        No, the point is exactly not to have two “pilot waves”, one for each particle. The “pilot wave” in Bohmian mechanics is just the wavefunction that anyone is already familiar with from any quantum mechanics textbook, and always evolves via the Schrodinger equation. That wavefunction is a complex-valued function over *configuration space*, not physical space. That is, each single point in configuration space corresponds to a complete configuration of the system, i.e. to specifying the exact location of all of the particles. In the case of a single particle, the configuration space is isomorphic to physical space, because you say where every particle is by giving just one location. If there are many particles, then you have to specify many positions. In general the dimension of the configuration space of an N-particle system is 3N, so the configuration space for, e.g., a mole of particles in a box is very, very high dimensional. It is important that the wavefunction for a system is given by a single function over 3N dimensional space rather than N functions over 3-space: that is what makes for entanglement. But again, with a single particle the configuration space is just 3 space. That’s why the bouncing oil-drop model is OK for a single-particle phenomenon, like 2-slit interference, but can be misleading when you get to multiple-particle systems.

        Given the wavefunction for the entire system, one can analytically define a “conditional wavefunction” for subsystems. This is not a new object, but just a mathematical construct. There is an interesting story about these conditional wave functions in the theory (e.g. they undergo “collapse” even though the universal wave function never does), but that would be a bit complicated to go into.

        • Got it, thanks that makes a lot of sense.

          • Ok, thinking a little more about it, configuration space is N copies of 3D space right? So there’s one mathematical object, but at every point in space it has N tagged complex values. Each tag represents the effect the wave would have on the associated particle. Your “conditional wavefunction” is just slicing this wavefunction by tag ??

            • Tim Maudlin says:

              No, that’s not it. Suppose I have 4 particles in a 3-space. I can indicate the position of each particle with three real numbers, so I need 12 numbers to give the positions of all four, and each set of 12 numbers fixes the positions. So the configuration space is 12-dimensional. (Things are a little different if the particles are identical, but let’s ignore that now.) The wave function assigns a single complex number to each point in this space, not N complex numbers to each point in 3-space. This is what allows the wavefunction to carry information about correlations between the particles.

              If you have actual particles with actual positions at all times, then the evolution of their configuration corresponds to the motion of a single point in configuration space. So specifying the dynamics for the collection of particles amounts to specifying a velocity field on configuration space. If you have a function on a space, the obvious way to use it to define a vector field, like a velocity field, is to take some sort of gradient. That is exactly what the “guidance equation” in Bohmian mechanics does: you basically take the imaginary part of the gradient of this (single) complex function on configuration space to be the relevant velocity field: given any configuration, this determines how the configuration changes with time. So that determines how all the particles move. That’s the whole theory. (Spin is treated not as a property of the particles with a value, but by using a spinorial wave function in calculating the velocity vector field.)
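The guidance equation described above can be sketched for the simplest case. This is a hedged illustration that assumes a single spinless particle in one dimension, natural units (`HBAR = MASS = 1`), a time-independent wavefunction, and the standard form v = (hbar/m) Im(psi'/psi). The function names and the plane-wave example are hypothetical choices, not anything specified in the thread.

```python
import cmath

HBAR = 1.0  # natural units: an assumption for illustration
MASS = 1.0

def guidance_velocity(psi, x, dx=1e-6):
    """Velocity v(x) = (hbar/m) * Im(psi'(x) / psi(x)) for one particle in 1D."""
    dpsi = (psi(x + dx) - psi(x - dx)) / (2 * dx)   # central finite difference
    return (HBAR / MASS) * (dpsi / psi(x)).imag

def plane_wave(k):
    """Return psi(x) = exp(i k x), whose guidance velocity is hbar*k/m."""
    return lambda x: cmath.exp(1j * k * x)

def trajectory(psi, x0, dt=1e-3, steps=1000):
    """Euler-integrate dx/dt = v(x), treating psi as fixed in time (a simplification)."""
    x = x0
    for _ in range(steps):
        x += guidance_velocity(psi, x) * dt
    return x

v = guidance_velocity(plane_wave(2.0), 0.3)        # close to 2.0 for k = 2
x_final = trajectory(plane_wave(2.0), x0=0.0)      # close to 2.0 after unit time
```

For N particles the same recipe would use the gradient with respect to all 3N configuration coordinates of the single wavefunction, which is where the correlations between particles enter. That case is not implemented here.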

              • I see, so it’s very similar in principle to the Lagrangian formulation of classical mechanics, where the whole state of N locations is a single point in 3N space, and the dynamics occur in that abstract space. If I understand it correctly that means the Schrodinger equation within the pilot wave theory is different from the Schrodinger equation within say Copenhagen interpretation, because the wave is defined on different spaces (3N dimensional vs 3 dimensional), and the gradients or laplacians are taken over those spaces.

                In the 3D/Copenhagen type interpretation, we have no knowledge of the particle’s path, only its final position which is a random variable predicted by |psi|^2. But if the particles “really do” move on a huge configuration space, it’s hardly surprising that projecting 3N dimensions down to 3D throws away a lot of information….

                Thanks for the background. I’ve always wanted to look into this area a little deeper, ever since intro QM classes back in… 2002 or something like that.

              • Tim Maudlin says:

                Yes, there is a similarity to a Hamiltonian formulation, but there is no difference between the quantum state in Bohmian mechanics and the quantum state in any other “interpretation”: they are all complex functions on configuration space, not on physical space. What is strange about the “Copenhagen” approach is that according to that theory, the “particles” do not always have definite positions, so there generally isn’t an actual precise configuration at all. In the Bohmian theory the “particles” are really particles, and always have positions, so there is always a definite configuration. It is the existence of a definite configuration that allows for the definition of a conditional wavefunction of a subsystem, so this definition is not available in a Copenhagen setting.

  11. Corey says:

    So here we have a certain state of affairs.

    AG wants to describe it as “classical probability does not apply to quantum systems.” TM (and BJ presumably) wants to describe it as

    “When one forgets the role of the apparatus, as the word ‘measurement’ makes all too likely, one despairs of ordinary logic—hence ‘quantum logic’. When one remembers the role of the apparatus, ordinary logic is just fine.” John Bell, Against ‘measurement’. Exactly the same can be said for ‘probability theory’.

    My question: is there any actual disagreement about the actual state of affairs, or is the disagreement only about the words used to describe it?

    (If the disagreement is only about the words, I have to say that TM’s phrasing seems distinctly less misleading to the uninitiated.)

    • Andrew says:


      As an applied statistician, I use the expression p(A) = sum_B p(A|B)p(B) all the time. And when we are taught probability, we’re taught to use this expression. However, it doesn’t work in quantum systems. In quantum systems the superposition occurs at the level of complex probability amplitudes, not the probabilities themselves. It seems to me that at the technical level, the discussants on this thread are disagreeing with me, because I keep mentioning complex probability amplitudes (which have phases, unlike classical probabilities) and the discussants don’t.

      Alternatively, you can do as Bill Jefferys does above and condition on the measurement, thus p(A|E1) if measurement is taken using method 1, p(A|E2) if measurement is taken using method 2, etc. That would be fine, but it’s not what we generally do in applied statistics (or, for that matter, in probability textbooks). In the classical (non-quantum) world, when you observe data B, you condition on B, that’s it. In the quantum world, you either need to move to complex amplitudes, or you need to condition not just on B but on the fact that B was measured. By doing this latter step, you can keep all the probabilities working ok, but at the cost of requiring many more probability statements, and at the cost of no longer being able to simply condition on and average over latent variables.

      • Corey says:

        “Alternatively, you can do as Bill Jefferys does above and condition on the measurement, thus p(A|E1) if measurement is taken using method 1, p(A|E2) if measurement is taken using method 2, etc. That would be fine, but it’s not what we generally do in applied statistics (or, for that matter, in probability textbooks).”

        I think some clarity and/or specificity is missing from the above claim. I could respond to the claim as stated by pointing out that if, say, E1 refers to a measuring device with Gaussian error of variance 1 and E2 refers to a measuring device with Gaussian error of variance 100, then we would indeed condition on that information when calculating marginal sampling distributions or posterior distributions. But this response seems to miss your point in some way that I can’t get clear in my head.

        • Andrew says:


          I agree that issues of measurement in classical statistics are not always trivial. Nonetheless, we are generally taught that the way to do conditioning is simply to consider the joint distribution and then count all the possibilities corresponding to the given data. With the exception of problems where the measurement error depends on the parameter of interest, or tricky examples such as the Monty Hall problem, we don’t usually think too much about measurement.

          But in quantum mechanics, the issue is not just that measurement error can vary. The issue is that we can’t simply condition on outcomes and average over probabilities. Instead we have to use complex probability amplitudes. There’s nothing like this in classical probability. Classical probabilities do not have phases, they don’t cancel out, they don’t exhibit the wave behavior associated with quantum probabilities.

          • Andrew, the quantum probability phases thing is one interpretation of QM, but Tim has pointed to another in which such things don’t seem to happen. It’s maybe sufficient to have classical uncertainty on the initial state of the particle, and then for any realization of the initial state, QM gives you a deterministic location for the particle to be observed which depends on a pilot wave that propagates according to the Schrodinger PDE.

            Now, the PDE for wave propagation can be connected to a different kind of probabilistic interpretation through the Feynman-Kac formulation of certain PDEs, and if you mish-mash the uncertainty in initial conditions + deterministic wave propagation together with Feynman-Kac propagation of the pilot wave you may be able to get a system where essentially you’ve said “we don’t know where the particle started, and we’re willing to throw away the deterministic path it took, so we might as well identify the physical particle with the virtual particle we’re using for the Feynman-Kac solution of the wave equation”.

            Then because the wave itself has phase, and you’ve thrown away the deterministic path of the real particle, it looks to you like the particle has phase too and because it’s all embedded in probability theory for the diffusion implied in the Schrodinger equation, the whole thing looks like we’re talking about probability for the actual physical particle, when in fact we’re talking about probabilities for the virtual mathematical particles associated with the diffusive propagation of the PDE.

            I admit to being pretty novice on the QM stuff, but I am somewhat knowledgeable about the classical connection between PDEs with diffusion and stochastic motion of virtual particles, and I could see how this all could get mish-mashed together in the minds of physicists, especially those who are more interested in physical outcomes of experiments than in foundational issues.

      • Bill Jefferys says:

        Andrew, you say:

        “Alternatively, you can do as Bill Jefferys does above and condition on the measurement, thus p(A|E1) if measurement is taken using method 1, p(A|E2) if measurement is taken using method 2, etc. That would be fine, but it’s not what we generally do in applied statistics (or, for that matter, in probability textbooks). In the classical (non-quantum) world, when you observe data B, you condition on B, that’s it.”

        Do you really mean this? Do you mean to say that, whatever the data are, it doesn’t matter how the experiment was constructed or how the data were obtained? Do you mean to say that it doesn’t matter if the data are the result of an RCT or if they were gathered from clinical results?

        I don’t think you mean this.

        If you make a difference between data gathered from an RCT and data gathered from clinical results, then you ARE conditioning on background data that is NO DIFFERENT from a physicist who distinguishes between a mixed experiment where one or the other slit is opened with probability 0.5, or another experiment where both slits are always open.

        • Bill Jefferys says:

          Let me add: Just because you don’t explicitly put the conditions of the experiment into the “conditions” on the right hand side of the ‘|’ bar, doesn’t mean that you aren’t conditioning on something else. Just because “that is not what we generally do…” doesn’t mean that you shouldn’t do it.

          That is the point of Ed Jaynes’ remark that many of the “paradoxes” of probability theory arise from failure to condition on ALL the actual conditions involved. When you “do it” you may find that the results are not what you thought they were.

          • Corey says:

            I think a more charitable interpretation of what AG is claiming might be something like: our expectation is that things without brains or brain-analogues don’t display the Hawthorne effect, so it’s rather shocking that subatomic particles seem to.

            • It’s totally natural to me to think that subatomic particles interact with different apparati in different ways. Two slits is simply a different experiment than 1 slit. The particles interact with it differently. Changing behavior would be like saying the laws of physics change depending on whether I’ve got one slit open vs two. The fact is that the laws of physics stay the same, but they imply different behaviors whether you have one slit or two… I don’t see anything unintuitive about that.

              What’s unintuitive is “spooky action at a distance,” at least if you believe in locality. I’m pretty happy to throw out locality personally. I think it makes much more sense than throwing out “realism” (i.e., the idea that the electrons are definite things that take some definite path).

              • Also, it should be pointed out that in real experiments we are already conditioning on the electron/photon whatever hitting the detector. There are always some which will interact with the material forming the slits and be absorbed or otherwise modified in their path. In that sense, if you fire a single photon/electron into an apparatus it isn’t just going to light up your screen in one spot, it is sometimes going to do that, sometimes nothing will happen… In particular, if you do entangled experiments, sometimes you’ll detect one of the entangled particles and not the other. Sometimes you might detect one entangled particle and an unrelated particle. You’ll have to decide that the second particle wasn’t the entangled pair it was just experimental noise… so real statistics will look funny compared to the idealized ones of thought experiments. Of course the better your apparatus the better your stats will look I suppose.

              • Corey says:

                I think people are forgetting what’s really remarkable about the two-slit experiment. It isn’t that the two-slit experiment gives a different result than the sum of the two one-slit experiments. It’s that if you do the two-slit experiment with a detector measuring which slit the particle passes through (hard for photons, easier for electrons), you do get the sum of the two one-slit experiments.

              • Tim Maudlin says:

                But that would only be remarkable if you thought the detectors worked by magic, not by any actual physical interaction. It is not astounding that they don’t, and can’t. And as soon as you model the detectors in any physically reasonable way (i.e. as establishing a correlation between the particle state and the detector state), then simple physical analysis shows that the interference bands ought to go away. Simple physical analysis with normal probability theory. See quote from Bell about forgetting the apparatus. Putting in a detector changes the physical situation. No surprise it has an effect. What the effect is depends on the physics.
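The claim that a correlation between particle state and detector state removes the interference bands can be checked in a small model. The following is a minimal sketch under stated assumptions, not a description of any real apparatus: the detector is reduced to a two-dimensional "pointer", the angle `theta` between its two pointer states is a hypothetical knob for how well it distinguishes the slits, and the amplitude model and all function names are invented for illustration.

```python
import numpy as np

def slit_amplitudes(x, slit_sep=1.0, wavelength=0.1, distance=20.0, width=4.0):
    """Toy complex amplitudes at screen position x for each slit (assumed model)."""
    k = 2 * np.pi / wavelength
    amps = []
    for center in (-slit_sep / 2, +slit_sep / 2):
        path = np.sqrt(distance**2 + (x - center)**2)
        amps.append(np.exp(-(x - center)**2 / (2 * width**2)) * np.exp(1j * k * path))
    return amps

def screen_pattern(x, theta):
    """Screen distribution when slit j leaves the detector in pointer state d_j.

    theta = 0     : both pointer states identical (detector learns nothing)
    theta = pi/2  : pointer states orthogonal (perfect which-slit record)
    """
    psi1, psi2 = slit_amplitudes(x)
    d1 = np.array([1.0, 0.0])
    d2 = np.array([np.cos(theta), np.sin(theta)])
    # Joint particle-detector amplitude: psi1(x) d1 + psi2(x) d2, shape (len(x), 2).
    joint = np.outer(psi1, d1) + np.outer(psi2, d2)
    # The screen ignores the pointer, so sum probabilities over pointer components.
    p = (np.abs(joint)**2).sum(axis=1)
    return p / p.sum()
```

In this model the cross term between the two slits is multiplied by the overlap of the pointer states, `cos(theta)`. At `theta = 0` the pattern is the usual interference pattern; at `theta = pi/2` it is the sum of the two one-slit patterns, with intermediate angles giving partially washed-out fringes.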

              • My answer was going to look a lot like Tim’s answer. The detector changes the physical experiment too. A 2 slit experiment + detector has a different Hamiltonian etc than a 2 slit without detector.

              • Corey says:

                Tim and Daniel,

                I don’t disagree. I’m saying that *AG’s claim* would make intuitive sense for scientists trained in macroscopic realms where the process of establishing correlations between the observed object and the detector has a negligible effect on the subsequent behavior of the observed object.

              • Tim Maudlin says:

                Ah, yes. That’s a helpful comment. So the thought that one ought to (or has to) tinker with probability theory itself arises from not noticing this.

                And of course the really weird thing is not just that adding a detector at one slit changes the outcome, but that it does so even for the sub-ensemble where the detector doesn’t fire! That’s the key to the Elitzur-Vaidman bomb problem. But, as you say, even in this case the Hamiltonian changes. And this phenomenon has nothing to do with probability: the comment about the sub-ensemble does not require any probabilistic concepts at all.
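For readers who have not met the Elitzur-Vaidman setup, here is a minimal sketch of the standard textbook version: a Mach-Zehnder interferometer with two 50/50 beam splitters and an absorbing "bomb" that may sit in one arm. The beam-splitter phase convention and the names `BEAM_SPLITTER` and `mach_zehnder` are assumptions made for this illustration.

```python
import numpy as np

# Symmetric 50/50 beam splitter (one common phase convention, assumed here).
BEAM_SPLITTER = np.array([[1, 1j],
                          [1j, 1]]) / np.sqrt(2)

def mach_zehnder(bomb_in_arm_1):
    """Outcome probabilities for one photon entering input port 0."""
    state = BEAM_SPLITTER @ np.array([1.0, 0.0], dtype=complex)
    p_absorbed = 0.0
    if bomb_in_arm_1:
        # A live bomb absorbs whatever amplitude travels through arm 1.
        p_absorbed = abs(state[1])**2
        state = np.array([state[0], 0.0], dtype=complex)
    out = BEAM_SPLITTER @ state
    return {
        "absorbed": p_absorbed,
        "dark_port": abs(out[0])**2,    # never fires when both arms are clear
        "bright_port": abs(out[1])**2,
    }
```

With both arms clear, the dark port has zero amplitude. With the bomb in place, the dark port can fire even though the bomb did not absorb the photon. That possible-versus-impossible contrast is the non-probabilistic fact referred to above; the numerical values (1/2 absorbed, 1/4 at each port) are the additional probabilistic content.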

    • Tim Maudlin says:

      As far as I can tell, there is a substantive disagreement here. The original claim was that even the 2-slit experiment (which does not involve entanglement, or violations of Bell’s inequality, or anything) cannot be accounted for using classical probability theory. The Bell quote actually concerns the so-called “no hidden variables” “proofs” that go back to von Neumann, and those proofs do not even apply in the 2-slit case. There is just no question that the claim about the 2-slit data is incorrect, and also no question that the von Neumann “proof” cannot prove what people think it does: e.g. that no deterministic theory can recover the predictions of quantum theory. There are just straightforward counterexamples to such claims.

      Note that “forgetting the role of the apparatus” when trying to physically account for some observed data is just a plain error, not a viable option! The apparatus is there, as a physical object. It has to be taken account of when trying to account for the data—the apparatus operates by physics, not magic. Bell’s point is that if you make this mistake, you are going to be in an incoherent situation, and so despair of logic itself.

      • Andrew says:


        In the context of applied statistics, the question that Mike Betancourt and I posed in our article is whether something is to be gained by modeling macroscopic phenomena using complex probability amplitudes instead of just real-number probabilities.

        Regarding all the rest, I’d just like to thank you and the other discussants for commenting here. I much prefer direct back-and-forth to sniping on twitter etc.

        • Tim Maudlin says:


          Thanks for the comment. I really would like this to be helpful, and would be happy to try to go into as much detail as would be useful.

          Your last comment suggests something, but maybe this does not get at the main point. What you are calling a “complex probability amplitude” is just what I would call a wavefunction or quantum state, and of course it plays a central role in the explanation of any phenomena using quantum theory. It is just that on some understandings, this item has nothing very directly to do with probability. This is perhaps most clear in a pilot-wave picture, where the direct role of the object in the theory is to provide a (deterministic!) dynamics for the particles. The probabilistic aspects of the theory arise then in the normal way: a probability distribution over possible initial states consistent with the experimental situation is carried by the dynamics over into a probability distribution over outcomes. All of that is just using standard probability theory. And the probabilities for the outcomes are exactly what are given by the standard quantum mechanical predictive algorithms. So no one is denying the central importance of a complex function in the theory, just the relation of that function to how probabilities are handled.
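The pilot-wave bookkeeping described above can be sketched for the simplest case, a free Gaussian wave packet in one dimension with zero mean momentum. This is an illustration under stated assumptions, not a general simulator: units with ħ = m = σ₀ = 1 are chosen for convenience, the velocity field is the standard guidance-equation result for this particular packet, and all names are hypothetical. Initial positions are drawn from |ψ|² at time zero and then moved deterministically.

```python
import numpy as np

HBAR = 1.0    # assumed units
MASS = 1.0
SIGMA0 = 1.0  # standard deviation of |psi|^2 at t = 0
RATE = HBAR / (2 * MASS * SIGMA0**2)

def velocity(x, t):
    """Guidance velocity (hbar/m) Im(psi'/psi) for the free Gaussian packet."""
    return x * RATE**2 * t / (1 + (RATE * t)**2)

def born_sigma(t):
    """Standard deviation of |psi(x, t)|^2 predicted by quantum mechanics."""
    return SIGMA0 * np.sqrt(1 + (RATE * t)**2)

def propagate(x0, t_final, n_steps=1000):
    """Integrate dx/dt = velocity(x, t) with fourth-order Runge-Kutta."""
    x = np.array(x0, dtype=float)
    dt = t_final / n_steps
    t = 0.0
    for _ in range(n_steps):
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(x + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(x + dt * k3, t + dt)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return x

# Usage: randomness enters only through the initial positions.
rng = np.random.default_rng(0)
initial = rng.normal(0.0, SIGMA0, size=20000)
final = propagate(initial, t_final=3.0)
```

The only random step is the draw of `initial`. Everything after that is an ordinary deterministic flow, and the spread of `final` tracks `born_sigma(3.0)` up to sampling error, which is the sense in which the initial distribution is "carried by the dynamics" into the outcome distribution.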

    • konrad says:

      Not sure if the question makes sense. The discussion (and all of modelling, more generally) is about how to describe reality – i.e., which words and equations should be used to describe it.

      • Corey says:

        konrad, I generally try to ask that genre of question when it appears people are talking past one another. I do my best to make the question make sense, but I offer no guarantees.

        • konrad says:

          Well it is abundantly clear that people are talking past each other in this thread, and I didn’t mean to imply that it was a bad question. My point was that, yes, the disagreement _is_ just about how to describe reality, but that this constitutes a substantive disagreement (because in modelling we really care about how to describe reality).

          Central to the disagreement (I suspect) is the usual point of difference, namely whether probabilities are defined as descriptors of information states (e.g. probability theory as an extension of logic) or of empirical reality. Andrew has previously stated that he supports the latter rather than the former interpretation. It feels like it _might_ be possible to argue that the latter interpretation could be extended to refer directly to the state of a system, but I don’t see how such an argument could be made. Certainly it would be a radical reinterpretation even of the frequentist (i.e. empirical) notion of what the word “probability” actually means.

  12. David Lovis-McMahon says:

    Having not seen the talk, I was under the impression that the speaker (Robins) was leveraging Bell’s theorem precisely because it puts two things on the line for any quantum mechanical explanation of the world: localism and realism. One or the other (or perhaps both) has to be incorrect. The speaker clearly assumes that localism is true, “Assuming with Einstein that faster than light (supraluminal) communication is not possible…”. By virtue of that assumption, realism must be false.

    As I understand realism, it is the requirement that reality have an “object permanence” such that the moon will still be there in the sky even when I’m not looking at it. Realism implies that while we cannot simultaneously observe what my fever would be had I [taken vs. not taken] the aspirin, we can speak meaningfully about the result under the different choices made.* More formally this means that the counterfactual “took Aspirin” is included with the factual “didn’t take Aspirin” in the statistical population of possible outcomes describing my fever. Finally, using the laws of probability, I could condition my way from the factual to make inferences about the counterfactual.

    Rejecting realism is a pretty profound thing to do given that the entire mechanism of Neyman-Rubin causal modeling rests on this idea that we can speak meaningfully about the result under the counterfactual. The correctness of this seems to be contingent on assuming the objectivity of measurement and the corresponding counterfactuals.

    Do I understand this correctly and are there going to be any articles by Robins, et al., coming out soon on the topic?

    *I should note that I am making the claim that realism implies something about how the world works and not how or whether we can know how the world works.
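The link between counterfactuals and Bell’s theorem can be made concrete with the CHSH form of the inequality. The sketch below is an illustration, not taken from the Robins et al. talk: it assumes each particle carries predetermined ±1 answers for both of its possible measurement settings (a potential-outcomes table), enumerates every such table, and compares the result with the standard quantum prediction for a spin singlet. The function names and the particular angles are assumptions; the angles are the usual choice that maximizes the quantum value.

```python
import itertools
import math

def chsh(e00, e01, e10, e11):
    """CHSH combination of the four setting-pair correlations."""
    return e00 + e01 + e10 - e11

def local_realist_bound():
    """Largest |CHSH| over all predetermined outcome tables (a0, a1, b0, b1)."""
    best = 0
    for a0, a1, b0, b1 in itertools.product((-1, 1), repeat=4):
        best = max(best, abs(chsh(a0 * b0, a0 * b1, a1 * b0, a1 * b1)))
    return best

def singlet_correlation(angle_a, angle_b):
    """Quantum prediction E(a, b) = -cos(a - b) for a spin singlet."""
    return -math.cos(angle_a - angle_b)

def quantum_chsh():
    """|CHSH| for the singlet at the standard optimal settings."""
    a_settings = (0.0, math.pi / 2)
    b_settings = (math.pi / 4, -math.pi / 4)
    correlations = [singlet_correlation(a, b)
                    for a in a_settings for b in b_settings]
    return abs(chsh(*correlations))
```

Every deterministic table gives |CHSH| = 2, so any probability mixture of tables is bounded by 2 as well, while the singlet prediction is 2√2 ≈ 2.83. The enumeration assumes that each side’s outcomes do not depend on the other side’s setting, which is where the locality assumption enters.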

    • Roger says:

      Rejecting realism would be pretty profound if it really proved that the moon is not there unless we look at it. They would have given a Nobel prize to anyone who could prove that.

  13. Corey says:

    I’m resurrecting this thread to link a paper which shows in excruciating detail exactly how AG and Mike Betancourt get this one wrong.

    • Andrew says:


      You perhaps won’t be surprised to hear that I am not at all convinced that we are wrong. I can leave it to Mike to weigh in himself, but, very briefly, let me say that, yes, I think that classical (i.e., Bayesian, Kolmogorovian, etc) probability theory can be used to model quantum-mechanical outcomes such as the 2-slit experiment, but at the cost of a level of complexity that would generally be considered unacceptable in applied statistics. That is the point of our paper: conditioning all models on the sequence of measurements is something that could be done, but is generally not done in probability modeling. (It’s similar to the problem of utilities in economics: yes, you could model utilities as functions of the sequence of steps that are taken to reach the outcome, but this is contrary to the spirit of the von Neumann theory, in which utilities are taken as a function of outcomes alone.) A probability model that increases exponentially in complexity with each additional measurement is not a probability model in the usual sense.

      • Corey says:

        Andrew: “conditioning all models on the sequence of measurements” is neither here nor there. If you actually want to know how you’ve gone wrong, read the paper.

        • Andrew says:

          “Conditioning on the measurements” is definitely the issue in the 2-slit experiment! Also there’s the other issue of Fermi-Dirac and Bose-Einstein statistics, which don’t follow the rules of classical (Boltzmann) statistics. Again, it should be possible to shoehorn this all back into classical probability but at the cost of an awkward increased complexity in the model.
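The difference between the three kinds of statistics mentioned above shows up already for two particles and two single-particle states. The following is a minimal counting sketch, assuming every distinct configuration is equally likely (the usual textbook idealization, not a claim about any particular physical system); the function name `prob_same_state` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, combinations_with_replacement, product

def prob_same_state(statistics, n_states=2):
    """Probability that two particles occupy the same single-particle state,
    assuming all distinct configurations are equally weighted."""
    states = range(n_states)
    if statistics == "maxwell-boltzmann":
        # Distinguishable particles: ordered pairs.
        configs = list(product(states, repeat=2))
    elif statistics == "bose-einstein":
        # Indistinguishable, sharing allowed: unordered pairs with repeats.
        configs = list(combinations_with_replacement(states, 2))
    elif statistics == "fermi-dirac":
        # Indistinguishable, sharing forbidden: unordered pairs without repeats.
        configs = list(combinations(states, 2))
    else:
        raise ValueError(f"unknown statistics: {statistics}")
    same = sum(1 for a, b in configs if a == b)
    return Fraction(same, len(configs))
```

For two states this gives 1/2 for Maxwell-Boltzmann, 2/3 for Bose-Einstein, and 0 for Fermi-Dirac. Each case is a normalized distribution over its own set of configurations; what changes between them is which configurations count as distinct.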

    • Firstly, the paper is not new — a version has been on the arXiv for years. In fact, I was at one of the conferences where Philip first began presenting on the work. I know both Philip and Kevin, I respect them, and I’ve always liked their idea. But, as they note, in order to reconcile classical probability with quantum mechanics some physical assumptions have to be changed, and those changes have nontrivial consequences that put them at odds with mainstream physics community.

      As I stated numerous times above, there are many approaches to understanding quantum “weirdness”. The most common (by a large margin) preserve causality and locality at the expense of classical probability theory (there are important reasons for the popularity of this approach, in particular the ability to scale to quantum field theories, but that’s not relevant to this discussion), but there are many others. Some focus on maintaining classical probability theory while sacrificing locality, some sacrifice causality. At this point there are no experiments that can distinguish between these theories, so it’s an ontological argument. In other words, one without an answer.*

      Also as has been stated numerous times, our proposal of the possible utility of generalized probability theories is completely independent of their being relevant to any “true” model of physics. Regardless of their physical nature, their different properties might have use in modeling systems that don’t fall into the domain of classical probability (ill-defined state spaces, misbehaving measures, etc).

      * Not that it’s not fun to argue those theories. My fellow physicists and I would often discuss these matters, often over beer and never too seriously. Never trust someone who takes quantum mechanics too seriously.**

      ** I’m already waiting for people to take this comment too seriously…

      • Corey says:

        “in order to reconcile classical probability with quantum mechanics some physical assumptions have to be changed, and those changes have nontrivial consequences that put them at odds with mainstream physics community.”

        Recall that the claim in the original post is “classical probability theory… does not fit quantum reality”. As you note, it is *physical* assumptions that are at the source of the apparent disagreement — and that’s all your various interlocutors were ever claiming in the original discussion!

        As to the opinion of the mainstream physics community, well, you would know more about that than me, so I’ll take your word for it. But it does seem to me that the only physical assumption that needs to be changed to make classical probability *reconcilable* with quantum mechanics is the assumption that there is a fact of the matter about which slit the particle passes through when the experiment doesn’t measure it. I’d be surprised if the opinion of the mainstream physics community is that there really is such a fact.

  14. Entsophy says:

    […] was reminded of this old post by Andrew Gelman about whether Quantum Mechanics requires a change in the axioms of probability […]