Judea Pearl on why he is “only a half-Bayesian”

In an article published in 2001, Pearl wrote:

I [Pearl] turned Bayesian in 1971, as soon as I began reading Savage’s monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural and useful to cast what we know in the language of probabilities, and (iii) If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases.

Thirty years later, I [Pearl] am still a devout Bayesian in the sense of (i), but I now doubt the wisdom of (ii) and I know that, in general, (iii) is false.

He elaborates:

The bulk of human knowledge is organized around causal, not probabilistic relationships, and the grammar of probability calculus is insufficient for capturing those relationships. Specifically, the building blocks of our scientific and everyday knowledge are elementary facts such as “mud does not cause rain” and “symptoms do not cause disease” and those facts, strangely enough, cannot be expressed in the vocabulary of probability calculus. It is for this reason that I consider myself only a half-Bayesian.

Interesting. The Neyman-Rubin framework of potential outcomes does allow for causal reasoning within a probabilistic structure, but indeed it does not allow for statements such as “mud does not cause rain.” In the potential outcomes notation, one could define a random variable y=1 for rain or 0 for no rain, and define y^1 to be the outcome under treatment and y^2 to be the outcome under control. But it would not make sense for “mud” to be a treatment: in the potential-outcomes framework, a treatment is something that you do, not something such as “mud” that you observe.
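To make the difference between observing mud and "doing" mud concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the numbers, the function names, and above all the model structure (rain produces mud, never the reverse). That structure is supplied by hand; it is not something read off the joint probabilities, which is roughly the point of the quoted passage.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
P_RAIN = Fraction(3, 10)                      # P(rain = 1)
P_MUD_GIVEN_RAIN = {1: Fraction(9, 10),       # P(mud = 1 | rain = 1)
                    0: Fraction(1, 10)}       # P(mud = 1 | rain = 0)


def joint(do_mud=None):
    """Return {(rain, mud): probability} under the assumed model rain -> mud.

    With do_mud=None, mud follows its usual dependence on rain.
    With do_mud set to 0 or 1, mud is fixed by intervention and no longer
    depends on rain; rain keeps its original distribution.
    """
    table = {}
    for rain in (0, 1):
        p_rain = P_RAIN if rain == 1 else 1 - P_RAIN
        for mud in (0, 1):
            if do_mud is None:
                p_mud1 = P_MUD_GIVEN_RAIN[rain]
                p_mud = p_mud1 if mud == 1 else 1 - p_mud1
            else:
                p_mud = Fraction(int(mud == do_mud))
            table[(rain, mud)] = p_rain * p_mud
    return table


def p_rain_given_mud(table, mud=1):
    """P(rain = 1 | mud) computed from a joint table."""
    return table[(1, mud)] / (table[(0, mud)] + table[(1, mud)])


observed = p_rain_given_mud(joint())            # we see mud
intervened = p_rain_given_mud(joint(do_mud=1))  # we make mud
print(observed, intervened)
```

Under these assumptions, seeing mud raises the probability of rain well above its baseline, while making mud leaves it at the baseline. Both calculations use ordinary probability; what differs is which joint table is used, and that choice comes from the assumed direction of the arrow.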

I’m not saying here that Pearl’s framework is a good or bad idea; my point is that I agree he does seem to be asking questions that cannot be addressed by probability models.

Some of my earlier discussions with Pearl are here.

1. “Faced with an apple and an orange, I would consider myself half-applean.”

2. Andrew,

Do you think the only reason the potential outcomes model cannot represent facts like “mud does not cause rain” is that “mud” isn’t the sort of thing that could be a treatment variable? Because that seems like a limitation in the semantics, not a formal limitation. Why not, for instance, introduce a variable M that takes 1 if there is mud (in whatever region we care about) and 0 otherwise? Then we imagine our treatment as something like making mud or not. Why won’t that work?

I wonder if Pearl would even grant that potential outcome variables can be defined in pure probability theory. Don’t they have counterfactual content, which Pearl will attribute to causality? I mean, I think Pearl gets everything he wants by adding the do() operator to the probability calculus. But then, he thinks that the do() operator has causal content that just isn’t in the probability theory to begin with.

Am I missing something important here?

• Andrew says:

Jonathan:

Your example (“introduce a variable M that takes 1 if there is mud (in whatever region we care about) and 0 otherwise”) is observational, not causal (in the sense of an intervention). In the Rubin framework, the causal content is introduced by the idea of an intervention, or a potential intervention. Otherwise it’s just correlation and not a potential “experiment” (in statistics jargon).

• “In the Rubin framework, the causal content is introduced by the idea of an intervention, or a potential intervention.”

I thought that was my point. But I was already worried that I was missing something. So, probably I am. In case you’re willing to help me out of my ignorance …

The way I understood it, as far as the formalism goes, the potential outcomes model can represent anything that the do() calculus can represent. (I think Pearl has actually proven as much. At least, he claims to have in some of your earlier exchanges.) In the potential outcomes framework, one usually only takes genuinely manipulable variables as possible treatments, though. So, Smoking and Statistics_Training are okay because we could assign some people to smoke or get stats training, but Sex [male/female] and the Universal_Gravitational_Constant are not because we can’t assign people to be male (well, okay, maybe with surgery and hormone replacement therapy …) or fix G to different values. Sex and G would count as attributes in Holland’s sense, I think. At best “interventions” on attributes would be imaginary. At worst, they wouldn’t even be conceivable. I think that’s a decent semantic/statistical point, but I wonder if Pearl is saying something stronger — at the level of the formalism itself?

Again, we judge (perhaps rightly) that some variables are not suitable to be treatment variables. But the formalism doesn’t tell us which ones are good and which ones aren’t. So, if the formalism can represent causation for some variables, it can represent it for all of them. The representation might not be useful for much, but that’s a different discussion. That is, let F_G=a(u) and F_G=b(u) denote the gravitational force with respect to unit u under settings of G at values a and b, with a not equal to b. The variables are perfectly well-formed (syntactically), but they don’t seem to be meaningful (semantically), since we cannot carry out the relevant experiment. Does that sound right?

3. MAYO says:

I wonder what % Pearl is now? The points (i)-(iii) just support my suspicion that it is only or largely (i) that leads some to be Bayesian, and yet it’s absurd to suppose that frequentists jettison background knowledge. That doesn’t mean a frequentist will want to drag in everything just to run a little test. We often want to jump in, jump out, deliberately to see what happens, knowing it will be put together with other information later on.

• Andrew says:

Mayo:

Yes, in many ways Bayesian approaches are a formalization of the jump-in, jump-out use of prior information that you describe. Like any formalization, Bayes can help in some settings and hurt in others. Following Jaynes, I like Bayes partly because when it does not give a reasonable answer, I can often go back and examine what ways the assumptions underlying the model were violated.

4. MAYO says:

Well, no, the jump-in, jump-out procedure was intended to describe the frequentist (who doesn’t carry so much luggage each time because she’s not trying to accumulate via Bayesian updating). Anyway, I think that frequentist or error statistical testing is the way to go back and examine the ways assumptions underlying the model are violated, for the simple reason that it enables distinguishing the effects of different misspecifications and errors. I don’t see how the Bayesian solves her Duhemian problems of pinpointing the blame for the “unreasonable answer”. If I’m understanding you, upon detecting something is wrong, one seeks a better prior or model. But there are lots of ways to “fix” things up, aren’t there? The unreasonable answer can be due to different things, in other words, and might be masking different misspecifications. Perhaps you have a way to deal with this. Anyway, as it happens, I plan to discuss some of the ways that a battery of error statistical misspecification tests can function (based on Aris Spanos’ account); hopefully some connections will be unearthed.

• Corey says:

DGM, I think that when you use the word “Bayesian” you mean something different from AG. I believe you mean strict philosophical subjective Bayesianism of the kind that Savage preached and Kadane practices. When AG writes of “Bayesian approaches” I believe he means a combination of exploratory data analysis, computation of posterior distributions, and model checking and elaboration. Andrew has argued that all three components fall within the boundaries of “Bayesian data analysis”.

I think the two of you will only be talking past each other until this name collision is recognized.

5. Elias Bareinboim says:

BTW,
this discussion reminded me of a recent post on Nielsen’s blog about Pearl’s theory. It’s very lucid, perhaps the best intro I have read to Pearl’s more basic results, and might be worth reading:

http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/

6. Evens Salies says:

I have never used the Bayesian approach in my applied economics work. Should I use it in the future, I’ll try not to forget the answer below from the late Clive Granger to the question raised by Peter Phillips “… do you see some advantages in the Bayesian paradigms … ?”:

“… I am not a Bayesian because I lack self-confidence [...]”

“… a good Bayesian, that is, a Bayesian who picks a prior that has some value to it, is better than a non-Bayesian. And a bad Bayesian who has a prior that is wrong is worse than a non-Bayesian, and I have seen examples of both. What I do not know is how do I know which is which before we evaluate the outcome”

ET, 13, 1997, pp. 253–303. http://korora.econ.yale.edu/phillips/pubs/art/i006.pdf