Judea Pearl writes:
Can you post the announcement below on your blog? And, by all means, if you find heresy in my interview with Ron Wasserstein, feel free to criticize it with your readers.
I responded that I’m not religious, so he’ll have to look for someone else if he’s looking for findings of heresy. I did, however, want to share his announcement:
The American Statistical Association has announced a new prize, “Causality in Statistics Education,” aimed at encouraging the teaching of basic causal inference in introductory statistics courses.
The motivations for the prize are discussed in an interview I [Pearl] gave to Ron Wasserstein. I hope readers of this list will participate, either by innovating new tools for teaching causation or by nominating candidates who deserve the prize.
And speaking about education, Bryant and I [Pearl] have revised our survey of econometrics textbooks, and would love to hear your suggestions on how to restore causal inference to econometrics education. [I’m confused on that last point; I thought that causality was central to econometrics; see, for example, Angrist and Pischke’s book. — AG]
Is there any evidence that scientists who specifically study “causality” are better scientists than those without such formal education?
You can ask the same question about “Philosophy of Science”. Is there any evidence that scientists who specifically study the Philosophy of Science are better scientists than those without such formal education? Answer: NO.
Have any studies been conducted?
Actually, I’d take even circumstantial evidence. For example, Laplace was doing Bayesian statistics to determine which small measured aberrations in astronomy couldn’t reasonably be explained by measurement error, and then he applied Classical Mechanics to investigate any “significant” anomalies. How exactly would Laplace have benefited from any formal training in causal analysis? It seems like his “causal inference” was pretty much perfect without any training at all, which is exactly the impression I get from every physicist I’ve ever met who wasn’t exposed to Frequentist Statistics.
People who have been exposed to Frequentist ideas, on the other hand, are full of the following intuitions:
-Instead of thinking of the data as real and the probability distribution as a made-up construct, they think of the probability distribution as real and the data as some kind of phantom outcome from an amorphous universe of possible outcomes.
-They imagine an interval estimate for a parameter is something that will contain the right answer a fixed percentage of the time in experiments that will never be performed.
-They imagine infinite repetitions of experiments that couldn’t possibly be repeated even in principle.
-They imagine multiple repetitions of our universe in order to be able to think about certain probability distributions.
-They imagine they’re examining “data generation mechanisms,” even though there is no Frequentist analysis imaginable that would have led them to Euler’s equations of rigid body motion by conducting statistical analysis of coin flips.
All this focus on irrelevancies seems to pretty much destroy everyone’s intuition about real physical systems. Of course, some statisticians are so brilliant they can overcome these shortcomings and still do real work. For everyone else, though, their intuition seems to be permanently damaged by this nonsense. The need for “causal inference” seems to be a solution to an artificial problem created by Frequentist statistics, which to this day is the first look at statistics that almost every student gets.
Entsophy, this is a better list of reasons to be uncomfortable with the frequentist paradigm for statistics than what I have managed to come up with. Thank you.
And ignore the troll (below). As a 2+ year reader of this blog, let me opine that comments from folks such as your good self, K? O’Rourke and Bill Jeffreys (not an exhaustive list) add value to this already excellent blog.
I like Pearl and he is right to push for change.
Change is generational so I see the focus on education.
The current establishment is never going to change.
PS Angrist and Pischke’s book is not essentially about causality. You don’t need probability or regression or counterfactuals to teach causality: Mill’s methods and DAGs will do for identification and estimation. Probability only comes in to summarize uncertainty.
Very well put.
I’ve managed a significant number of high-speed/low-drag quantitative efforts. The typical analyst involved had either a good BS degree or a weak advanced degree in a quantitative field. Most of this crowd had above-average, but below-genius, intelligence. Academically, I’d say they’re about equivalent to an average Ph.D. in Sociology or Psychology.
Within that crowd, there were two groups: those that had some statistical training and those that didn’t. What I noticed is that those who had no statistical training never had a problem figuring out causal relationships. Nor was their lack of exposure to basic statistical methods, like hypothesis testing, p-values, or linear regression, ever a problem. They always seemed to find some clever way of looking at the data without statistics that brought out the essential evidence and which was almost always far more convincing.
The only problems I ever had were with those who had a basic statistics education. They constantly made unwarranted causal leaps in their analysis. So my recommendation for improving causal inference is to stop teaching the introductory hypothesis test/p-value blah blah blah. There are, after all, alternatives. If those alternatives are too difficult to teach in a cookie-cutter fashion, thereby leaving students with either no statistical training or very good statistical training, then so much the better.
You sound like a crank with an ax to grind.
Please find another blog, or your own blog, to voice your anecdotes and tribal posturing. You are increasing the noise within the comments section.
OMG, you’re right, they are anecdotes; I didn’t even calculate a p-value, draw a directed acyclic graph, or anything. I repent and apologize to the other tribes. I can see now that I’ve been thinking about causality all wrong, just like those other ignorant rubes and cranks (Galileo, Newton, Euler, Gauss, Cauchy, Gibbs, Maxwell, Einstein, Schrödinger).
One day, “causal inference” won’t just be the plaything of a select few Super Scientists (disciples of Pearl or Rubin), who alone have to make all the breakthroughs, but will be part of everyone’s education. When that happens we’ll see an explosion of scientific understanding that will make the Enlightenment look like finger painting.
You make perfect sense to me.
The classical introductory statistics curriculum is fundamentally demented. It’s taught as a series of cookbook algorithms that one applies, who knows why. There is little discussion of any justification for the techniques as properly performing inference based on data, mainly because there is little justification. The techniques just don’t hold water at their foundations.
Back in grad school (EE, machine learning), I took the introductory graduate statistics sequence. It didn’t make a lick of sense, and the techniques tended to obscure straightforward solutions. Then I found Pearl and Jaynes, everything instantly made sense, and the approach would instantly clarify otherwise complex problems.
An economist (econometrician) friend of mine often corresponds with Prof. Pearl, and what I understand is that Pearl believes the econometrics approach to causality is deeply, fundamentally wrong. (And econometricians tend to think Pearl’s approach is fundamentally wrong.)
It sounds to me like Pearl was being purposefully snarky.
Yes, the problem with the econometrics approach is that it lumps together identification, estimation, and probability, so papers look like an Xmas tree.
It all starts with chapter 1 in econometrics textbooks and all those assumptions about the disturbance, linearity, etc…
Yet most discussions in causality oriented papers revolve around identification and for that you can mostly leave out functional forms, estimation, and probability.
Why carry around reams of parametric notation when it ain’t needed? One wonders how Galileo, Newton, or Franklin ever discovered anything without (X’X)^(-1)X’Y.
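(For readers keeping score, the formula being mocked is just the ordinary least squares estimator, beta-hat = (X’X)^(-1)X’Y. A minimal sketch on simulated data, invented purely for illustration:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from y = 2x + noise.
n = 1000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

# OLS via the normal equations: beta_hat = (X'X)^(-1) X'Y,
# with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # close to [0, 2]
```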
Jack, I think you misunderstood what your friend told you. If you read any of my papers or books you will come to realize immediately that I believe the econometrics approach to causality is deeply and fundamentally right (I repeat: RIGHT, not WRONG), although there have been two attempts to distort this approach by an influx of researchers from adjacent fields — see my reply to Andrew on this page, or read http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf

Next, I think you are wrong in stating that “econometricians tend to think Pearl’s approach is fundamentally wrong.” First, I do not offer anyone “an approach”; I offer mathematical tools to do what researchers claim they want to do, only with less effort and greater clarity, which researchers may choose to use or ignore. The invention of the microscope was not a “new approach” but a new tool. Second, I do not know a single econometrician who tried my microscope and thought it was “fundamentally wrong”; the dismissals I hear come invariably from those who refuse to look at the microscope for religious reasons.

Finally, since you went through the trouble of interpreting hearsay and labeling me “purposefully snarky,” I think you owe readers of this blog ONE concrete example where I criticize an economist for reasons that you judge to be unjustified. You be the judge.
Reply to Andrew:

Causality is indeed central to econometrics. Our survey of econometric textbooks
http://ftp.cs.ucla.edu/pub/stat_ser/r395.pdf
is critical of econometric education today, not of econometric methodology proper. Econometric models, from the time of Haavelmo (1943), have been and remained causal (see http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf) despite two attempted hijackings, first by regressionists, and second by “quasi-experimentalists” like Angrist and Pischke (AP). The six textbooks we reviewed reflect a painful recovery from the regressionist assault, which has more or less disappeared from serious econometric research but still obfuscates authors of econometric textbooks. As to the debate between the “structuralists” and the “experimentalists,” I address it in Section 4 of this article:
http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf

Your review of Angrist and Pischke’s book “Mostly Harmless Econometrics” leaves out what in my opinion is the major drawback of their methodology: sole reliance on instrumental variables, and failure to express and justify the assumptions that underlie the choice of instruments. Since the choice of instruments rests on the same type of assumptions (i.e., exclusion and exogeneity) that Angrist and Pischke are determined to avoid (for being “unreliable”), readers are left with no discussion of what assumptions do go into the choice of instruments, how they are encoded in a model, what scientific knowledge can be used to defend them, and whether the assumptions have any testable implications.

You point out that Angrist and Pischke completely avoid the task of model-building; I agree. But I attribute this avoidance not to a lack of good intentions but to a lack of the mathematical tools necessary for model-building. Angrist and Pischke have deprived themselves of such tools by making an exclusive commitment to the potential-outcome language while shunning the language of nonparametric structural models. This is something one can appreciate only after attempting to solve a problem, from start to end, in both languages, side by side. No philosophy, ideology, or hours of blog discussion can replace the insight gained by such an exercise.
This is a horribly incomplete characterization of Angrist & Pischke’s textbook. The discussion of instrumental variables is quite nuanced and represents but one topic in a much broader discussion of identifying and estimating causal effects. Sure, there are gaps and some material is already outmoded, but it provides an outstanding foundation in my opinion. In their identification results, I can’t imagine there could be contradictions with what would obtain using your NPSEM approach — in fact, if you look at their characterization of dose-response functions, I am inclined to say they have already subsumed most of what your text provides and done one better by marrying it with a workable and robust approach to estimation.
Cyrus,

The purpose of my post was not to provide a complete “characterization of Angrist and Pischke’s textbook.” Its stated purpose was to point out “what in my opinion is the major drawback of their methodology.” Among other drawbacks, I listed: (1) failure to encode the IV assumptions in the model, (2) failure to reason about them, and (3) failure to discuss whether these assumptions have testable implications.

Of course there can be no contradiction between the method of Angrist and Pischke and the one based on nonparametric structural equations (NPSEM); the former is what remains of the latter after a few mathematical tools are forbidden. By analogy, an arithmetic that forbids multiplication would never contradict ordinary arithmetic, which embraces both multiplication and addition.

If you think that Angrist and Pischke’s book provides an outstanding foundation for identification, I would challenge you to assess how many of their students can solve the toy problems presented in Section 3.2 of this article:
http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf
especially those pertaining to instrumental variables (Section 3.2.4). Note that these problems are not contrived to prove my point; they are the most elementary and recurring problems in the analysis of IVs, e.g., Is there an instrumental variable in our model? What would the IV estimand be? You cannot get more elementary than that. I would be curious to know your assessment.
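(For readers unfamiliar with the term, the simplest IV estimand alluded to here is the Wald ratio, cov(Z, Y)/cov(Z, X). A hedged sketch on simulated data — the model and numbers are invented for illustration and appear nowhere in the article:)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Structural model with an unobserved confounder u and an instrument z:
#   x = z + u + noise,  y = beta*x + u + noise,  with true beta = 1.5.
beta = 1.5
z = rng.normal(size=n)        # instrument: affects y only through x
u = rng.normal(size=n)        # unobserved confounder of x and y
x = z + u + rng.normal(size=n)
y = beta * x + u + rng.normal(size=n)

# Naive regression slope cov(x, y)/var(x) is biased by the confounder.
naive = np.cov(x, y)[0, 1] / np.var(x)

# The IV (Wald) estimand cov(z, y)/cov(z, x) recovers beta.
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(naive, iv)  # naive is inflated (~1.83); iv is close to 1.5
```

Whether a given variable qualifies as an instrument is exactly the exclusion/exogeneity question debated above; the arithmetic itself is trivial once that is settled.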
I feel pretty secure in assuming that an AP student would apply the tools of conditional probability and counterfactual reasoning as needed to answer those questions. There’s nothing exotic about what one learns from AP that would prevent one from doing so (and nothing that restricts relative to NPSEM in a manner that resembles the silly reference to an “arithmetic that forbids multiplication”). Nonetheless, I can contribute to taking up your challenge by assigning the question to an actual class of mine (who are trained using AP), if you agree to assign to a comparable class the same plus something along the lines of the LATE result, say, with premises articulated in potential outcomes (the latter are already assigned to mine). Heck, there’s no reason for us to settle for this single idiosyncratic test: we could do this on a larger scale with reasonable rigor were there buy-in from relevant faculty. All that is needed then is an agreed-upon set of canonical causal problems.
Cyrus,

We have a deal! I like your proposal to create a large-scale database of canonical causal problems that the causal inference community agrees represent what students need to know in this area. (BTW, have a look at the criteria for submitting nominations for the causality education prize, and check whether it meets your expectations.)

I am glad you are already assigning my toy problems to your class, and I accept your condition in the bargain (“to assign to a comparable class the same plus something along the lines of the LATE result, say, with premises articulated in potential outcomes”). This would probably be easier for me, because my students are equally conversant in both languages and, as a matter of fact, the LATE theorem has been assigned as homework in my causal inference class for the past 15 years. (See http://bayes.cs.ucla.edu/BOOK-2K/viewgraphs.html Week 7, Homework 3.)
Two remarks before we embark on this exciting experiment.

You say: “There is nothing exotic about what one learns from AP that would prevent one from doing so [i.e., apply probability and counterfactuals to solve the problems].” I agree; the obstacles surface not in what AP teach but in what they do not teach, namely, two indispensable tools of causal inference: (1) how to read counterfactuals and ignorability conditions in a given NPSEM model and (2) how to identify the testable implications of a given NPSEM. And, as I wrote recently, the neglect is not accidental but cultural:

“…the PO framework has also spawned an ideological movement that resists this symbiosis and discourages its faithful from using SCM or its graphical representation. This ideological movement (which I call “arrow-phobic”) can be recognized by a total avoidance of causal diagrams or structural equations in research papers, and an exclusive use of “ignorability”-type notation for expressing the assumptions that (must) underlie causal inference studies. For example, causal diagrams are meticulously excluded from the writings of Rubin, Holland, Rosenbaum, Angrist, Imbens, and their students who, by and large, are totally unaware of the inferential and representational powers of diagrams.”

(See http://www.mii.ucla.edu/causality/?p=554 for the full text of my position on the PO and SCM frameworks.)
Lastly, if we are going to collaborate, I must ask you to refrain from using disrespectful adjectives such as “silly” (as in your “…in a manner that resembles the silly reference of an arithmetic that forbids multiplication”). I do not use analogies lightly, and the analogy to arithmetic was chosen carefully, to represent the cultural prohibition that the PO camp imposes on its faithful. Quoting again from my blog piece, I wrote:
———————–
“The arrow-phobic exclusion can be compared to a prohibition against the use of “multiplication” in arithmetic. Formally, it is harmless, because one can always replace multiplication with addition (e.g., adding a number to itself n times). Yet practically, those who shun multiplication will not get very far in science. The rejection of graphs and structural models leaves investigators with no process-model guidance and, not surprisingly, it has resulted in a number of blunders which the PO community is not very proud of.
————————
Do we have a deal?
Judea
Judea (if I may),

I am replying above you, as we seem to have exhausted the nested “reply-to” levels available.

Here’s how I am coming to see the experiment: establish a set of canonical causal problems, and let students’ attempts to solve them shed light on the relative merits of potential outcomes vs. graphical or NPSEM analytical tools for different types of problems. It will be good to have this set of problems for pedagogical purposes. Others can benefit from it too.

I expect we will find that there are comparative advantages and disadvantages in each. Whether one can fully integrate the other is a question, though. In my own work, I switch freely between analytical approaches, appreciating the comparative advantages. It seems you do too: I note, for example, that your assignment related to LATE (to which you link below) has students first recast the IV problem in terms of potential outcomes and then discover the LATE result. This is about as clear a case as one might hope for of a shift of analytical frameworks allowing one to uncover new and profound insights previously hidden from view. I hope you acknowledge this, and the broader class of principal stratum results, as a major accomplishment for those working with the potential outcomes analytical framework. And this is an even less fundamental accomplishment than what those working with potential outcomes have done to provide a coherent foundation for robust estimation and inference (after all, identification is just the very start of the process).

Having had the chance to have this more elaborate exchange (and I am grateful for your participation and humor, even despite phrases like “silly”!), my more refined take on your critique of AP is that they do too little to help students understand from where identification might come beyond randomized experiments or striking natural experiments. I am not sure whether this is a disservice or an oversight, but quite possibly a very mindful neglect.
Cyrus,

I am glad you propose to start with a list of canonical problems and let students choose whatever combination of techniques they deem useful to get them solved. I will let you take the first shot, because my definition of a “problem” may not be the same as yours — for me, a problem must start with a story that everyone understands. My book is full of those, but I know that “stories,” in some very respectable circles, are mocked as “toy-like” and are immediately replaced with numerical tables of statistical data. So I am anxious to see an example of a “problem definition.”

As to your comments on the drawbacks and achievements of the PO framework, I suspect you did not read the end of my blog post, where I mention three embarrassing blunders that PO researchers fell into, having to operate in the darkness of the “missing data” black box. I will copy that portion below. Note that I count the “principal strata framework” (not the concept) as one of those blunders, and I explain why. Here it is:
—————————start of quote ————
The rejection of graphs and structural models leaves investigators with no process-model guidance and, not surprisingly, it has resulted in a number of blunders which the PO community is not very proud of.
One such blunder is Rosenbaum (2002) and Rubin’s (2007) declaration that “there is no reason to avoid adjustment for a variable describing subjects before treatment”
http://www.cs.ucla.edu/~kaoru/r348.pdf
Another is Hirano and Imbens’ (2001) method of covariate selection, which prefers bias-amplifying variables in the propensity score.
http://ftp.cs.ucla.edu/pub/stat_ser/r356.pdf
The third is the use of ‘principal stratification’ to assess direct and indirect effects in mediation problems, which leads to paradoxical and unintended results.
http://ftp.cs.ucla.edu/pub/stat_ser/r382.pdf
In summary, the PO framework offers a useful analytical tool (i.e., an algebra of counterfactuals) when used in the context of a symbiotic SCM analysis. It may be harmful, however, when used as an exclusive and restrictive subculture that discourages the use of process-based tools and insights.
Additional background and technical details on the PO vs. SCM tradeoffs can be found in Section 4 of a tutorial paper (Statistics Surveys)
http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
and in a book chapter on the Eight Myths of SEM:
http://ftp.cs.ucla.edu/pub/stat_ser/r393.pdf
Readers might also find it instructive to compare how the two paradigms frame and solve a specific problem from start to end. This comparison is given in Causality (Pearl 2009) pages 81-88, 232-234.
————————-end of quote ——————-
Please note the last remark, which leads you to an example of a “causal problem” solved in the two frameworks, starting with a “story” and ending with an estimate. I think it is the only such example in the literature, but you may surprise me.

I like your “mindful neglect” excuse for PO’s blunders. I would not be so forgiving. My 20 years of experience with many of its researchers lead me to a different characterization: “mindful resistance,” by which I mean a mindful resistance to investing the four minutes it takes to learn the multiplication table. (And I choose my analogies carefully.)

Looking forward to your first causal example.
Reply to all discussants,

I hear many voices agreeing that statistics education needs a shot of relevancy, and that causality is one area where statistics education has stifled intuition and creativity. I therefore encourage you to submit nominations for the causality in statistics prize, as described in http://www.amstat.org/education/causalityprize/ and http://magazine.amstat.org/blog/2012/11/01/pearl/

Please note that the criteria for the prize do not require fancy formal methods; they are problem-solving oriented. The aim is to build on the natural intuition that students bring with them, and to leverage it with elementary mathematical tools so that they can solve simple problems with comfort and confidence (not like their professors). The only skills they need to acquire are: (1) articulating the question, (2) specifying the assumptions needed to answer it, and (3) determining whether the assumptions have testable implications.

The reasons we cannot totally dispose of mathematical tools are: (1) scientists have local intuitions about different parts of a problem, and only mathematics can put them all together coherently; (2) eventually, these intuitions will need to be combined with data to come up with assessments of strengths and magnitudes (e.g., of effects). We do not know how to combine data with intuition in any other way except through mathematics. Recall that the Pythagorean theorem served to amplify, not stifle, the intuitions of ancient geometers.
This post is related, and as someone whose work sometimes involves statistics and causality, I would be interested to hear Andrew and others respond to it. Is this legitimate, or making a fuss about nothing?
http://wmbriggs.com/blog/?p=6804
Chrisare,

Thanks for bringing this post to my attention. No, the post is not just making a fuss about nothing; it reflects the prevailing thinking among many mainstream analysts (perhaps not represented on this blog). William Briggs, the blog master, says that “The equation Y = beta x + epsilon is WRONG,” “and in a sad way, too.” Whereas Paul Holland wrote in 1995: “The only meaning I have ever determined for such an equation is that it is a shorthand way of describing the conditional distribution of Y given X.” Briggs goes further and states that the equation is plainly WRONG, and that the only correct way of writing what the equation means is to specify the full-blown bivariate distribution of X and Y.

It would probably come as a shock to Briggs, Holland, and other analysts to know that, since Haavelmo (1943), economists have taken the structural equation Y = beta x + epsilon to mean something totally different, something that has nothing to do with the distribution of X and Y. And I literally mean NOTHING; structural equations are distinct mathematical objects that convey totally different information about the population and, in general, they do not even constrain the regression equation describing the same population.
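(The distinction can be made concrete with a toy simulation — all numbers invented for illustration. Below, the structural effect of x on y is exactly zero, yet the observational regression slope is nonzero because of a common cause; an intervention do(X = x) reveals the structural coefficient.)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

beta = 0.0                         # structural effect of x on y is ZERO
u = rng.normal(size=n)             # unobserved common cause of x and y
x_obs = u + rng.normal(size=n)
y_obs = beta * x_obs + u + rng.normal(size=n)

# Observational regression slope: nonzero despite beta = 0.
slope_obs = np.cov(x_obs, y_obs)[0, 1] / np.var(x_obs)

# Simulate an intervention do(X = x): x is set externally, cutting its
# dependence on u. The structural equation for y is unchanged.
x_do = rng.normal(size=n)
y_do = beta * x_do + u + rng.normal(size=n)
slope_do = np.cov(x_do, y_do)[0, 1] / np.var(x_do)

print(slope_obs, slope_do)  # roughly 0.5 vs roughly 0.0
```

Same structural equation, two very different slopes: the regression coefficient describes the joint distribution, the structural beta describes what happens under intervention.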
Well, you said you would be interested to hear Andrew and others respond — I join you in interest. Andrew (and others), can you contribute a thought or two? I am curious to know whether Haavelmo’s distinction is common knowledge or comes as a surprise to readers of this blog.
Judea:
I don’t usually get much out of those old-style theoretical papers but I know that some people (including you and Rubin, each in your own way) do, and I respect the search for intellectual antecedents to current work. As I recall, a key difference between the regression notation used in statistics and econometrics is that statisticians tend to model the data while econometricians model the underlying phenomenon. Thus, for example, in a simple regression model the economist will talk about the assumption that the error is independent of the predictors, whereas statisticians think of that as part of the model specification and not a substantive testable assumption. In my opinion, many of these notational tangles become more understandable with multilevel models, because with multilevel modeling you’re not simply giving a distribution to data, you’re modeling underlying parameters. This brings the statistical approach closer to the economics approach in which latent variables are often in mind.
P.S. As a statistical educator, I appreciate your generosity in endowing this prize.
(Trying to reply, but the system says: duplicate)
Andrew,

You hit the nail right on the head: “statisticians tend to model the data while econometricians model the underlying phenomenon.” But this cleavage is far from being a topic of “old-style theoretical papers” or “intellectual antecedents to current work”; it is a major impediment to current work.

Given this cleavage, we can understand the bewilderment of economists (like Heckman and Leamer) who read statistical papers and say: “This is nonsense, all they do is model the data.” It is also easy to understand the bewilderment of statistics-trained analysts (like Holland and Rubin and Imbens) who read econometrics papers and say: “This is nonsense, all they do is regression, not causation.” Bewilderment aside, we can also understand the agony of econometrics students saddled with textbooks that can’t decide which side they are on, data or underlying phenomenon. And, speaking symmetrically, we can understand the agony of statistics students growing up on textbooks that never even mention the existence of a phenomenon underlying the data.
But instead of bemoaning the current state of education, I would like to educate myself with the help of your remark about multilevel modeling, in which “you’re not simply giving a distribution to data, you’re modeling underlying parameters.”

Here is my question. Assume you find an economist who writes down a bunch of structural equations, among them Y = beta x + epsilon, and goes about his/her usual routine of identifying and estimating beta, etc. (Recall, by writing down Y = beta x + epsilon he/she assumes a fixed causal effect, beta, for every individual in the population.) How would you advise him/her to change his/her routine if he/she wants to incorporate some “multilevel modeling” techniques without changing his/her substantive assumptions about the economy? What would he/she do differently?
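(One crude reading of the question, sketched rather than answered — the groups, the normal model for the slopes, and the shrinkage rule below are all invented for illustration: let beta vary by group, beta_j ~ Normal(mu, tau^2), fit each group separately, then partially pool the noisy group slopes toward the grand mean.)

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: the slope is not one fixed beta but varies across
# J groups (say, industries), beta_j ~ Normal(mu, tau^2).
J, n_per = 20, 30
mu_true, tau_true = 1.0, 0.3
betas = rng.normal(mu_true, tau_true, size=J)

slopes, ses2 = [], []
for j in range(J):
    x = rng.normal(size=n_per)
    y = betas[j] * x + rng.normal(scale=1.0, size=n_per)
    slopes.append(np.cov(x, y)[0, 1] / np.var(x))   # per-group OLS slope
    ses2.append(1.0 / (np.var(x) * n_per))          # approx. sampling variance
slopes, ses2 = np.array(slopes), np.array(ses2)

# Crude empirical-Bayes partial pooling: shrink each noisy group slope
# toward the grand mean, weighting by relative precision.
mu_hat = slopes.mean()
tau2_hat = max(slopes.var() - ses2.mean(), 1e-6)
shrink = tau2_hat / (tau2_hat + ses2)
pooled = shrink * slopes + (1 - shrink) * mu_hat
```

The substantive model is unchanged (each group still has a linear structural equation); what changes is that beta itself gets a distribution, and small groups borrow strength from the rest.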
Judea: Thanks for your comments and especially this one –
“that they can solve simple problems with comfort and confidence (not like their professors)”