New prize on causality in statistics education

Judea Pearl writes:

Can you post the announcement below on your blog? And, by all means, if you find heresy in my interview with Ron Wasserstein, feel free to criticize it with your readers.

I responded that I’m not religious, so he’ll have to look for someone else if he’s looking for findings of heresy. I did, however, want to share his announcement:

The American Statistical Association has announced a new Prize, “Causality in Statistics Education”, aimed at encouraging the teaching of basic causal inference in introductory statistics courses.

The motivations for the prize are discussed in an interview I [Pearl] gave to Ron Wasserstein. I hope readers of this list will participate, either by innovating new tools for teaching causation or by nominating candidates who deserve the prize.

And speaking about education, Bryant and I [Pearl] have revised our survey of econometrics textbooks, and would love to hear your suggestions on how to restore causal inference to econometrics education. [I’m confused on that last point; I thought that causality was central to econometrics; see, for example, Angrist and Pischke’s book. — AG]

27 thoughts on “New prize on causality in statistics education”

  1. Is there any evidence that scientists who specifically study “causality” are better scientists than those without such formal education?

    You can ask the same question about “Philosophy of Science”. Is there any evidence that scientists who specifically study the Philosophy of Science are better scientists than those without such formal education? Answer: NO.

      • Actually, I’d take even circumstantial evidence. For example, Laplace was doing Bayesian statistics to determine which small measured aberrations in astronomy couldn’t reasonably be explained by measurement errors, and then he applied Classical Mechanics to investigate any “significant” anomalies. How exactly would Laplace have benefited from any formal training in causal analysis? It seems like his “causal inference” was pretty much perfect without any training at all, which is exactly the same impression I get from every physicist I’ve ever met who wasn’t exposed to Frequentist Statistics.

        People who have been exposed to Frequentist ideas on the other hand are full of the following intuitions:

        -Instead of thinking of the data as real and a probability distribution as a made up construct, they think of the probability distribution as real and the data as some kind of phantom outcome out of an amorphous universe of possible outcomes.
        -They imagine an interval estimate for a parameter is something that is going to contain the right answer a fixed percentage of time in experiments that will never be performed.
        -They imagine infinite repetitions of experiments that couldn’t possibly be repeated even in principle.
        -They imagine multiple repetitions of our universe in order to be able to think about certain probability distributions.
        -They imagine they’re examining “data generation mechanisms” even though there is no Frequentist analysis imaginable that would have led them to Euler’s equations of rigid body motion by conducting a statistical analysis of coin flips.

        All this focus on irrelevancies seems to pretty much destroy everyone’s intuition about real physical systems. Of course, some statisticians are so brilliant they can overcome these shortcomings and still do real work. For everyone else, though, their intuition seems to be permanently damaged by this nonsense. The need for “causal inference” seems to be a solution to an artificial problem created by Frequentist statistics, which to this day is the first look at statistics that almost every student gets.

        • Entsophy, this is a better list of reasons to be uncomfortable with the frequentist paradigm for statistics than what I have managed to come up with. Thank you.

          And ignore the troll (below). As a 2+ year reader of this blog, let me opine that comments from folks such as your good self, K? O’Rourke, and Bill Jeffreys (not an exhaustive list) add value to this already excellent blog.

  2. I like Pearl and he is right to push for change.

    Change is generational so I see the focus on education.

    The current establishment is never going to change.

    PS Angrist and Pischke’s book is not essentially about causality. You don’t need probability or regression or counterfactuals to teach causality: Mill’s methods and DAGs will do for identification and estimation. Probability only comes in to summarize uncertainty.
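    [A small illustrative aside, not part of the comment above: the kind of identification a DAG licenses can be stated without any parametric machinery. The sketch below uses a made-up binary model with one measured confounder Z and checks the back-door adjustment formula P(y | do(x)) = sum_z P(y | x, z) P(z) against a direct simulation of the intervention; all variable names and probabilities are invented for illustration.]

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Toy structural model with one measured confounder Z (all variables binary):
#   Z ~ Bernoulli(0.5);  X depends on Z;  Y depends on X and Z.
Z = rng.random(n) < 0.5
X = rng.random(n) < np.where(Z, 0.8, 0.2)
Y = rng.random(n) < (0.1 + 0.3 * X + 0.4 * Z)

# Back-door adjustment, computed from the *observational* distribution only:
#   P(Y=1 | do(X=1)) = sum_z P(Y=1 | X=1, Z=z) * P(Z=z)
adjusted = sum(
    Y[(X == 1) & (Z == z)].mean() * (Z == z).mean() for z in (0, 1)
)

# Ground truth: rerun the same model with X forced to 1 for everyone.
Y_do = rng.random(n) < (0.1 + 0.3 * 1 + 0.4 * Z)
truth = Y_do.mean()

print("adjustment formula :", round(float(adjusted), 3))  # ~ 0.60
print("simulated do(X=1)  :", round(float(truth), 3))     # ~ 0.60
```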

  3. I’ve managed a significant number of high-speed/low-drag quantitative efforts. The typical analyst involved had either a good BS degree or a weak advanced degree in a quantitative field. Most of this crowd had above-average, but below-genius, intelligence. Academically, I’d say they’re about equivalent to an average Ph.D. in Sociology or Psychology.

    Within that crowd, there were two groups: those that had some statistical training and those that didn’t. What I noticed is that those who had no statistical training never had a problem figuring out causal relationships. Nor was their lack of exposure to basic statistical methods, like hypothesis testing, p-values, or linear regression, ever a problem. They always seemed to find some clever way of looking at the data without statistics that brought out the essential evidence and was almost always far more convincing.

    The only problems I ever had were with those who had a basic statistics education. They constantly made unwarranted causal leaps in their analysis. So my recommendation for improving causal inference is to stop teaching the introductory hypothesis-test/p-value blah blah blah. There are, after all, alternatives. If those alternatives are too difficult to teach in a cookie-cutter fashion, thereby leaving students with either no statistical training or very good statistical training, then so much the better.

    • You sound like a crank with an ax to grind.

      Please find another blog, or your own blog, to voice your anecdotes and tribal posturing. You are increasing the noise within the comments section.

      • OMG, you’re right, they are anecdotes; I didn’t even calculate a p-value, directed acyclic graph, or anything. I repent and apologize to the other tribes. I can see now that I’ve been thinking about causality all wrong, just like those other ignorant rubes and cranks (Galileo, Newton, Euler, Gauss, Cauchy, Gibbs, Maxwell, Einstein, Schrödinger).

        One day, “causal inference” won’t just be the plaything of a select few Super Scientists (disciples of Pearl or Rubin), who alone have to make all the breakthroughs, but will be part of everyone’s education. When that happens we’ll see an explosion of scientific understanding that will make the Enlightenment look like finger painting.

    • You make perfect sense to me.

      The classical introductory statistics curriculum is fundamentally demented. It’s taught as a series of cookbook algorithms that one applies, who knows why. There is little discussion of any justification for the techniques as properly performing inference based on data, mainly because there is little justification. The techniques just don’t hold water at their foundations.

      Back in grad school (EE, machine learning), I took the introductory graduate statistics sequence. It didn’t make a lick of sense, and the techniques tended to obscure straightforward solutions. Then I found Pearl and Jaynes; everything instantly made sense, and the approach would immediately clarify otherwise complex problems.

  4. An economist (econometrician) friend of mine often corresponds with Prof. Pearl, and what I understand is that Pearl believes the econometrics approach to causality is deeply, fundamentally wrong. (And econometricians tend to think Pearl’s approach is fundamentally wrong.)

    It sounds to me like Pearl was being purposefully snarky.

    • Yes, the problem with the econometrics approach is that it lumps together identification, estimation, and probability, so papers look like a Xmas tree.

      It all starts with chapter 1 in econometrics textbooks and all those assumptions about the disturbance, linearity, etc…

      Yet most discussions in causality oriented papers revolve around identification and for that you can mostly leave out functional forms, estimation, and probability.

      Why carry around reams of parametric notation when it ain’t needed? One wonders how Galileo, Newton, or Franklin ever discovered anything without (X’X)^(-1)X’Y.

    • Jack, I think you misunderstood what your friend told you.
      If you read any of my papers or books you will come to realize immediately
      that I believe the econometrics approach to causality is deeply and fundamentally
      right (I repeat: RIGHT, not WRONG), although there have been two attempts to
      distort this approach by an influx of researchers from adjacent fields — see my
      reply to Andrew on this page, or read http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf

      Next, I think you are wrong in stating that “econometricians tend to think Pearl’s approach is fundamentally wrong”. First, I do not offer anyone “an approach”; I offer mathematical tools to do
      what researchers claim they want to do, only with less effort and greater clarity, which researchers may
      choose to use or ignore. The invention of the microscope was not a “new approach” but a new tool.
      Second, I do not know a single econometrician who tried my microscope and thought it was “fundamentally
      wrong”; the dismissals I hear come invariably from those who refuse to look at the microscope for religious reasons.

      Finally, since you went through the trouble of interpreting hearsay and labeling me “purposefully snarky”, I think you owe readers of this blog ONE concrete example where I criticize an economist for reasons that you judge to be unjustified. You be the judge.

  5. Reply to Andrew:
    Causality is indeed central to econometrics.
    Our survey of econometric textbooks
    http://ftp.cs.ucla.edu/pub/stat_ser/r395.pdf
    is critical of econometric education today, not of econometric
    methodology proper.
    Econometric models, from the time of Haavelmo (1943), have
    been and remain causal
    (see http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf),
    despite two attempted hijackings: first by
    regressionists, and second by “quasi-experimentalists”
    like Angrist and Pischke (AP). The six textbooks we reviewed
    reflect a painful recovery from the regressionist assault, which has more
    or less disappeared from serious econometric research but
    still confuses the authors of econometric textbooks.

    As to the debate between
    the “structuralists” and “experimentalists,” I address it
    in Section 4 of this article:
    (see http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf)

    Your review of Angrist and Pischke’s book “Mostly Harmless
    Econometrics” leaves out what in my opinion is the major drawback
    of their methodology: sole reliance on instrumental variables
    and failure to express and justify
    the assumptions that underlie the choice of instruments.
    Since the choice of instruments rests on the same type of
    assumptions (i.e., exclusion and exogeneity) that Angrist and
    Pischke are determined to avoid (for being “unreliable”), readers
    are left with no discussion of what assumptions do go into
    the choice of instruments, how they are encoded in a
    model, what scientific knowledge can be used to defend
    them, and whether the assumptions have any testable
    implications.
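    [An illustrative aside, not from the comment above: a minimal simulation of the exclusion/exogeneity point, with made-up coefficients and variable names. It shows that the textbook IV estimand Cov(Z,Y)/Cov(Z,X) recovers the structural coefficient when the instrument is valid, and silently returns a biased number when the exclusion restriction is violated.]

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 1.0                          # true structural effect of X on Y
U = rng.normal(size=n)              # unmeasured confounder of X and Y
Z = rng.normal(size=n)              # candidate instrument

def simulate(z_direct_effect_on_y):
    """Generate (X, Y); a nonzero direct effect of Z on Y violates exclusion."""
    X = 1.5 * Z + U + rng.normal(size=n)
    Y = beta * X + U + z_direct_effect_on_y * Z + rng.normal(size=n)
    return X, Y

def iv_estimate(Z, X, Y):
    """The usual IV (Wald) estimand: Cov(Z, Y) / Cov(Z, X)."""
    return np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]

X, Y = simulate(0.0)   # exclusion and exogeneity hold
print("valid instrument  :", round(iv_estimate(Z, X, Y), 3))   # ~ 1.0 (= beta)

X, Y = simulate(0.5)   # Z also affects Y directly
print("exclusion violated:", round(iv_estimate(Z, X, Y), 3))   # ~ 1.33 (biased)
```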

    You point out that Angrist and Pischke completely avoid the task of
    model-building; I agree. But I attribute this avoidance
    not to a lack of good intentions but to the lack of the mathematical
    tools necessary for model-building. Angrist and Pischke
    have deprived themselves of such tools by making an
    exclusive commitment to the potential outcome language,
    while shunning the language of nonparametric structural models.
    This is something one can appreciate only after attempting
    to solve a problem, from start to end, in both languages,
    side by side. No philosophy, ideology, or hours
    of blog discussion can replace the insight one can gain by such an exercise.

    • This is a horribly incomplete characterization of Angrist & Pischke’s textbook. The discussion of instrumental variables is quite nuanced and represents but one topic in a much broader discussion of identifying and estimating causal effects. Sure, there are gaps and some material is already outmoded, but it provides an outstanding foundation in my opinion. In their identification results, I can’t imagine there could be contradictions with what would obtain using your NPSEM approach—in fact, if you look at their characterization of dose-response functions I am inclined to say they have already subsumed most of what your text provides and done one better by marrying it with a workable and robust approach to estimation.

      • Cyrus,
        The purpose of my post was not to provide a complete
        “characterization of Angrist and Pischke’s textbook.”
        Its stated purpose was to point out “what in my opinion is the major
        drawback of their methodology.” Among other drawbacks, I
        listed: (1) failure to encode the IV assumptions in the model
        (2) failure to reason about them,
        and (3) failure to discuss whether these assumptions have
        testable implications.

        Of course there can be no contradiction between
        the method of Angrist and Pischke and the one
        based on nonparametric structural equations (NPSEM);
        the former is what remains from the latter after
        a few mathematical tools are forbidden. By analogy, arithmetics
        that forbids multiplication would never contradict
        ordinary arithmetics that embraces both multiplication
        and addition.

        If you think that Angrist and Pischke’s book provides an
        outstanding foundation for identification I would
        challenge you to assess how many of their
        students can solve the toy problems presented
        in Section 3.2 of this article:
        http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf
        especially those pertaining to instrumental variables
        (section 3.2.4). Note that these problems are not
        contrived to prove my point; these are the most elementary and recurring
        problems in the analysis of IV’s, e.g., Is there an instrumental
        variable in our model? What would the IV estimand be?
        You cannot get more elementary than that.

        I would be curious to know your assessment.

        • I feel pretty secure in assuming that an AP student would apply the tools of conditional probability and counterfactual reasoning as needed and required to answer those questions. There’s nothing exotic about what one learns from AP that would prevent one from doing so (and there is nothing that restricts it relative to NPSEM in a manner that resembles the silly reference of an “arithmetics that forbids multiplication”). Nonetheless, I can contribute to taking up your challenge by assigning the question to an actual class of mine (who are trained using AP) if you agree to find a way to assign to a comparable class the same plus something along the lines of the LATE result, say, with premises articulated in potential outcomes (the latter are already assigned to mine). Heck, there’s no reason for us to accept this single idiosyncratic test: we could do this on a larger scale with reasonable rigor were there buy-in from relevant faculty. All that is needed then is an agreed-upon set of canonical causal problems.

        • Cyrus,
          We have a deal!
          I like your proposal to create a large-scale database of
          canonical causal problems that the causal inference community
          agrees represent what students need to know in this area.
          (BTW, have a look at the criteria for submitting nominations
          for the causality education prize, and check if it does not
          meet your expectations)

          I am glad you are already assigning my toy problems to your
          class, and I accept your condition in the bargain.
          (“to assign to a comparable class the same plus something along
          the lines of the LATE result, say, with premises articulated
          in potential outcomes”). This would probably be easier for me,
          because my students are equally conversant in both languages
          and, as a matter of fact, the LATE theorem has
          been assigned as homework in my causal inference class
          for the past 15 years. (See
          http://bayes.cs.ucla.edu/BOOK-2K/viewgraphs.html
          Week 7, Homework 3).

          Two remarks before we embark on this exciting experiment.
          You say: “There is nothing exotic about what one learns
          from AP that would prevent one from doing so [i.e., apply
          probability and counterfactuals to solve the problems]”.
          I agree; the obstacles surface not in what AP teach but
          in what they do not teach, namely, two indispensable tools
          of causal inference: (1) how to read counterfactuals and
          ignorability conditions in a given NPSEM model and (2) how
          to identify the testable implications of a given NPSEM.
          And, as I wrote recently, the neglect is not accidental
          but cultural.
          “.. the PO framework has also spawned an ideological
          movement that resists this symbiosis and discourages its
          faithfuls from using SCM or its graphical representation.

          This ideological movement (which I call
          “arrow-phobic”) can be recognized by a total avoidance
          of causal diagrams or structural equations in research
          papers, and an exclusive use of “ignorability” type
          notation for expressing the assumptions that (must) underlie
          causal inference studies. For example, causal diagrams are
          meticulously excluded from the writings of Rubin, Holland,
          Rosenbaum, Angrist, Imbens, and their students who, by and
          large, are totally unaware of the inferential and
          representational powers of diagrams.”
          (See http://www.mii.ucla.edu/causality/?p=554 for full text
          of my position on the PO and SCM frameworks)
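          [An editorial toy illustration of tool (1) above, reading counterfactuals off a given NPSEM; the structural functions below are invented for the sketch and are not taken from the thread. The counterfactual Y_x(u) is obtained by replacing the equation for X with the constant x while holding the background variables fixed.]

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5

# A toy NPSEM:  X = f_X(U_X),  Y = f_Y(X, U_Y),  with independent background noise.
U_X = rng.normal(size=n)
U_Y = rng.normal(size=n)
f_X = lambda u_x: (u_x > 0).astype(float)
f_Y = lambda x, u_y: 2.0 * x + u_y

X = f_X(U_X)            # factual treatment for each unit
Y = f_Y(X, U_Y)         # factual outcome for each unit

# Counterfactuals, read directly off the model: substitute x, keep U fixed.
Y_x1 = f_Y(1.0, U_Y)    # "Y had X been 1" for the same units
Y_x0 = f_Y(0.0, U_Y)    # "Y had X been 0" for the same units

# Ignorability can also be read off: Y_x depends only on U_Y, X only on U_X,
# and the two are independent here, so Y_x is independent of X in this model.
print("unit-level effects Y_1 - Y_0:", Y_x1 - Y_x0)   # all equal 2.0 in this toy model
```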

          Lastly, if we are going to collaborate,
          I must ask you to refrain from using disrespectful
          adjectives such as “silly” (as in your “.. in a manner that
          resembles the silly reference of an arithmetics that forbids
          multiplication”). I do not use analogies
          lightly. And the analogy to arithmetics was
          chosen carefully, to represent the cultural prohibition
          that the PO camp imposes on its faithfuls. Quoting
          again from my blog piece, I wrote:
          ———————–
          “The arrow-phobic exclusion can be compared to a prohibition
          against the use of “multiplication” in arithmetics.
          Formally, it is harmless, because one can always replace
          multiplication with addition (e.g., adding a number to
          itself n times). Yet practically, those who shun
          multiplication will not get very far in science.

          The rejection of graphs and structural models leaves
          investigators with no process-model guidance and, not
          surprisingly, it has resulted in a number of blunders which
          the PO community is not very proud of.
          ————————
          Do we have a deal?
          Judea

        • Judea (if I may),
          I am replying above you as we seem to have exhausted the nested “reply-to” levels available.

          Here’s how I am coming to see the experiment: establish a set of canonical causal problems, and let students’ attempts to solve them shed light on the relative merits of potential outcomes vs. graphical or NPSEM analytical tools for different types of problems. It will be good to have this set of problems for pedagogical purposes. Others can benefit from it too.

          I expect we will find that there are comparative advantages and disadvantages in each. Whether one can fully integrate the other is a question though.

          In my own work, I switch freely between analytical approaches, appreciating the comparative advantages.

          It seems you do too: I note for example that your assignment related to LATE (to which you link below) has students first recast the IV problem in terms of potential outcomes and then discover the LATE result. This is about as clear a case as one might hope for of a shift of analytical frameworks allowing one to uncover new and profound insights previously hidden from view. I hope you acknowledge this, and the broader class of principal stratum results, as a major accomplishment for those working with the potential outcomes analytical framework. And this is an even less fundamental accomplishment than what those working with potential outcomes have done to provide a coherent foundation for robust estimation and inference (after all, identification is just the very start of the process).

          Having had the chance to have this more elaborate exchange (and I am grateful for your participation and humor, even despite my use of phrases like “silly”!), my more refined take on your critique of AP is that they do too little to help students understand where identification might come from beyond randomized experiments or striking natural experiments. I am not sure whether this is a disservice or an oversight, but quite possibly it is a very mindful neglect.

        • Cyrus,

          I am glad you propose to start with a list of canonical problems, and let students choose whatever combination of techniques they deem useful to get them solved. I will let you take the first shot, because my definition of a “problem” may not be the same as yours — for me, a problem must start with a story that everyone understands. My book is full of those, but I know that “stories”, in some very respectable circles, are mocked as “toy-like” and are immediately replaced with numerical tables of statistical data. So, I am anxious to see an example of a “problem definition”.

          As to your comments on the drawbacks and achievements of the PO framework, I suspect you did not read the end of my blog post, where I mention three embarrassing blunders that PO researchers fell into, having to operate in the darkness of the “missing data” black box. I will copy that portion below. Note that I count the “principal strata framework” (not the concept) as one of those blunders, and I explain why. Here it is:

          —————————start of quote ————

          The rejection of graphs and structural models leaves investigators with no process-model guidance and, not surprisingly, it has resulted in a number of blunders which the PO community is not very proud of.

          One such blunder is Rosenbaum (2002) and Rubin’s (2007) declaration that “there is no reason to avoid adjustment for a variable describing subjects before treatment”
          http://www.cs.ucla.edu/~kaoru/r348.pdf

          Another is Hirano and Imbens’ (2001) method of covariate selection, which prefers bias-amplifying variables in the propensity score.
          http://ftp.cs.ucla.edu/pub/stat_ser/r356.pdf

          The third is the use of ‘principal stratification’ to assess direct and indirect effects in mediation problems, which leads to paradoxical and unintended results.
          http://ftp.cs.ucla.edu/pub/stat_ser/r382.pdf

          In summary, the PO framework offers a useful analytical tool (i.e., an algebra of counterfactuals) when used in the context of a symbiotic SCM analysis. It may be harmful, however, when used as an exclusive and restrictive subculture that discourages the use of process-based tools and insights.

          Additional background and technical details on the PO vs. SCM tradeoffs can be found in Section 4 of a tutorial paper (Statistics Surveys)
          http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf
          and in a book chapter on the Eight Myths of SEM:
          http://ftp.cs.ucla.edu/pub/stat_ser/r393.pdf

          Readers might also find it instructive to compare how the two paradigms frame and solve a specific problem from start to end. This comparison is given in Causality (Pearl 2009) pages 81-88, 232-234.
          ————————-end of quote ——————-

          Please note the last remark, which leads you to an example of a “causal problem” solved
          in the two frameworks, starting with a “story” and ending with an estimate.
          I think it is the only such example in the literature, but you may surprise me.

          I like your “mindful neglect” excuse for PO’s blunders.
          I would not be so forgiving. My 20 years of experience with many
          of its researchers leads me to a different characterization: “mindful resistance,”
          by which I mean mindful resistance to investing the 4 minutes it takes to learn
          the multiplication table. (And I choose my analogies carefully.)

          Looking forward to your first causal example.
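          [An editorial aside on the second blunder quoted above (bias amplification), using a made-up linear model rather than the examples in the cited papers: when an unmeasured confounder U is present, adjusting for an instrument-like covariate Z makes the omitted-variable bias in the coefficient on X larger, not smaller.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 1.0                            # true structural effect of X on Y

Z = rng.normal(size=n)                # instrument-like covariate: affects X only
U = rng.normal(size=n)                # unmeasured confounder of X and Y
X = 2.0 * Z + U + rng.normal(size=n)
Y = beta * X + U + rng.normal(size=n)

def ols(y, *cols):
    """Least-squares coefficients; the first one reported is for the first column."""
    A = np.column_stack(cols + (np.ones(len(y)),))
    return np.linalg.lstsq(A, y, rcond=None)[0]

naive = ols(Y, X)[0]        # Y ~ X      (U omitted: biased, ~ 1.17)
adjusted = ols(Y, X, Z)[0]  # Y ~ X + Z  (still biased, and more so: ~ 1.50)

print("true beta :", beta)
print("Y ~ X     :", round(naive, 3))
print("Y ~ X + Z :", round(adjusted, 3), "(bias amplified)")
```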

  6. Reply to all discussants,
    I hear many voices who agree that statistics education needs a shot
    of relevancy, and that causality is one area where statistics education has
    stifled intuition and creativity.
    I therefore encourage you to submit nominations for the causality in statistics
    prize, as described in http://www.amstat.org/education/causalityprize/
    and http://magazine.amstat.org/blog/2012/11/01/pearl/

    Please note that the criteria for the prize do not require fancy formal methods;
    they are problem-solving oriented. The aim is to build on the natural intuition that students
    bring with them, and leverage it with elementary mathematical tools so that they
    can solve simple problems with comfort and confidence (not like their professors).
    The only skills they need to acquire are: (1) to articulate the question, (2) to specify
    the assumptions needed to answer it, and (3) to determine whether the assumptions have testable
    implications.
    The reasons we cannot totally dispose of mathematical tools are: (1) scientists have local
    intuitions about different parts of a problem, and only mathematics can put them all together
    coherently; and (2) eventually, these intuitions will need to be combined with data to come up
    with assessments of strengths and magnitudes (e.g., of effects). We do not know how to
    combine data with intuition in any other way except through mathematics.

    Recall that Pythagoras’ theorem served to amplify, not stifle, the intuitions of ancient geometers.

    • Chrisare,
      Thanks for bringing this post to my attention. No,
      the post is not just making a fuss about nothing; it reflects
      the prevailing thinking among many mainstream analysts
      (perhaps not represented on this blog).
      William Briggs, the blog master, says that
      “The equation Y = beta x + epsilon is WRONG,”
      “and in a sad way, too.”

      Whereas Paul Holland wrote in 1995:
      “The only meaning I have ever determined for such an
      equation is that it is a shorthand way of describing the
      conditional distribution of Y given X.”
      Briggs goes further and states that the equation is plainly
      WRONG, and that the only correct way of writing what the equation
      means is to specify the full-blown bi-variate distribution
      of X and Y.

      It would probably come as a shock to Briggs, Holland and
      other analysts to know that, since Haavelmo (1943), economists
      have taken the structural equation Y = beta x + epsilon
      to mean something totally different, and
      that it has nothing to do with the distribution of X and Y.
      And I literally mean NOTHING; structural equations are distinct
      mathematical objects that convey totally different information
      about the population and, in general, they do not even constrain
      the regression equation describing the same population.
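      [An editorial sketch of Haavelmo’s distinction with made-up numbers: a structural equation Y = beta x + epsilon makes a claim about interventions, not about the conditional distribution of Y given X. Under confounding, the regression slope of Y on X differs from the structural beta, while the slope under an intervention on X recovers it.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta = 1.0                               # structural coefficient in Y = beta*X + eps

# Observational regime: X is correlated with the disturbance (confounding).
eps = rng.normal(size=n)
X_obs = 0.8 * eps + rng.normal(size=n)   # Cov(X, eps) != 0
Y_obs = beta * X_obs + eps
slope_obs = np.cov(X_obs, Y_obs)[0, 1] / np.var(X_obs)

# Interventional regime: X is set exogenously, severing the X-eps dependence.
X_do = rng.normal(size=n)
Y_do = beta * X_do + eps
slope_do = np.cov(X_do, Y_do)[0, 1] / np.var(X_do)

print("regression slope (observational):", round(slope_obs, 3))  # ~ 1.49, not beta
print("slope under do(X)               :", round(slope_do, 3))   # ~ 1.0  = beta
```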

      Well, you said you would be interested to hear Andrew and
      others respond — I join you in interest.
      Andrew (and others), can you contribute a thought or two?
      I am curious to know if Haavelmo’s distinction
      is common knowledge, or comes as a surprise to readers
      of this blog.

      • Judea:

        I don’t usually get much out of those old-style theoretical papers but I know that some people (including you and Rubin, each in your own way) do, and I respect the search for intellectual antecedents to current work. As I recall, a key difference between the regression notation used in statistics and econometrics is that statisticians tend to model the data while econometricians model the underlying phenomenon. Thus, for example, in a simple regression model the economist will talk about the assumption that the error is independent of the predictors, whereas statisticians think of that as part of the model specification and not a substantive testable assumption. In my opinion, many of these notational tangles become more understandable with multilevel models, because with multilevel modeling you’re not simply giving a distribution to data, you’re modeling underlying parameters. This brings the statistical approach closer to the economics approach in which latent variables are often in mind.

        P.S. As a statistical educator, I appreciate your generosity in endowing this prize.

        • (Trying to reply, but the system says: duplicate)

          Andrew,
          You hit the nail on the head: “statisticians tend to model the data
          while econometricians model the underlying phenomenon.”
          But this cleavage is far from being a topic of “old-style
          theoretical papers” or “intellectual antecedents to current
          work”; it is a major impediment to current work.

          Given this cleavage, we can understand the bewilderment
          of economists (like Heckman and Leamer) who read statistical
          papers and say: “This is nonsense, all they do is model
          the data.” It is also easy to understand the bewilderment
          of statistics-trained analysts (like Holland and Rubin
          and Imbens) who read econometrics papers and say: “This is
          nonsense, all they do is regression, not causation.”
          Bewilderment aside, we can also understand the agony of
          econometrics students whose textbooks can’t decide
          which side they are on, data or underlying phenomenon.
          And, speaking symmetrically, we can also understand the agony of
          statistics students growing up on textbooks that never even
          mention the existence of a phenomenon underlying the data.

          But instead of bemoaning the current state of education,
          I would like to educate myself with the help of
          your remark about multilevel modeling, in which “you’re
          not simply giving a distribution to data, you’re modeling
          underlying parameters.”

          Here is my question. Assume you find an economist who
          writes down a bunch of structural equations, among them
          Y = beta x + epsilon, and goes about his/her usual routine
          of identifying and estimating beta, etc.
          (Recall, by writing down Y = beta x + epsilon, he/she assumes a
          fixed causal effect, beta, for every individual in the population.)
          How would you advise him/her to change his/her routine
          if he/she wants to incorporate some “multilevel modeling”
          techniques, without changing his/her substantive assumption
          about the economy?
          What would he/she do differently?
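          [One hedged, editorial reading of the multilevel suggestion, written in the notation of the question above and purely for illustration: keep the structural form but let the coefficient vary by unit or group j,

          y_ij = beta_j x_ij + epsilon_ij,    beta_j ~ Normal(mu_beta, sigma_beta^2),

          so the fixed-coefficient routine is recovered as the special case sigma_beta = 0, while the causal reading of each structural equation is left unchanged.]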

  7. Judea: Thanks for your comments and especially this one –
    “that they can solve simple problems with comfort and confidence (not like their professors)”

  8. Pingback: Causal Analysis in Theory and Practice » Blog discussion on Causality in Econometric and Statistical education
