
Why ask why? Forward causal inference and reverse causal questions


Guido Imbens and I write:

The statistical and econometrics literature on causality is more focused on “effects of causes” than on “causes of effects.” That is, in the standard approach it is natural to study the effect of a treatment, but it is not in general possible to define the causes of any particular outcome. This has led some researchers to dismiss the search for causes as “cocktail party chatter” that is outside the realm of science. We argue here that the search for causes can be understood within traditional statistical frameworks as a part of model checking and hypothesis generation. We argue that it can make sense to ask questions about the causes of effects, but the answers to these questions will be in terms of effects of causes.

We also posted the paper on NBER so I’m hoping it will get some attention from economists. [Again, here’s the open link to the paper.] I think what we have here is an important idea linking statistical and econometric models of causal inference to how we think about causality more generally.


  1. Rahul says:

    To me this snippet from the article would be a strong reason to shy away from reverse causal questions: “A reverse causal question does not in general have a well-defined answer, even in a setting where all possible data are made available.”

    Science in general is partial to asking questions leading to well-defined answers, I think. It may be useful as discussion or brainstorming but not as the goal.

    • jrc says:

      I paused at that line too. Let me offer an interpretation that might be useful in thinking about why this reverse causal question might be interesting even if it doesn’t have a “well-defined answer.” I’m putting on my “try to explain this” hat, so please don’t think I’m wedded to this view, just trying to understand it.

      Gelman and Imbens give us a few examples: Why do more attractive people earn more money? Why does per capita income vary so much by country? Why do many poor people vote for Republicans and rich people vote for Democrats? Why did the economy collapse?

      First, I think we want to say that these questions are, in fact, properly in the domain of potential objects of scientific inquiry – they are empirical facts about the world we would like to understand in as systematic a manner as possible. But in order to be answered “scientifically” they must be turned into fundamentally different questions. Take the first example about attractiveness and earnings. Understanding why attractive people earn higher wages is a proper research goal, but not a proper research question. To make scientific progress, the researcher would take this research goal, consider possible explanations of the general question, and draft a set of hypotheses and tests for these hypotheses. This is what the authors are getting at with their notation and its usefulness – using a rigorous mathematical representation to allow all possible causal paths to reveal themselves.

      Now the researcher has a series of prospective explanations: because people like being around attractive people, because they over-estimate the qualifications of attractive people, etc. They develop a test for each of these, run the tests, and publish the results. This is normal science, not even in the strictly Kuhnian way, just in the everyday way that science works. This is the result of the reverse causal reasoning – the grounding from which one devised their hypotheses and tests (the “anomaly” in the world). Without the reverse causal reasoning, there would be no reason to do the estimation and hypothesis testing – we wouldn’t have ever thought or cared to do it.

      What I think this paper is doing is providing a statistical theory (well, the sketch/outline of a statistical theory) of the first part of the problem – of taking the broad question about the world (why attractive people earn more) and turning it into practical, falsifiable, every-nice-property-you-want scientific questions. This is by nature an under-determined problem, in that the researcher will never be able to test every single possibility (there are infinite possible explanations). It is also not clear that there will ever be a “well-defined” answer at all, in the sense that the thing in the world is probably caused by tons and tons of different things, varying across people and places and situations.

      So I think that both 1) science has to engage with these broader questions, because they are what lead us to be interested in the smaller, direct, testable questions; and 2) just because these big questions don’t have “well-defined” answers doesn’t mean we can pretend we aren’t interested in them. They are what we are actually interested in, and they motivate all of our empirical work. In this sense, I think of this paper as a step toward developing a more rigorous science of drafting scientifically rigorous research programs.

      I hope that was useful and not just a long-winded re-phrasing of everything in the paper.

  2. […] 11:47 on November 11, 2013 by Mark Thoma Andrew Gelman and Guido Imbens posted this at the NBER to try to get the attention of economists: Why ask Why? Forward Causal Inference and Reverse Causal Questions, by Andrew Gelman and Guido […]

  3. Anonymous says:

    “Online access to NBER Working Papers denied, you have no subscription”

    So much for open science.

    • zbicyclist says:

      NBER is irritating, because it’s basically just private, taxpaying US citizens who have to cough up the $5.

      “You should expect a free download if you are a subscriber, a corporate associate of the NBER, a journalist, an employee of the U.S. federal government with a “.GOV” domain name, or a resident of nearly any developing country or transition economy.”

      It’s not Andrew’s fault, and it’s only $5, but it’s still irritating.

      • Rahul says:

        The point that annoys me is that a lot of academics (not targeting Andrew) pay lip service to open science. Put the words into action and boycott the closed sources. A young academic may not be able to afford to, but the senior ones at least should follow up on their rhetoric by not publishing in such places.

        If a critical mass took a stand things would be a lot better.

        • jrc says:

          NBER Working Papers aren’t really meant for general consumption. They are meant to be a means to distribute early versions of papers to researchers who can provide feedback and continue the informal pre-submission peer review process. Econ papers take a year or two to publish, partly because most papers are written, presented, and updated several times before being submitted. So the point isn’t that they are open access, it is that they are open to a select group of people who can independently judge the quality of the work.

          I think Andrew’s point is that putting it on NBER is a signal to economists. Well, a signal and a direct mailing, because we get a list of NBER papers in our email each week.

          I would be sympathetic to an argument that government-funded research should be published in open access form (at least at some point), but not working papers that have yet to be reviewed or even declared “Final” by their authors.

          Now… I’m gonna go actually read this paper, because it sounds interesting, and I didn’t get it via NBER because I’m not on the econometrics list.

          • Anonymous says:

            jrc: “So the point isn’t that they are open access, it is that they are open to a select group of people who can independently judge the quality of the work. “

            I find this a tad patronizing. Presumably there are many people out there who can also judge the quality of the work, not just a select group of economists.

            Moreover, to the extent this literature is used to inform policy (e.g. the Reinhart-Rogoff manuscript) and is financed by public funds, then citizens and other scientists ought to be given a view.

            But maybe mere citizens should not be entrusted with scrutinizing the work of the economist caste.

            • jrc says:

              I get your point, and I’m sympathetic – as I said, once published I think government funded work should be publicly accessible. And certainly there are non-economists who could judge these working papers well. Similarly, I could critically read a lot of papers in Public Health or Epidemiology.

              But the point is that these are NOT published papers, not even finished papers, and authors make them accessible to a select group of peers before publication and revision so as to solicit feedback. So I think the apt comparison would be me complaining about not getting an early look at Public Health paper drafts prior to publication because I wasn’t affiliated with a Public Health dept. There might well be some social utility loss there (maybe I’d give great comments!), but I don’t think it is wrong or unfair or anything.

              But again, I just want to emphasize that we are not talking about completed, fully peer-reviewed work here. We are talking about work that is in some stage of “solicit feedback and revise.” I think that some form of this kind of review happens in all disciplines. It’s just that in Economics we have a formalized system, so instead of sending drafts to a couple of colleagues, we let the field take an early look. I guess you could say that is elitist and reflects our self-assumed supremacy over every other field, but I think that’s a bit unfair.

              • zbicyclist says:

                I would hardly say that this is a select group, since it consists of the vast majority of the world population:

                “You should expect a free download if you are a subscriber, a corporate associate of the NBER, a journalist, an employee of the U.S. federal government with a “.GOV” domain name, or a resident of nearly any developing country or transition economy.”

                It’s not clear how “everybody in India” can be considered selective.

              • Anonymous says:

                I see your point. And as a private institution NBER can do as it pleases.

                But I would still argue research funded with public funds ought to be freely available if it is ready enough to be posted on NBER.

                I mean, why should Indians get it, and not most US taxpayers (who presumably already paid for much of the research)?

                If publicly funded research is posted in NBER, presumably it should also be posted in a public repository (e.g. Andrew has also posted a freely available version).

                Arguably, given the problem with publication bias etc., it is most important that the public gets access to working papers, and not just to the edited body of science. And here your argument on public health is relevant. There is an AllTrials campaign, in part to force researchers to share _all_ trials, including the non-significant “working paper” ones gathering dust in the file drawer.

                If in doubt make it public. Always.

              • Rahul says:


                One key difference is that I cannot read Public Health paper drafts by paying $5.

                So, it’s not as if NBER has professional reasons for the exclusiveness. General consumption is perfectly OK so long as they can make some money.

                This is no professional matter, purely a monetary one.

  4. Brian says:

    Nice paper. Reminds me a little of the way Freedman and Cox talk about regression and causal inference.

    I have a question about your discussion of “two alternative models” to explain (spurious) correlations on p5.

    You use different notation and language to describe them, but they both seem like the same class of problems: unobserved confounders/omitted variables. Or am I missing something? The only difference I can see is that in the first model, the unobserved confounder is potentially manipulable (carcinogenic exposure), whereas in the second model it is not (genetic background).

    I know that some people (e.g. Cox) distinguish between these classes of confounders (causes vs. attributes), so I suppose that makes sense. But why the different notation?

    Ps – Pearl is in the bibliography, but you didn’t actually cite him in the paper.

  5. K? O'Rourke says:

    Not sure if this will be helpful, but it is what came to mind when I read your paper.

    E.g. “Causal conjecture is the formulation of such descriptions on the basis of limited evidence, and causal inference is the confirmation and testing of causal conjecture.”

  6. […] – Tagged: Possibly Not-So-Useless Data View on […]

  7. […] Possibly Not-So-Useless Data What can statistics tell us about causality? – Andrew Gelman […]

  8. Ashok Rao says:

    Very interesting paper. Would like to see economists think about this in the context of the financial crisis.

  9. Bill Harris says:

    I think focusing on either direction misses the point in many cases, for it assumes that A causes B or B causes A. What if causality ran in loops? That is, what if A caused B /and/ B caused A? For example, I get an infection, and my body temperature rises (A causes B). My increased body temperature kills off some of the bacteria, eventually stopping the infection (B causes not A).

    Play around with real or simulated feedback systems and see how they work. For a classic introduction in a largely linear domain, read up on feedback control theory. For much more interesting problems, read up on strongly nonlinear feedback in the system dynamics literature (John Sterman’s /Business Dynamics/ is a current canonical text). For ways to play around with simulated feedback, see MCSim (for a Bayesian tool approach), Stan (when the ODE capability is released), commercial tools such as Vensim or iThink / STELLA, or even hand-crafted custom simulations (simple ODE solvers using Euler or Runge-Kutta integration are pretty easy to code; doing what you want with the data can be harder).
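    As a toy sketch of that fever/infection loop (all parameters and functional forms here are invented for illustration, not taken from any model in the literature), a hand-coded Euler integration might look like this:

```python
# Toy two-variable feedback loop: infection raises temperature (A -> B),
# and elevated temperature suppresses the infection (B -> not A).
# All parameter values are made up for illustration.

def simulate(steps=5000, dt=0.01):
    B, T = 1.0, 37.0                            # bacterial load, body temp (deg C)
    r, a, c, k, T0 = 0.8, 0.05, 0.5, 0.6, 37.0  # invented rate constants
    history = []
    for _ in range(steps):
        dB = r * B - k * max(T - T0, 0.0) * B   # growth minus fever kill rate
        dT = a * B - c * (T - 37.0)             # infection heats, body re-regulates
        B = max(B + dB * dt, 0.0)               # Euler step; load can't go negative
        T = T + dT * dt
        history.append((B, T))
    return history

hist = simulate()   # temperature spikes above baseline, then the loop damps down
```

    Neither “A causes B” nor “B causes A” alone describes this trajectory; the interesting object is the loop itself.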

  10. Econ says:

    I think the basic idea is very intuitive and clear to anyone who has worked on real world questions (in other words, someone not doing research in and for academia and academics). But what is the practical import of this? You mention model checking and selection. This could be implemented using different measures of fit, in/out-of-sample prediction, coefficient sign/magnitude, etc. This is being done already. Anything else? This might validate what a lot of people already do, but I’m struggling to find a non-obvious insight.

    • dab says:

      Personally, I’m very appreciative when someone takes the time to articulate something that is “intuitive and clear to anyone who has worked on real world questions.” Such articulations seem to me to be crucial for the progress of science….

  11. Jeff says:

    At least in epidemiology, reverse causality has a different meaning. It refers to when the exposure-outcome relationship goes in the opposite direction of what one thinks.

    “The negative association between breastfeeding and linear growth reflected reverse causality. Increased breastfeeding did not lead to poor growth; children’s poor growth and health led to increased breastfeeding”
    from here

  12. jonathan says:

    My first reaction is that people get this wrong so often I don’t have much hope. As in, unemployment insurance is why we have high unemployment. (That guy is tenured at Chicago.)

    My second reaction is people genuinely aren’t capable of understanding that systems express as results. They think, for example, that this car accident occurred because this person fell asleep or this part broke without recognizing that as long as people drive vehicles there will be accidents and thus something will happen to cause this or that individual accident. You try to point out you can perhaps squeeze the system – as is being done with basic precautions like checklists in surgeries – but people are, it seems, highly incapable of realizing that this accident and that one and that each had individual reasons but all those reasons express what’s going on in the larger aggregate of the system in which all these accidents occur.

    I don’t think it’s a particularly subtle point but it took many years after the germ theory was recognized for doctors to wash their hands BEFORE surgery and then many decades before doctors and nurses and administration realized that every item in a hospital is handled so the guy delivering food is a vector and the nurse going from patient to patient is a vector, etc. They still don’t do well at that basic systemic improvement. (And of course now we get evolution of surviving pathogens within hospitals … or my recent favorite, the discovery of “contaminants” in clean rooms that can survive on the tiny bits of moisture available.)

    In a non-scientific application, I read sources from the Arab world on a regular basis and have noted the fairly common occurrence of taking the evidence and blaming someone else for the troubles that infect the region. To reverse the famous line, the fault lies not in ourselves but in the stars – or in these cases, on the US and “zionists”. I mention this kind of idiocy as an example we often see: rather than identify the system as it expresses all these terrible results, take the available data and use that to exclude the obvious in favor of the unlikely and impossible. If you read history, this is the usual case: as in, there’s an issue with the crops, the witch did it. We humans are not well conceived to think about the systems that generate the results which occur. We are bad at seeing the true nature of results and we imagine causal chains and then hold to these imaginings as “belief” – which can become unshakeable no matter the weight of evidence. Some are “conspiracy theories” and others are religions.

    One hobby is to take very old sources to apply reverse causal ideas. You are, after all, talking about a version of philology. A favorite, being Jewish, is the story of Abraham and Isaac. I have been “taught” over and over that Abraham was a righteous man who, being righteous, listened to the instruction to sacrifice Isaac – and for that was “rewarded” by being stopped with a ram substituted. Really? If listening to the words in your head telling you to do things is the test for righteous, then the 9/11 bombers were righteous and every freaking zealot who does horrible things is righteous. The only way this version of “learning” is correct is if we ignore all that.

    But going in reverse, we know that child sacrifice was real. We have lots of archeological evidence and the Torah itself refers to it – notably in the section where it prohibits 3 Canaanite rituals, those being ritually killing your child, ritual sex with animals and ritual homosexual sex (usually misunderstood and distorted into a prohibition of all homosexual sex). There’s more but you don’t need to have Abraham be “righteous” in listening to instructions to kill his son, no need to have that image implanted that somehow zealotry is God’s will. The simpler, more systemic view – the reverse generated causality – would be that Abraham heard this idea – something that seems to have occurred a lot in the ancient world (and which occurs today in the form of honor killing) – and the lesson is rather starkly: don’t kill your children. That is, rather than listen to what the “community” judges you must do either as ritual or to save face, protect your own heritage, your own bloodline, your own genetics and pass that down instead of killing your own future.

    So the actual meaning derived in reverse is essentially the opposite of the way we are taught: it is not that Abraham listened and was rewarded but that he was prevented from killing Isaac and was taught the lesson “don’t kill your child and thus the future of your line”. That in turn would mean all those zealots who do horrible things are in fact wrong, that they listen to the voices in their heads when they should not be hurting other people. I went through that relatively long bit as an example of how deeply rooted the issue is in our brains.

    I appreciate any attempts to formalize this area. I imagine you would be burnt as a witch.

  13. Gaius Gracchus says:

    Any researcher who dismisses the search for causes as “cocktail party chatter”, outside the realm of science, does not realize that this search is a fundamental part of a very important field of science, or, at the very least, engineering: statistical and engineering process control (SPC and EPC). After all, the ideal operation of a control chart in SPC/EPC is to move a previously unknown source of “trouble” in the process from unknown (special cause) to known (assignable cause) in order to better control the process. This fact directly corresponds to Gelman and Imbens’ argument in the paper. This logic is clearly described in Box, Luceno, and Paniagua-Quinones book “Statistical Control by Monitoring and Adjustment”, and is beautifully illustrated in George Box’s sawtooth diagram (which, according to Steve Stigler in the books “Box on Quality and Discovery” and “Improving Almost Anything”, has been recently discovered, carved on a cave wall and dated to about 3000 B.C.).
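    To make that control-chart logic concrete, here is a minimal Shewhart-style sketch (simulated data, not an example from the book): an observation outside the three-sigma limits is flagged as a special cause, i.e. a prompt to ask the reverse causal question “what caused this point?”

```python
# Minimal Shewhart control-chart sketch: estimate limits from in-control
# history, then flag new points outside them as "special causes" to explain.
import random
import statistics

random.seed(1)
baseline = [random.gauss(10.0, 1.0) for _ in range(50)]    # in-control history
center = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
ucl = center + 3 * sigma                                   # upper control limit
lcl = center - 3 * sigma                                   # lower control limit

new_obs = [9.8, 10.4, 15.0, 10.1]                          # 15.0 is the anomaly
signals = [x for x in new_obs if not (lcl <= x <= ucl)]    # points to investigate
```

    Everything inside the limits is treated as common-cause noise; only the flagged point earns a search for an assignable cause.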

    Accordingly, statistical/engineering process control constitutes one example that is very much relevant to the discussion on forward and reverse causal inference. Unfortunately, this strong example is not present in Gelman and Imbens’ paper.

    Further articles by eminent statisticians such as Box and Tukey (e.g., listed below) beautifully articulate the need for forward and reverse causal inference for improving one’s understanding of science, and should be read and appreciated by anyone who considers him/herself a serious statistician.

    • Fernando says:

      Some social scientists advocate and use Directed Acyclic Graphs to represent causal knowledge, including known, known but unmeasured, and unknown unknown sources of variation.

      These graphs have a lot in common with Ishikawa Diagrams, as used in manufacturing quality control.

      Indeed, I for one see no difference between the objectives of SPC and social science. Presumably social scientists strive to understand their environment, control it, and improve their outcomes.

      However, I am told such lofty objectives are only to be discussed at cocktail parties, and in joking terms. Go figure.

    • Andrew says:


      Thanks for the references. I agree that the Box paper is relevant and we can cite it. I don’t see the Tukey paper as being so relevant, but I have cited Tukey in other contexts regarding the idea that exploratory data analysis and confirmatory data analysis (in Tukey’s sense) can both be viewed as forms of model checking. See, for example, my 2003 paper, A Bayesian formulation of exploratory data analysis and goodness-of-fit testing.

      I do think the new paper with Imbens adds something in that we are applying these ideas to the particular issue of how to think about reverse causal questions. But I think you are right that our paper will be stronger with the Box reference. You have to remember that we are addressing a particular concern within the fields of econometrics and statistics.

    • K? O'Rourke says:


      Do you know if Stigler wrote anything up on induction – deduction being carved on a cave wall and dated to about 3000 B.C. – or just said it to Box with a grin on his face?

      p.s. though here I believe most of the discussion is much more relevant to abduction

  14. Bill Harris says:

    Jonathan: does system dynamics approach formalizing this process for you? [Link] describes it briefly with references, [link] gives some examples, and [link] gives a large number of models coded for Vensim. Vensim has a free model reader at [link].

  15. […] this post by Andrew Gelman on “reverse causal questions” helps to sharpen some of my comments here about […]

  16. David P says:

    Is the terminology (“effects of causes,” “causes of effects”) standard? I first saw it in some of Dawid’s articles. Was he the conduit or is there some older source? [I see after writing the last sentence that Gelman (2011) refers to Mill.]

  17. […] open-minded, thought-provoking and non-dogmatic statistical thinking highly recommendable. The plaidoyer infra for “reverse causal questioning” is typical […]

  18. Constructive criticism says:

    constructive criticism:

    The business example quoted in your paper is not a good example of a reverse causality question. It is an example of a question about missing data. P&G aren’t confused about their model of the world regarding account sales. They know that if coupons aren’t distributed on time, it will hurt sales. They don’t know if the coupons were distributed on time. This is about missing data, not about models that don’t align with observations even when all the data are available.

    If my car alarm is going off and I ask “why?” it’s not because I don’t have a good model of my car alarm. It’s because I don’t know whether someone is jumping on my car. This is also not a reverse causal question.

    • Andrew says:


      Go take this one up with Kaiser Fung. He’s the one who works in the business world, and he’s the one who told that particular story. I believe that in his example it was not as simple as not knowing if certain coupons were distributed on time. As he wrote, “lots of possible hypotheses could be generated… TV ad was not liked, coupons weren’t distributed on time, emails suffered a deliverability problem, etc. By a process of elimination, one can drill down to a small set of plausible causes. This is all complex work that gives approximate answers.”

  20. Susan says:

    Hi Dr. Gelman,

    I am a student studying educational measurement and enjoy following your work. In particular, this paper has really re-framed the way I am thinking about my own dissertation research. In my research I am examining the correlation between aggregate measures of student growth for teacher evaluation (SGPs) and the characteristics of students in their classrooms like poverty (FRL). My hypothesis is that differential learning patterns over the summer months (which have been shown to be economically moderated) are, in part, driving the observed correlation between SGPs and FRL. In this case, when controlling for summer loss/growth, I do not expect the correlation between SGPs and FRL to be zero, but rather reduced. I am interested in studying the significance of the reduction in shared variance when moving from the bivariate to the partial correlation model. Would you agree that it would make sense to test this difference with an F-test? My hunch is that the formula would look something like this (where r2 is r-squared):

    Fobs = [(r2(Y,Z) – r2(Y,Z.V)) / 1] / [(1 – r2(Y,Z)) / (n – 2)]

    Have you seen this type of test done before? I haven’t come across anything exactly like this in my reading. Thank you!
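    Purely as a numerical sketch of the quantities in this question (the simulated data and variable roles are invented, and the statistic is computed exactly as proposed above; note that the standard nested-model F-test would typically put the full-model R-squared and n − 3 in the denominator instead):

```python
# Hypothetical illustration: Y plays the role of SGP, Z of FRL, and V of
# summer learning loss/growth. Y and Z are correlated only through V, so
# the partial correlation given V should shrink toward zero.
import math
import random

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
n = 200
V = [random.gauss(0, 1) for _ in range(n)]         # summer effect (simulated)
Z = [v + random.gauss(0, 1) for v in V]            # FRL-like measure
Y = [v + random.gauss(0, 1) for v in V]            # SGP-like measure

r_yz, r_yv, r_zv = corr(Y, Z), corr(Y, V), corr(Z, V)
# First-order partial correlation of Y and Z, controlling for V:
r_yz_v = (r_yz - r_yv * r_zv) / math.sqrt((1 - r_yv**2) * (1 - r_zv**2))

# The statistic exactly as proposed in the comment (df choice as written):
F_obs = ((r_yz**2 - r_yz_v**2) / 1) / ((1 - r_yz**2) / (n - 2))
```

    In this simulated setup the bivariate correlation is substantial while the partial correlation is near zero, so the reduction in shared variance is large by construction.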

Leave a Reply