
Transformative treatments

Kieran Healy and Laurie Paul wrote a new article, “Transformative Treatments,” (see also here) which reminds me a bit of my article with Guido, “Why ask why? Forward causal inference and reverse causal questions.” Healy and Paul’s article begins:

Contemporary social-scientific research seeks to identify specific causal mechanisms for outcomes of theoretical interest. Experiments that randomize populations to treatment and control conditions are the “gold standard” for causal inference. We identify, describe, and analyze the problem posed by transformative treatments. Such treatments radically change treated individuals in a way that creates a mismatch in populations, but this mismatch is not empirically detectable at the level of counterfactual dependence. In such cases, the identification of causal pathways is underdetermined in a previously unrecognized way. Moreover, if the treatment is indeed transformative it breaks the inferential structure of the experimental design. . . .

I’m not sure exactly where my paper with Guido fits in here, except that the idea of the “treatment” is so central to much of causal inference that researchers sometimes act as if randomization (or, more generally, “identification”) automatically gives validity to a study, as if randomization plus statistical significance equals scientific discovery. The notion of a transformative treatment is interesting because it points to a fundamental contradiction in how we typically think of causality: on one hand “the treatment” is supposed to be transformative and have some clearly defined “effect,” while on the other hand the “treatment” and “control” are typically considered symmetrically in statistical models. I pick at this a bit in this 2004 article on general models for varying treatment effects.

P.S. Hey, I just remembered—I discussed this a couple of other times on this blog:

– 2013: Yes, the decision to try (or not) to have a child can be made rationally

– 2015: Transformative experiences: a discussion with L. A. Paul and Paul Bloom


  1. Tom Passin says:

    Isn’t this the exact same thing that Mayo just blogged about?

    “Szucs & Ioannidis Revive the Limb-Sawing Fallacy” at

  2. Z says:

    I wish they would try to make their point rigorously in terms of counterfactuals. What type of estimand are they saying is non-identifiable under what assumptions about treatment?

  3. Kyle C says:

    Healy & Paul could have hat-tipped Nancy Cartwright’s 2007 book, Hunting Causes and Using Them, as Gelman & Imbens did. She’s been on the RCT beat for a while.

    • L.A. Paul says:

      Cartwright’s work is broadly relevant, you are absolutely right. We thought her ideas on RCTs were different enough from ours that we didn’t cite her, but your comment makes me think I should have been more generous.

      • Kyle C says:

        Your reply is gracious. I wouldn’t have bothered to comment except that (1) I think one should err in favor of citing women’s contributions and (2) if I understand your paper correctly her example of oral contraception could be a sort of transformational treatment wrt the risk of thrombosis in women. (It has a direct positive correlation but an indirect negative correlation by transforming the subject into someone who is unlikely to get pregnant.)

  4. L.A. Paul says:

    Can you say more here? From what I can tell, the Mayo point is about fallacies with regard to inference and hypothetical reasoning. Our argument is that the evidence underdetermines the interpretation of the causal mechanism in some interesting cases (and in a distinctive way). The deep issue concerns what the philosophers David Lewis and C.B. Martin called “finkishness”, which is a highly context-sensitive type of counterfactual dependence.

  5. jrc says:

    Page 5: “Vampires: In the 21st century, vampires begin to populate North America.” Talk about burying the lede!

    Page 8: “Economists: In the late 20th century, economists begin to populate North America.” Talk about the sequel not living up to the original!

    Page 9: “First, while vampires are imaginary monsters confined to thought experiments, economists actually exist.” I’m kinda jealous you got to publish that sentence.

    Page 12: “And if the preference has been transformed, the experimental group no longer matches the control group in the relevant sense.” I’m not sure this is right. The control group still matches ‘what the treatment group would have been like had there been no treatment’, right? Even if they are fundamentally different populations now, they are different only because the treatment group actually was treated, and the control group still gives us the relevant counterfactual to no treatment. The under-determination problem is still there (revealing old preferences or changing underlying personality/preferences), but I don’t quite see the problem for inference – the two groups could be people with the same preferences being expressed in different ways, or groups with different preferences, but the control still tells us what the treatment group would have looked like without treatment. Why isn’t that all we need? I think this is the bit that Tom relates to the Limb-Sawing Fallacy (which, by the way, I thought was funnier when I read it as “Lamb-sawing”… probably the Xmas season) – just because H0 isn’t true (just because the populations are different people) doesn’t mean we can’t calculate test statistics under the assumption H0 is true.

    “… But if the treatment is transformative, C is rendered ineligible as a control at t2.” So no. Yeah? OK – wait, let me try…. the counterfactual thinking is NOT that ‘at t2, the control group resembles the treatment group in all ways except the treatment effect’. The counterfactual thinking is that ‘at t2, the control group looks like what the treatment group would have looked like if it hadn’t been treated’. That is, the key assumption is that at t1 the control group and the treatment group, in the absence of treatment, will look similar at t2. It is the pre-period that determines the value of the control as a counterfactual, not the post-period. I mean, people are changing over time all the time, and so the control group in t2 is not the ‘same people’ as it was in t1 anyway… we already have a different population in t2 in both groups – people t2-t1 periods older who have learned stuff and changed in the intervening time. The idea is that they would have evolved in the same manner absent treatment. That isn’t threatened by your thought experiment, I don’t think.
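
The counterfactual argument in this comment can be illustrated with a small potential-outcomes simulation. This is only a sketch under stated assumptions: the outcome model, the effect size of 2.0, and the function name `simulate_rct` are hypothetical and do not come from Healy and Paul's paper. It shows that under randomization the control group's mean estimates what the treated group's mean would have been without treatment, even when treatment changes every treated unit. It says nothing about the separate question of which mechanism produced the difference.

```python
import random
import statistics


def simulate_rct(n: int = 20000, seed: int = 0):
    """Return (true average effect, difference-in-means estimate)."""
    rng = random.Random(seed)
    # Hypothetical potential outcomes: y0 without treatment, y1 with it.
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed "transformative" treatment: a large, unit-varying shift.
    y1 = [base + 2.0 + rng.gauss(0.0, 1.0) for base in y0]
    # Randomized assignment, independent of the potential outcomes.
    treated = [rng.random() < 0.5 for _ in range(n)]
    # Only one potential outcome is ever observed for each unit.
    observed_treated = [y1[i] for i in range(n) if treated[i]]
    observed_control = [y0[i] for i in range(n) if not treated[i]]
    true_effect = statistics.fmean([a - b for a, b in zip(y1, y0)])
    estimate = statistics.fmean(observed_treated) - statistics.fmean(observed_control)
    return true_effect, estimate


true_effect, estimate = simulate_rct()
print(f"true average effect: {true_effect:.3f}")
print(f"difference in means: {estimate:.3f}")
```

The two numbers should agree closely because assignment is independent of the potential outcomes. What the simulation cannot distinguish is whether the shift reflects revealed or replaced preferences, which is the underdetermination the paper is concerned with.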

    Page 13: “Possibilities multiply.” Two word sentence! (Three word sentence!)

    Page 14: “One such assumption is that relevant preferences of experimental populations are not replaced by experimental treatments.” I’m just not convinced this is a big problem, or a real assumption of the models. At the end of the experiment, I’m testing whether the two groups come from the same population. If there are transformative treatments, they don’t. I reject the H0 that they are from the same population. What is the problem? I see the “immediate difficulty”, but I don’t see the “foundational difficulty”.

    Endnote 5: “We acknowledge that securing permission from the IRB is perhaps the most fantastical aspect of this thought experiment.” No worries, just run it through the World Bank or in conjunction with some despotic government that wants to breed a race of slave workers to cover the night shift.

    In conclusion: I just don’t see the foundational problem of transformative treatments invalidating the counterfactual reasoning implicit in experimental inference. Of course, I may very well be missing something. Or maybe I just got poor instruction as an undergrad philosophy student (warning: I may have taken your class, but I’m not sure…I definitely took Jenann Ismael’s Phil. of Science). But that’s where I’m at after reading this (which is fine – I enjoyed reading it, thanks!).

    For future work: I’d also enjoy reading a discussion focused on the social-economic motivations early on (race and earnings, parenthood and preferences, etc.) in the context of preference illumination v. preference change. It feels like a tease in this version… you draw the parallels, but never confront the interpretation of the motivating examples head on. That’s cool – it wasn’t the point of this paper. But I’d enjoy reading a paper where that is the point.

    • Anonymous says:

      I agree – it is well known that RCTs can provide evidence for the existence of a treatment effect but can’t elucidate mechanism! (See, for example, Pearl / Robins and Richardson on mediation and mechanism.)

  6. Great cat picture. Transformative no less.

    On p. 175 of Psychotic Reactions and Carburetor Dung, Lester Bangs refers to Lou Reed as “the cat.”

  7. Tom Passin says:

    L.A. Paul says:

    “Can you say more here? From what I can tell, the Mayo point is about fallacies with regard to inference and hypothetical reasoning. Our argument is that the evidence underdetermines the interpretation of the causal mechanism in some interesting cases (and in a distinctive way).”

    I was to some degree being cute with a similarity in the language that Andrew wrote. Yet I think there actually are some correspondences. The Healy & Paul paper discusses counterfactuals as they relate to scientific inference, and Mayo’s post is about hypothesizing on counterfactuals too.

    Mayo’s account of the statistical fallacy could be summarized by saying “If H0 [Hypothesis #0] is wrong, then how can you claim to be using comparisons between H1 and H0 to invalidate H0?”

    Healy and Paul’s similar (not exactly the same, it’s true) account could be summarized by saying “If your experiment results in changes in population P0 but not in the control P1, then how can you claim to be testing for differences between the populations P1 and P0?” – since P0’s actual characteristics may have changed during the experiment, leading to P1 not being a good control group any more.

    The difference would seem to be that P0 may have actually changed during the experiment, whereas in Mayo’s post, H0 changed only in the mental status of the statistician, not in any intrinsic properties.

    I would say that Healy&Paul point out something that is actually more general, and not easy to deal with. For any outcome, whether of H0 vs H1 or P0 vs P1, if there is a difference there are many, probably an infinite number of, different hypotheses that might explain the difference. Some of them might seem implausible to the experimenter, some of them might violate known laws of physics, some of them might come into play because of things unknown to the experimenter, and so on. In the kinds of cases discussed by Healy&Paul, it seems to me that the limitation is in essence on the part of the experimenter and experiment.

    Take the fanciful first example, where the treatment group gets bitten by a vampire. The experimenter apparently wants to find out if exposure to a vampire changes a person’s attitude towards vampires. The Healy&Paul conundrum (as stated in the paper) is that the bitten people *become* vampires, and so can be expected to have different attitudes by that very fact – thus the control group suddenly isn’t a proper control group any more. But this really means that the original plan for the experiment wasn’t well thought out. The experiment didn’t accomplish what it was intended to. OTOH, if one could apply a test to determine which of the bitten really became vampires, one could probably draw some other conclusions.

    The more general message is that it is usually hard to be sure that you are really testing what you’d like to test (or observe, in the case of observational studies), and it behooves most of us to put a lot of thought and effort into that aspect of things. And we should be a lot more humble about our conclusions. Doing statistics doesn’t substitute for the real work.

  8. Tom Passin says:

    In case I left a wrong impression, I didn’t mean I think that Healy&Paul’s viewpoint is fallacious!

  9. Anonymous says:

    Causality describes things in motion, people in reaction. The “underlying preference” the authors argue for actually implies a static, non-reactive phenomenon, which goes against the idea of causality.

  10. Tom Passin says:

    As I thought about this some more, I came to see that the Healy&Paul phenomenon – that the experiment may actually change the subjects so that the control group is not a good control – could be quite an important factor in surveys. Here’s how this would work:

    The first question is framed so as to force the answer into one of the multiple choices. This framing is different from what the subject has used internally, and so may cause a change in his attitudes.

    Now the second question is presented. The subject is now a different person since his thinking has been reframed and possibly modified. Especially if the second question is somehow related to the first question, the subject’s answer is likely to change too.

    If there is, say, a series of ten survey questions, and each one causes a change of, say, only 2% in the category of the responses, then by the end how can you expect to get results consistent with what the original (“naive” seems to be the term of art, isn’t it?) person would have given? In fact, the Socratic method could be thought of as being a series of carefully designed survey questions administered in a particular sequence, leading to a radical change in thinking!

    This is perhaps the known order-of-questions issue, but writ larger.
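
The ten-question arithmetic above can be made concrete with a short calculation. This is a sketch of one possible reading of the comment, not the commenter's own model: it assumes each question independently shifts a respondent's answer category with probability 2%, and the function name `fraction_shifted` is hypothetical.

```python
def fraction_shifted(per_question_rate: float, n_questions: int) -> float:
    """Share of respondents shifted at least once over the whole survey.

    Assumes each question shifts a respondent independently with the
    same probability (an illustrative assumption only).
    """
    # Probability of staying unshifted through every question.
    never_shifted = (1.0 - per_question_rate) ** n_questions
    return 1.0 - never_shifted


share = fraction_shifted(0.02, 10)
print(f"shifted at least once after 10 questions: {share:.1%}")
```

Under these assumptions, a per-question change of 2% compounds to a little over 18% of respondents by the end of the survey. If the shifts are correlated or cumulative rather than independent, the figure would differ.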

    • Martha (Smith) says:

      Good point.

    • Yes, indeed. Ten-question personality tests are a case in point. Consider this one:

      One’s response to the first question (“I don’t take risks unless I’ve done some careful research or evaluation first”) could easily influence one’s response to the second (“I am a cautious decision maker”).

      Similarly, one’s response to the third question (“In large social gatherings, I often feel a need to seek out space to be by myself”) could influence one’s response to the fourth (“Too much exposure to noise or light leaves me feeling drained or spacey”).

      And so on. Before you know it, you may have agreed with all or nearly all of the statements, when on closer inspection you might have agreed with four or five (or insisted on qualifying some of them).

      • Martha (Smith) says:

        “(or insisted on qualifying some of them)” Yes, yes! Dichotomous questions miss too much of reality.

        • Martha (Smith) says:

          Well, some slight qualification: I looked at the link after writing the above, and saw that the questions do have a 5-choice Likert scale. Still, the choices offered (Very uncharacteristic or untrue, strongly disagree; Uncharacteristic; Neutral; Characteristic; Very characteristic or true, strongly agree) don’t really offer as much realistic qualification as something like “Almost always, Usually, About half the time, Occasionally, Almost never.”

          • Martha, you make an interesting distinction between degrees of typicality and degrees of frequency. That’s an important point. I agree with you, also, that a five-choice Likert scale is somewhat preferable to a yes/no format (though not a huge improvement).

            Even there, though, the responses can obscure the situation. Let’s take the first one: “I don’t take risks unless I’ve done some careful research or evaluation first.” Risks are not all alike; one might take certain risks without much research or even forethought, while avoiding other risks altogether. One would end up, perhaps, with a score of 3 out of 5–but this would conceal the strong risk-taking in one area and the lack of risk-taking in another.

            But then, whose business is it what risks “I” take, if they harm no one and violate no laws or regulations? Maybe there’s a benefit, after all, in such tests’ fallibility. The test-taker keeps a bit of privacy.

            • P.S. Speaking of risk-taking and privacy, I came upon this passage when rereading Epictetus’s Enchiridion just now:

              “In parties of conversation, avoid a frequent and excessive mention of your own actions and dangers. For, however agreeable it may be to yourself to mention the risks you have run, it is not equally agreeable to others to hear your adventures.”
