
Macartan Humphreys on the Worm Wars


My Columbia political science colleague shares “What Has Been Learned from the Deworming Replications: A Nonpartisan View”:

Last month there was another battle in a dispute between economists and epidemiologists over the merits of mass deworming. In brief, economists claim there is clear evidence that cheap deworming interventions have large effects on welfare via increased education and ultimately job opportunities. It’s a best buy development intervention. Epidemiologists claim that although worms are widespread and can cause illnesses sometimes, the evidence of important links to health is weak and knock-on effects of deworming to education seem implausible. . . .

So. Deworming: good for educational outcomes or not?

You’ll have to click through to read the details, but here’s Macartan’s quick summary:

The conclusions that I take away though are that (a) the magnitude and significance of spillover effects are in doubt because of the measurement issues and the inference issues; (b) the inferences on the main effects are also in doubt because of the problems with identification and explanation. Neither of the main claims is demonstrably incorrect, but there are good grounds to doubt both of them.

What about policy? Macartan continues:

A number of commentators have argued that the policy implications are more or less unchanged. This includes organizations that focus specifically on the evidence base for policy (such as CGD and GiveWell).

Perhaps the most important point of confusion is what policy conclusions this discussion could affect. Many are defending deworming for non-educational reasons. But the discussion of the MK [Miguel and Kremer] paper really only matters for the education motivation. And perhaps primarily for the short-term school attendance motivation. Like much other literature in this area it finds only weak evidence for direct health benefits (beyond the strong evidence for the removal of worms). It also does not claim to find evidence on actual performance. Although many groups endorse deworming for health reasons, and rank it as a top priority, this, curiously, goes against the weight of evidence as summarized in the Cochrane reports at least. If the consensus for deworming for health reasons still stands it is not because of this paper.

Does the challenge to this paper weaken the case for deworming for educational reasons? I find it hard to see how it cannot.

I have a few comments of my own, not on deworming—I know nothing about that—but on some of the statistical points raised by Macartan’s post.

– The 800-pound gorilla in the room is opportunity cost, or cost-benefit analysis. As you say, who could be against de-worming kids? I’m reminded of Jeff Sachs’s argument that all of these sorts of interventions are worth doing, and that rather than trying so hard to rank the cost-effectiveness of different health and economic interventions, the rich countries should just kick in that 1% of GDP or whatever and do all of them. I’m not saying Sachs is necessarily right on this, I’m just saying that most of the discussion seems to be on traditional statistical grounds (Is there an effect? Is it statistically significant? Has it been proven beyond a reasonable doubt?) and the cost-benefit or opportunity cost calculations are implicit. Once or twice, cost-benefit calculations do get done, but not in a serious way. For example, Macartan points to a “60 to 1” benefit-to-cost ratio for deworming claimed by the Copenhagen Consensus, but apparently those guys just took the point estimate of effectiveness (which is a biased estimate, possibly hugely biased; see more on this below) and ran with it.

– Macartan talks about multiple comparisons, which is fine (though I’d prefer hierarchical modeling rather than classical corrections; see here and here). Macartan mentions the statistical significance filter: Statistically significant estimates tend to overestimate the magnitude of true effects (we call it the type M error or exaggeration factor here). This can be a big deal, especially once things get to the decision stage.

– Macartan mentions development economist Paul Gertler. I’ve only encountered his work once, and it was a case where he hyped and exaggerated (unintentionally, I’m sure) an effect size. I contacted him about it and asked him if he was concerned about the statistical significance filter, and he did not reply. Apparently he was happy reporting an overestimate. It was an early-childhood intervention experiment in Jamaica. Again, who could object to helping poor kids?

– I share Macartan’s skepticism about the spillovers. One problem here is that researchers have an incentive to make a “discovery.” De-worming helps kids, ok, that’s fine. But a spillover effect, that’s news. But the paradox is that these surprising findings are more subject to the statistical significance filter. The headline claims can be the biggest overestimates. And this is completely consistent with the calculation in section 3.4.1 of Macartan’s report. It is similar to the calculation that Eric Loken and I did regarding the notorious claim that women in a certain part of their monthly cycle were more likely to wear red. The researchers were proud of making this discovery with such a noisy measuring instrument, but if you back out how large the effect would’ve had to be, for the claimed effect to show in the population, it would have to be unrealistically huge. And of course this happened with that horrible LaCour study—the claimed effects in the aggregate implied huge effects in the subgroup of the population who would’ve been affected by the treatment.

– I don’t like Macartan’s section 4.2, “Can we be a bit more Bayesian?” I guess I’d like him to be a bit more Bayesian. In particular, I really don’t like the sort of binary thinking in which deworming works or doesn’t work for some purpose. To me, the concern is not that deworming or whatever is a “dud” but rather that it is not as effective as the published record might suggest. For a Bayesian decision analysis I’d prefer to do it straight, with costs, benefits, and a continuous parameter that represents the effectiveness of the treatment. Even setting the decision analysis aside, you can do Bayesian inference: just say there’s a true (population, average) causal effect and that you have a prior for it. Then it’s simple inference, an inverse-variance weighted average of the data and the prior information, no need for tricky probability formulas.
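To make the “inverse-variance weighted average” concrete, here is a minimal sketch. It assumes a normal prior on the average effect and a normally distributed estimate with known standard error; the function name and the numbers are hypothetical illustrations, not values from the deworming studies.

```python
import math


def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Combine a normal prior with a normal estimate (assumed known std_error).

    Returns the posterior mean and posterior standard deviation. The posterior
    mean is the precision-weighted (inverse-variance weighted) average of the
    prior mean and the data estimate.
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, math.sqrt(post_var)


def prob_positive(mean, sd):
    """Probability that a normal(mean, sd) quantity is greater than zero."""
    return 0.5 * (1.0 + math.erf(mean / (sd * math.sqrt(2.0))))


# Hypothetical numbers: a skeptical prior centered at zero and a noisy
# estimate that is just "statistically significant" (estimate = 2 * std_error).
post_mean, post_sd = normal_posterior(prior_mean=0.0, prior_sd=1.0,
                                      estimate=4.0, std_error=2.0)
print(post_mean, post_sd, prob_positive(post_mean, post_sd))
```

With these made-up inputs the posterior mean is pulled most of the way back toward the prior, because the prior is more precise than the data. The output is a continuous summary of the effect size that can feed a cost-benefit calculation, rather than a works/doesn’t-work verdict.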

Finally, I appreciate the way that, in his report, Macartan moves back and forth between the details and the big questions. These connections are a key part of any methodological analysis.
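The statistical significance filter mentioned above can also be illustrated with a small simulation. This is a sketch under stated assumptions: estimates are drawn from a normal distribution around a fixed true effect, “significant” means the absolute estimate exceeds 1.96 standard errors, and the function name, seed, and numbers are hypothetical.

```python
import random


def exaggeration_factor(true_effect, std_error, n_sims=20000, seed=0):
    """Simulate the type M (magnitude) error for a given true effect.

    Draws n_sims estimates from normal(true_effect, std_error), keeps only
    those that pass a two-sided 5% significance test, and returns the average
    absolute significant estimate divided by the absolute true effect.
    """
    if true_effect == 0:
        raise ValueError("true_effect must be nonzero to form a ratio")
    rng = random.Random(seed)
    threshold = 1.96 * std_error
    significant = []
    for _ in range(n_sims):
        est = rng.gauss(true_effect, std_error)
        if abs(est) > threshold:
            significant.append(abs(est))
    if not significant:
        raise ValueError("no simulated estimate reached significance")
    return (sum(significant) / len(significant)) / abs(true_effect)


# Low-power setting (hypothetical): true effect is half the standard error.
print(exaggeration_factor(true_effect=1.0, std_error=2.0))
# High-power setting (hypothetical): true effect is ten standard errors.
print(exaggeration_factor(true_effect=10.0, std_error=1.0))
```

In the low-power setting every estimate that survives the filter must exceed 3.92 in absolute value, so the surviving estimates overstate a true effect of 1 by at least that factor. In the high-power setting nearly every estimate is significant and the factor is close to 1.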


  1. Steve Sailer says:

    My vague impression is that the Rockefeller Foundation’s campaign against hookworm a century ago did the American South a lot of good, so that influences my prejudice in favor of deworming.

  2. Ruben says:

    “who could be against de-worming kids?”

    Maybe the advocates of Helminthic therapy:
    They find various positive effects of helminth treatment against auto-immune disorders. Basically the idea is that we’re adapted to carrying worms who down-regulate our immune systems.
    I think it’s not super established, but it makes sense to me and it also makes sense that it would have a hard time getting established.

    If, as you summarise it here, there are hardly any spillover effects to be found, only the worms are gone, maybe keep them and think of them as pets?

    • BenK says:

      Oops, I missed this before posting below.

      I wouldn’t say ‘down regulate’ – it’s more like ‘exercise, condition, develop.’
      There are very good studies demonstrating how a ‘germ-free’ life is a disaster,
      with serious developmental and metabolic failure. The technical term is
      gnotobiotic. Gnotobiotic mice are so abnormal that they are nearly useless for
      most kinds of research; instead, if simplification is required, ‘defined flora’
      animals are used (Altered Schaedler Flora). Unfortunately, this too is insufficient,
      as Yasmine Belkaid showed, really a defined skin flora would also be required,
      and down the rabbit hole one goes. It’s a bit hard to sell people on ‘good pathogens’
      but there is some inherent logic which makes sense.

      There are two main elements to the need for a normal microbiome; the normal
      microbiome is required to perform common functions; and the normal microbiome
      exerts pressures chronic, periodic, and occasional on the immune system. The immune
      system can’t be left in a vacuum, it’s a learning, adaptive, system. Bones don’t
      form and maintain properly in zero gravity. The nervous system wouldn’t develop well
      without any sensory inputs. Similarly, the immune system can handle only a certain
      spectrum of inputs and still develop normally.

      Anyway, I’m not quite sure how to intentionally modulate the worm infections, but
      something along those lines might be necessary; if we can ever sort out the research.
      If it’s any comfort, the Helicobacter story is even more complex, with subspecies
      co-adaptation, etc.

      • Keith O'Rourke says:

        I do recall some parasitic infections are considered curative for some auto-immune diseases – patients with diagnosis obtain the parasite (sometimes purposefully) and very quickly go into remission.

        But I don’t think there are any approved treatments as parasites may transmit to others or go rogue on the host.
        (Leeches and maggots are approved – just ask X about the first.)

        • Rahul says:

          Do you have a link about the curative aspects of parasitic infections? I always took the theory to be true in a general sense. I didn’t know they could be used as cures in actual patients suffering from the auto immune diseases.

        • I think there is some kind of roundworm treatment for irritable bowel syndrome. The roundworms are compromised in some way (like maybe irradiated to sterilize them or something) before being ingested. This prevents the re-transmission and the risk of long term infection. But I’m just going based on some vague recollection from a news article years back.

          • Ruben says:

            Sometimes they just use worms that aren’t specifically targeting humans like pig whipworms. Those, Trichuris suis ova, are an investigational medicinal product in the US. I guess that doesn’t mean “approved”, but still.

            @BenK: I guess the worms would call it down-regulation ;-) but yeah we might call it exercise or target practice or whatever.

            Here’s some more detail:
            Anyway, “Old Friends hypothesis” is a new favourite of mine.

            On the one hand a lot of people say the yuck factor would prevent adoption, but on the other hand you find a lot of people looking to buy whipworm eggs online. Beats IBD I guess.

  3. Economist says:

    There are many issues (some yet to be brought up) with the original paper. But regarding the 800 pound gorilla :

    See the comments to this Marginal Revolution post (including my comments) :

    One attractive feature of randomized trials (aside from the obvious one about causality) is that they are easy to understand and explain. That’s why they are popular in the aid industry. Once you start taking cost-benefit seriously, you are forced to make all kinds of assumptions and start modelling different things – and then you lose some of the appeal of using RCTs. Some examples of the assumptions needed in this context :

    i) How long do the benefits last ?
    ii) What is the discount rate of capital ?
    iii) How do you value school attendance (a million assumptions needed here) ?
    etc. etc.

    You might not be aware of the “cultural” context. The authors have had a huge influence on development economics and the aid industry. This paper was one of the early papers that started the movement towards RCTs. The authors’ students and co-authors are influential and affiliated with leading universities and journals. That is why the push-back from economists has been severe.

    But this is a can of worms that deserves to be opened.

  4. Noah Motion says:

    For a Bayesian decision analysis I’d prefer to do it straight, with costs, benefits, and a continuous parameter that represents the effectiveness of the treatment.

    Ultimately, doesn’t a dichotomous decision have to be made? In this case, a decision about whether or not to implement a de-worming program.

    Even setting the decision analysis aside, you can do Bayesian inference: just say there’s a true (population, average) causal effect and that you have a prior for it. Then it’s simple inference, an inverse-variance weighted average of the data and the prior information, no need for tricky probability formulas.

    What if the (point estimate of the) weighted average is small (relative to the error of the estimate)? Or what if the posterior distribution of the effect estimates is ~40% negative and ~60% positive? It’s not clear to me what “simple inference” is, in general. And, again, a dichotomous decision about whether or not to de-worm has to be made at some point, right?

    • Fernando says:

      Noah: “And, again, a dichotomous decision about whether or not to de-worm has to be made at some point, right?”

      Which reminds me why Bayesians can be like two-handed economists.

    • Phil says:

      Oh good, this is my excuse to mention a paper that I think is one of Andrew’s best, although I am biased for reasons that may become obvious: this decision analysis paper. Yes, you ultimately have to make a decision, but if you have several important parameters that are uncertain then the best decision can be quite different from what you’d do if you just use the best point estimate of each of the individual parameters.

    • Anoneuoid says:

      “Ultimately, doesn’t a dichotomous decision have to be made? In this case, a decision about whether or not to implement a de-worming program.”

      Shouldn’t the question instead be regarding how much to spend on a de-worming program, or what percent of the budget to spend on a de-worming program? It could be zero.

  5. BenK says:

    There is another gorilla in the room which hasn’t been mentioned or sufficiently studied; the helminth hypothesis. This ties the incidence of worm infection to immune development and chronic immune health, possibly including maternal effects on child immunity and immune development.
    This may have secondary effects related to metabolic health and disorder.

    The specter in the room, then, is that in the absence of malnutrition, aggressive chemical deworming may not have substantial immediate health and economic benefits, but may have very long term, chronic, multigenerational health costs.

    The subject is inherently difficult to study. There are numerous indirect investigations and discussions, but progress is slow. Still, if there is a gorilla under a lampshade in the corner, more so than alternative uses for charitable aid, causing direct harm to health would be it.

    Here are some recent references to get you into the issue, if you find it interesting.

    • Rahul says:

      The chemical deworming that this debate focuses around, is it one time or an ongoing program? If it is indeed one time, how long do the effects persist? In the absence of stellar hygiene won’t the worms re-infest?

      • jrc says:

        I think basically the pill kills all the worms in you when you take it, and then that’s it. So you can easily get re-infected, and that is sort of the point. But it is also not just a 0/1 type of thing: small worm loads in people aren’t usually a big problem, but if they multiply for a while and start to build up you can start seeing increasing problems from anemia, malnutrition, and other gut issues.

  6. Conor says:

    The replication paper was off. If anything, it made me even more confident in the original paper. There were problems with it (e.g. alphabetizing instead of truly randomizing), but it takes a surprisingly large amount of screwing around to make the results go away.

  7. Heather says:

    Thanks for the post. It would be great if, in a future post, you comment a little more about spillovers and the appropriateness of using spillover effects (including for cost-efficiency analysis) when promoting scale-up. This has always struck me as odd, unless one is promoting a scale-up program that takes account of those likely to receive spillovers and skips them. But perhaps I am missing something?

  8. Elin says:

    Measuring spillover effects is really difficult and needs to be part of the original design, not an afterthought. Also, the spillover mechanism needs to be actually tested, and there are many potential mechanisms. For example, was it really a spatial autocorrelation where reducing infestation at a central place disrupted the normal spread of infestation/reinfestation? Was there some kind of informational effect where people heard about the deworming in nearby schools and figured out how to get it for their children who were in other schools? Was it even possibly a compensation situation in which school administrators did something else that helped, like work on better sanitation, because they thought it wasn’t fair that the neighboring schools got the treatment? I could think of more. Experiments in the field, especially when they take extended periods of time to carry out, are really different from experiments in the lab. This is why they need to have strong process evaluation components where people are on the ground looking at implementation in practice and what is happening in nearby areas and why. In research on crime control strategies I can tell you that people think a lot about both “displacement” of crime and “diffusion of benefits” and neither is easy to understand without a lot of work. I had a doctoral student who studied prostitutes pushed out of Times Square and they ended up mainly: working out of hotels instead of on the street; working around the Javits Center; working at Hunts Point; and working in New Jersey. This was not easy to find out. Research involves shoe leather and time.

    My question is: did they actually have this complex multi-step spillover effect in mind when they designed the trial, and if so, what did they do to make sure they would be able to meaningfully test it?
