Ethical and data-integrity problems in a study of mortality in Iraq

Posted on April 27, 2010 9:24 AM by Andrew

Michael Spagat notifies me that his article criticizing the 2006 study of Burnham, Lafta, Doocy and Roberts has just been published. The Burnham et al. paper (also called, to my irritation (see the last item here), “the Lancet survey”) used a cluster sample to estimate the number of deaths in Iraq in the three years following the 2003 invasion. In his newly-published paper, Spagat writes:

[The Spagat article] presents some evidence suggesting ethical violations to the survey’s respondents including endangerment, privacy breaches and violations in obtaining informed consent. Breaches of minimal disclosure standards examined include non-disclosure of the survey’s questionnaire, data-entry form, data matching anonymised interviewer identifications with households and sample design. The paper also presents some evidence relating to data fabrication and falsification, which falls into nine broad categories. This evidence suggests that this survey cannot be considered a reliable or valid contribution towards knowledge about the extent of mortality in Iraq since 2003.

There’s also this killer “editor’s note”:

The authors of the Lancet II Study were given the opportunity to reply to this article. No reply has been forthcoming.

Ouch.

Now on to the background:

More than six-and-a-half years have elapsed since the US-led invasion of Iraq in late March 2003. The human losses suffered by the Iraqi people during this period have been staggering. It is clear that there have been many tens of thousands of violent deaths in Iraq since the invasion. . . . The Iraq Family Health Survey Study Group (2008a), a recent survey published in the New England Journal of Medicine, estimated 151,000 violent deaths of Iraqi civilians and combatants from the beginning of the invasion until the middle of 2006. There have also been large numbers of serious injuries, kidnappings, displacements and other affronts to human security.

Burnham et al. (2006a), a widely cited household cluster survey, estimated that Iraq had suffered approximately 601,000 violent deaths, namely four times as many as the IFHS estimate, during almost precisely the same period as covered by the IFHS study. The L2 data are also discrepant from data provided by a range of other reliable sources, most of which are broadly consistent with one another. Nonetheless, there remains a widespread belief in some public and professional circles that the L2 estimate may be closer to reality than the IFHS estimate.

But Spagat says no; he suggests “the possibility of data fabrication and falsification.” He also points to some contradictory descriptions of sampling methods, which are interesting enough that I will copy them here (they’re from pages 11-12 of Spagat’s article):

The L2 authors [Burnham et al.] have often dismissed the possibility of sampling bias by stating that they did not actually follow the sampling procedures that they claimed to have followed in their Lancet publication. For example, Burnham and Roberts (2006a) write that they had removed the following sentence from their description of their sampling methodology at the suggestion of peer reviewers and the editorial staff at the Lancet:

As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have. (Burnham and Roberts, 2006a)

Thus [according to Spagat], this part of the description of sampling methodology should have read:

The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have. (Original text from Burnham et al., 2008, with new text italicised)

Combining this with Gilbert Burnham’s New Scientist interview already quoted (Biever, 2007) would imply that at each location:

(1) Field teams wrote names of main streets on pieces of paper and selected one street at random.

(2) The field teams then walked down this street writing down names of cross streets on pieces of paper and selected one of these at random.

(3) The field teams then became aware of all other streets in the area that did not cross the main avenues and may have selected one of these instead of one of the cross streets written on pieces of paper. This wide selection was done according to an undisclosed procedure.

The Biever (2007) description of Burnham does outline a sampling procedure that could have been followed and is broadly consistent with the published methodology. If other types of streets, beyond those that would be covered by the published methodology, were included in the sampling procedures then the authors need to specify how these streets were included. More fundamentally, how did the field teams discover the existence of such streets that could not be seen by walking down principal streets as described by Burnham in Biever (2007)?

The L2 field teams would not have brought detailed street maps with them into each selected area or else it would not have been necessary to walk down selected principal streets writing down names of surrounding streets on pieces of paper. We can also rule out the possibility that the teams completely canvassed entire neighbourhoods and built up detailed street maps from scratch in each location. Developing such detailed street maps would have been very time consuming and the L2 field teams had to follow an extremely compressed schedule that required them to perform 40 interviews in a day (Hicks, 2006).

In Giles (2007), an article in Nature, Burnham and Roberts suggested one possible explanation on how the field teams had managed to augment their street lists beyond streets that could be seen by walking down a main street, but this suggestion was rejected by an L2 field team member interviewed by Nature:

But again, details are unclear. Roberts and Gilbert Burnham, also at Johns Hopkins, say local people were asked to identify pockets of homes away from the centre; the Iraqi interviewer says the team never worked with locals on this issue. (Giles, 2007)

Even if locals had identified such ‘pockets of homes away from the centre’ the authors still would have to specify how these were included in the randomisation procedures. Indeed, involving local residents in selecting the streets to be sampled would seem to be at odds with the random selection of households. Locals could, for example, lead the survey teams to particularly violent areas.

Burnham and Roberts have induced further confusion about their sample design by issuing a series of contradictory statements.

The sites were selected entirely at random, so all households had an equal chance of being included. (Burnham et al., 2006b, emphasis added)

Our study team worked very hard to ensure that our sample households were selected at random. We set up rigorous guidelines and methods so that any street block within our chosen village had an equal chance of being selected. (Burnham and Roberts, 2006b, emphasis added)

. . . we had an equal chance of picking a main street as a back street. (The National Interest, 2006)

These statements contradict each other and the methodology published in the Lancet. Some streets are much longer than others. Some streets are much more densely populated than others. Such varied units cannot all have equal probability of selection. If, for example, every street block had an equal chance of selection then households on densely populated street blocks would have lower selection probabilities than households on a sparsely populated street block. If main streets are more densely populated on average than are back streets and main streets and back streets have equal selection probabilities then households on main streets would have lower selection probabilities than households on back streets.
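Both halves of the argument above can be checked with exact arithmetic: steps (1) and (2) already give residential streets unequal chances, and equally likely street blocks give households unequal chances. What follows is a minimal Python sketch, not a reconstruction of the L2 design. The street layout, block sizes, and the fixed take of 10 households per selected block are hypothetical numbers chosen for illustration. Step (3) is not modelled because its procedure was never disclosed. Selection with probability proportional to size (PPS) appears only as the standard textbook comparison, not as something the L2 authors describe.

```python
from fractions import Fraction

# --- Part 1: street-level probabilities under steps (1) and (2) ---
# Hypothetical layout (invented for illustration): each main street maps to
# the residential streets that cross it. "R5" crosses no main street.
CROSSINGS = {"M1": ["R1", "R2"], "M2": ["R2", "R3", "R4"]}
RESIDENTIAL = ["R1", "R2", "R3", "R4", "R5"]


def street_probs(crossings, residential):
    """Exact P(residential street chosen) when a main street is drawn
    uniformly, then one of its cross streets is drawn uniformly."""
    probs = {s: Fraction(0) for s in residential}
    p_main = Fraction(1, len(crossings))
    for cross in crossings.values():
        for s in cross:
            probs[s] += p_main * Fraction(1, len(cross))
    return probs


# --- Part 2: household-level probabilities when blocks are equally likely ---
# Hypothetical households per block and an assumed fixed take per block.
BLOCK_SIZES = {"dense_main": 160, "main": 80, "back": 40, "sparse_back": 20}
TAKE = 10


def household_probs(sizes, take, pps=False):
    """P(household selected) when one block is drawn (with equal probability,
    or proportional to its size if pps=True) and `take` households are then
    drawn at random within it."""
    total = sum(sizes.values())
    result = {}
    for block, n in sizes.items():
        p_block = Fraction(n, total) if pps else Fraction(1, len(sizes))
        result[block] = p_block * Fraction(take, n)
    return result


for street, p in street_probs(CROSSINGS, RESIDENTIAL).items():
    print("street", street, p)

equal_probs = household_probs(BLOCK_SIZES, TAKE)
pps_probs = household_probs(BLOCK_SIZES, TAKE, pps=True)
for block in BLOCK_SIZES:
    print("household on", block, equal_probs[block], "vs PPS", pps_probs[block])
```

In this toy layout, R2 crosses both main streets and is 2.5 times as likely to be chosen as R3 or R4. R5 can never be chosen. With equally likely blocks, a household on the densest block has a 1/64 chance, compared with 1/8 on the sparsest. PPS gives every household the same 1/30 chance.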

Spagat has clearly done a lot of work here and I haven’t read his paper in detail, nor have I carefully studied the original articles by Burnham et al. Also, some of Spagat’s criticisms seem less convincing than others. When I saw the graph on page 16 (in which three points fall suspiciously close to a straight line, suggesting at the very least some Mendel’s-assistant-style anticipatory data adjustment), I wondered whether these were just three of many possible points that could have been considered. Investigative blogger Tim Lambert made this point last year, and having seen Lambert’s post, I don’t see Spagat’s page 16 graph as being so convincing.
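The concern about the page 16 graph is a selection question. If three points are picked from many candidates, a near-straight line is weak evidence of anything. A minimal Python sketch can estimate how often some triple lines up by chance alone. It uses hypothetical settings: 10 candidate points of pure noise and an R² cutoff of 0.99. It does not reproduce Spagat's data or Lambert's analysis.

```python
import itertools
import random


def r_squared(points):
    """R^2 of the least-squares line through a set of (x, y) points."""
    xs, ys = zip(*points)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate case; treated as "no fit" in this sketch
    return sxy * sxy / (sxx * syy)


def chance_of_near_line(n_points=10, threshold=0.99, n_reps=1000, seed=1):
    """Fraction of simulated pure-noise datasets in which at least one
    triple of points has R^2 >= threshold (hypothetical settings)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        pts = [(rng.random(), rng.random()) for _ in range(n_points)]
        best = max(r_squared(t) for t in itertools.combinations(pts, 3))
        hits += best >= threshold
    return hits / n_reps


print(chance_of_near_line())
```

With 10 candidates there are 120 possible triples. Even a modest per-triple chance of near-collinearity therefore makes finding at least one such triple likely.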

In any case, I’m happy to see a high-profile survey subjected to this sort of scrutiny. When I looked at the Burnham paper a few years ago, I searched without success for details of their sampling and estimation procedures. But, as I wrote in response to Spagat in 2008, it’s surprisingly difficult for people to write exactly what they did (see also this discussion with Phil and others).

If Burnham et al. are giving contradictory descriptions of their sampling methods, this could be evidence of fraud, or evidence that they don’t fully understand cluster sampling (which actually is a complicated topic that lots of researchers have trouble with), or evidence that their sampling was a bit of a mess (which happens to the best of us) and that they didn’t do a great job explaining it. I hope the discussion surrounding Burnham, Spagat, etc., will push future survey researchers to describe their methods more explicitly.

Beyond all this, it can be difficult to get people to respond to a survey. Countries such as the U.S. are saturated with junk polls (recall my recent rant on the topic) to the extent that hanging up on all survey interviewers is probably the optimal strategy for most.

Blame the statistics teachers

To some extent, the culprits here are not just Burnham, Lafta, Doocy and Roberts, but also statistics education in general. Our introductory courses touch on these topics only briefly. We do cover them in more detail in our classes on sample surveys, but in the statistics department I’ve been involved in, very few students take such classes, and many statistics faculty, even those who should really know better, are unaware of the practical and conceptual difficulties of sampling human populations. Sampling is a lot more interesting than asymptotic theory, in my opinion, but it occupies a pretty small place in the statistics curriculum. As a result, you have people going out in the field, just winging it, then struggling later to define and justify what they’ve done.

P.S. Les Roberts, one of the authors of the disputed study, apparently teaches at Columbia! I don’t recall ever having met him, though. Perhaps we (the Applied Statistics Center and Center for the Study of Development Strategies) can throw a miniconference on the topic of Statistical Sampling in Developing Countries and invite Roberts to participate.

21 thoughts on “Ethical and data-integrity problems in a study of mortality in Iraq”

Teemo on April 27, 2010 at 6:51 AM said:

"War, famine, pestilence" — if you want a succinct and apt summary of the core threats to public health, these three Horsemen of the Apocalypse serve as well today as they did in the Old Testament.

This is a statistics-centered blog, obviously, and I imagine like most of its readers I read it to get some quick insights on statistical methods and critiques of same. (Along with some entertaining asides on various matters that I find interesting.) So who would complain about a lively discussion of sampling methods on a topic both timely and important? Not me.

But this post, though technically interesting, I find a bit unsettling. It sounds like it could be discussing some telephone survey of credit card users conducted by callers from a Nebraska prison.

Wars are unfathomable health disasters. Estimating the resulting deaths is (1) extraordinarily difficult, (2) extraordinarily valuable, and (3) extraordinarily susceptible to spin. And extraordinarily dangerous to data collectors operating in a war zone — such as the household interviewers in these surveys — and to the respondents.

While discussion of appropriate cluster variances and whatnot has its place, it seems to me that the context in which these surveys were undertaken should be a prominent part of any discussion. How do we make timely estimates of war-related deaths (or diseases or injuries) that are founded in some sort of statistical reality? It doesn't seem an exaggeration to say that this is one of the most important things that a public health analyst could undertake. And also an area in which repeated acknowledgement of the courage of the data collectors and respondents would not be amiss.

As a last comment, while I sympathize with the general distaste for phrases like "the Lancet study", I think the irritation expressed here is misplaced. Would that any time a war erupts, there were so many journal articles on the health and mortality impacts that we needed to refer to them by the authors' names to keep them straight. But when we are in a world where there is likely to be a "Pentagon estimate", a "Regime X" estimate, an "Insurgency Y" estimate, and an "NGO Z estimate" hot off the presses, having a "Lancet estimate" that keeps us mindful that perhaps science and statistics can and should play a role is not such a bad thing.
Andrew Gelman on April 27, 2010 at 6:54 AM said:

Teemo: You make good points, many of which Spagat also makes in his article. In my blog here, I'm focusing on sampling issues because that's where I'm more knowledgeable. I'll leave it to others to focus more on the substantive concerns in this case.
Morgan on April 27, 2010 at 7:51 AM said:

I was highly suspicious of this study from the start, for more general reasons than Spagat lays out.

We had (from "IBC" – Iraq Body Count) at the time very good estimates of how many people die in an "average" (I don't mean to trivialize these incidents) car bombing, air strike, or shooting incident, so it was possible to calculate roughly how many such incidents must have occurred to produce as many deaths as these researchers estimate.

To reconcile this study with IBC, we would have to believe that roughly 92% of fatal car bombings (more than 10,000 of them), 95% of fatal air strikes (likewise), and 98% of fatal shootings (hundreds of thousands of them) went unreported in the media, and hence uncounted by IBC.

The whole point of car bombs, for example, is to attract attention (a point Spagat makes, too). You can't easily hide an airstrike. Shootings might more plausibly go unreported, perhaps, but almost all of them?

Even given the disruption war brings, the extent of the implied under-reporting struck me as incredibly implausible.
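Morgan's reconciliation rests on a simple identity. The following is a minimal Python sketch, assuming, as Morgan does, that unrecorded incidents of a given type have the same average death toll as recorded ones. Under that assumption the average cancels, and the unrecorded share of incidents equals 1 - (recorded deaths / survey deaths). The function names are illustrative, and no IBC counts are built in.

```python
def implied_unreported_share(survey_deaths, recorded_deaths):
    """Share of incidents of one type that must have gone unrecorded if the
    survey total is right and unrecorded incidents have the same average
    death toll as recorded ones (the average then cancels out)."""
    return 1 - recorded_deaths / survey_deaths


def survey_to_recorded_ratio(unreported_share):
    """Inverse: how many times the recorded death total the survey total must
    be for a given share of incidents to have gone unrecorded."""
    return 1 / (1 - unreported_share)


for share in (0.92, 0.95, 0.98):
    ratio = survey_to_recorded_ratio(share)
    print(f"{share:.0%} unrecorded <-> survey total {ratio:.1f}x the recorded total")
```

Read in reverse, 92%, 95% and 98% unreported correspond to a survey total 12.5, 20 and 50 times the recorded total for that type of incident.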
jonathan on April 27, 2010 at 7:58 AM said:

It's somehow heartening to me that this confuses you. It certainly confuses me. (By confusion, I mean lots of questions about this very important study that can't be answered.) I've downloaded interviews, etc.

As I went through this material and the IFHS estimate, I wondered a lot about:

1. casualty estimates when there are many displaced people.
2. casualty estimates in non-technical societies – Iraq being somewhere in the middle, but with the implication that unreliability follows that scale.
3. casualty estimates in local, popular culture that create their own reality
4. historical unreliability of such estimates.
5. the way tools of modernity can sometimes cut through and sometimes cover up unreliability. As in, it looks like a study and sometimes it is rigorous and sometimes it puts a gloss on points 1, 2 and 3.

One sad thing is that the studies have become a weird focus for moral and political discussion. It is somehow OK that the IFHS number is low because the other number is so high that it implies barbaric violence for which the US is then responsible. I shake my head at that, but then we in this country care next to nothing about civilian casualties we cause elsewhere.
Teemo on April 27, 2010 at 8:38 AM said:

If I recall correctly (which may not be the case), the study in question did not aim at estimating the number of violent deaths (car bombs etc) but the overall death rate during the hostilities. This was then compared to pre-hostility death rates (which had their own calculation issues).

One would expect such a study to yield a higher estimate of "war-related" deaths than a comparison calculation of violent deaths, because it would include things like deaths from interruption of electricity to hospitals, inability to get to medical care in the midst of civil strife, inability to get drugs, etc.
joshd on April 27, 2010 at 9:01 AM said:

Teemo, you recall incorrectly. The study estimated death rates by cause. It then made an estimate for violent deaths: 601 027 (426 369–793 663), and for excess deaths that you describe: 654 965 (392 979–942 636).

So the study made estimates for both (and note that higher statistical confidence is given to the violent deaths estimate). Most of the concerns about this study have centered around the violence estimate both because it has raised doubts like Morgan's above (which wouldn't apply to non-violent deaths, 'excess' or otherwise) and because violent deaths are almost the entirety of the study's excess deaths estimate anyway. There are only about 54,000 'excess' deaths from non-violent causes and the study itself says that the non-violent excess deaths themselves are not statistically significant (i.e., these could easily be equal to 0 anyway).
Aaron on April 27, 2010 at 9:12 AM said:

If these sampling issues might end up having a significant impact on the final results, then aren't they highly substantive?
Sebastian on April 27, 2010 at 9:15 AM said:

I've been a bit skeptical about some of what the Burnham team did and reported, too. But I don't think the fact that they are not replying to an article in a third-rate (if even that) journal is really telling. I know impact factors are somewhat problematic, but still:
2008 Impact Factor: 0.352
Ranking: 172/209 (Economics)
If you write a controversial paper, you have so much replying and defending to do that you'd want to stick to defending yourself against critiques in more visible venues.
Daso on April 27, 2010 at 9:54 AM said:

You are partially correct. The study did aim to estimate all "excess" deaths, that is, deaths beyond the number that would have been expected given the prevailing mortality rates in Iraq before the war. The authors did, however, ask about the causes of death and were able to estimate the number of deaths attributable directly to violence.

We estimate that as of July, 2006, there have been 654 965 (392 979–942 636) excess Iraqi deaths as a consequence of the war, which corresponds to 2·5% of the population in the study area. Of post-invasion deaths, 601 027 (426 369–793 663) were due to violence, the most common cause being gunfire. (p. 1)

Violent deaths, therefore, accounted for 91.8% of all excess deaths.
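The figures quoted by joshd and Daso can be reproduced from the two central estimates. The following is a minimal Python sketch. It uses point estimates only and ignores the wide intervals quoted above.

```python
EXCESS = 654_965   # central estimate of excess deaths (quoted above)
VIOLENT = 601_027  # central estimate of post-invasion violent deaths

non_violent_excess = EXCESS - VIOLENT  # joshd's "about 54,000"
violent_share = VIOLENT / EXCESS       # Daso's percentage

print(non_violent_excess)
print(f"{violent_share:.1%}")
```

The difference, 53,938, matches joshd's "about 54,000", and the ratio rounds to Daso's 91.8%. The quoted 601,027 counts all post-invasion violent deaths, not only excess ones, so both derived figures are approximations.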
Megan on April 27, 2010 at 1:28 PM said:

I am surprised to read Morgan from IBC refer to "very good estimates." I attended a conference on mortality estimation last year, and a fellow attendee, from IBC, expressed quite a bit of disdain for estimates and statistical inference. It is my understanding that IBC is a large convenience sample, not suitable for generating estimates (in the statistical sense).

While I think the IBC does important work documenting reported deaths, I must point out that summary data from press reports is not necessarily going to be comparable to survey-based estimates.

There are certainly many valid criticisms to be made about the Burnham et al study. But I think the best response to poorly executed and/or explained studies (in incredibly complex conflict situations, as Teemo points out) is not to run toward unrepresentative convenience data but rather to design and carry out better scientific studies (and then do a better job explaining them to the public).
jonathan on April 27, 2010 at 3:56 PM said:

I heard a study author describe the methodology and how the deaths were estimated and attributed to causes. I have not had time to read the linked article but the quote in the post above is somewhat disingenuous; it notes "street lengths" when that seemed to me to be a case of not fully explicating the methodology. I don't know if they did clusters right but there is a point at which we don't know how well we can trust the underlying data, no matter how the methods are described. To say this might be evidence of fraud seems to bespeak a bias of your own. The most I can say is I don't know.
David Kane on April 28, 2010 at 5:14 AM said:

For those interested, I have been blogging about this topic for several years. The most relevant posts are probably (a) those about Burnham being sanctioned by Johns Hopkins and (b) those showing that the Burnham et al. results are 8 times higher than those estimated by the (much better) IFHS when the calculations are compared on an apples-to-apples basis.
Morgan on April 28, 2010 at 5:42 AM said:

Megan:

I apologize for leaving the impression that I am affiliated with IBC. I am not. I meant that "we" as consumers of information had access to the estimates.

I certainly agree that IBC is a sort of convenience sample. I also agree that it almost certainly undercounts deaths in total.

But the estimates I was referring to are not of the total deaths, but rather of the *average* number of dead in an incident of a given kind. I don't see any reason to think that the distribution of deaths per incident (on which the estimate is based) would be particularly impacted by the sample type.

What's more, to whatever extent the average is impacted, I would think the more spectacular (i.e. more deadly) incidents would be *more* likely to make it in. In which case IBC would *overestimate* the average dead per incident, and we would have to believe that an even larger percentage of the car bombings, air strikes, and shootings have gone unreported to reconcile IBC and the Burnham et al. study.
anonymous on April 28, 2010 at 7:25 AM said:

On the ranking of the journal and whether a non-response is informative: it is more disconcerting that many medical journals will not accept comments on papers, even if highly problematic. I would also be hesitant to dismiss arguments just because of the ranking of a journal.
Sebastian on April 28, 2010 at 8:09 AM said:

anonymous: I actually agree with both things you say – medical journals have, as far as I know, very poor correction/reply mechanisms.
And the rank of a journal doesn't say anything a priori about the validity of the argument.
That's not what I claimed, though: I merely think that the rank of the journal does have an impact on how widely read it is and how much researchers will care about publishing a reply – and if the journal is sufficiently obscure I think it's a legitimate decision for a busy researcher not to bother.

This is no judgment on the actual arguments made by Spagat, some of which seem quite convincing to me. I just don't think there is a moral, ethical, or professional obligation for researchers to reply to every attack on their findings.
Patrick Ball on April 28, 2010 at 8:56 AM said:

Morgan,

First, I want to be clear that I have no interest in defending the Burnham et al. estimates. The flaws in that study are now well known.

However, it seems to me that there is reason to think that IBC is systematically biased with respect to incident size. The probability of report of large events is very high (explosions and massacres create attention), whereas individual assassinations and disappearances (or kidnappings that become disappearances) are much less visible to the press. Therefore it is probable that IBC reports a smaller fraction of killing events of size 1 relative to events with many victims.

Event size bias is important because event size is likely correlated with perpetrator group (some groups want to create public displays, others commit selective killings), victim type (large events will kill randomly whoever is present, selective killings will be predominantly adult men), and obviously, method of killing (among other factors).
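The event-size bias described above can be illustrated with a small simulation. The following is a minimal Python sketch. Both the size distribution and the reporting model are hypothetical; they are not estimates for Iraq or for IBC. Under these assumptions, reported events overstate the mean number of victims per event, which bears on Morgan's average-per-incident argument. They also under-represent single-victim killings, which is Ball's concern about victim mix.

```python
import random
import statistics


def simulate_event_size_bias(n_events=100_000, base_report=0.3, seed=0):
    """Simulate incidents and a press-style record in which bigger incidents
    are likelier to be reported. All parameters are hypothetical."""
    rng = random.Random(seed)
    # Hypothetical incident sizes: mostly single victims, with a long tail of
    # larger events (1 + floor of an exponential draw).
    sizes = [1 + int(rng.expovariate(0.3)) for _ in range(n_events)]
    # Hypothetical reporting model: each victim independently "draws
    # attention" with probability base_report, and the event is reported if
    # at least one does, so P(report) = 1 - (1 - base_report) ** size.
    reported = [s for s in sizes if rng.random() < 1 - (1 - base_report) ** s]
    return {
        "true_mean_size": statistics.mean(sizes),
        "reported_mean_size": statistics.mean(reported),
        "true_share_single": sum(s == 1 for s in sizes) / len(sizes),
        "reported_share_single": sum(s == 1 for s in reported) / len(reported),
    }


for key, value in simulate_event_size_bias().items():
    print(f"{key}: {value:.3f}")
```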

These problems notwithstanding, there have been several published studies of these factors using unadjusted counts from the IBC as the data source. There has not been anything like the critique of those studies that Burnham et al. have attracted, yet their data are, if anything, worse.

It seems to me that one of the most important functions this blog could serve for the community of study of political violence would be constantly to remind analysts that convenience samples are always problematic. In my experience (about 20 years collecting and analyzing human rights violations data), multiple convenience samples about the same context have almost never converged to the same statistical picture on key historical questions. Statisticians are in the best disciplinary position to help guide us to better inferential methods. We need your help.
joshd on April 28, 2010 at 11:13 AM said:

Unlike Morgan, I am from IBC. So I thought I'd address some of the issues related to IBC that have come up. In my view, Morgan makes a reasonable point. I think Patrick Ball's comment misses that point and makes a number of tenuous claims, which I'll try to address.

Mr. Ball writes:

"I want to be clear that I have no interest in defending the Burnham et al. estimates. The flaws in that study are now well known."

If so, this is news to me, since Mr. Ball leapt to promote the Burnham et al. estimates when they appeared: http://www.accuracy.org/newsrelease.php?articleId…

The comments from Mr. Ball promoting the Burnham study – part of a promotional PR package put out for the study's release – purport to dismiss the huge disparity between that study and IBC, such as those Morgan references in terms of average incident size. There Ball makes a number of assertions about "media reporting" which I do not believe would withstand any scrutiny. These include reference to work on Guatemala and other places. The findings of this Guatemala work, and its tenuous (at best) applicability for judging something like IBC, is discussed in Spagat's paper in section 3.8.

"However, it seems to me that there is reason to think that IBC is systematically biased with respect to incident size."

One reason it might seem this way is that IBC has said this itself and discussed this issue in some detail years ago: http://www.iraqbodycount.org/analysis/qa/assessme…

This and the following pages discuss bias toward larger scale incidents and some conclusions that may be drawn from this. For that matter, this issue was also raised by IBC even earlier in a seminar attended by a member of Benetech: http://www.iraqbodycount.org/analysis/qa/ibc-in-c…

More generally, Ball overlooks Morgan's point that such a bias would make the issue even more problematic for the Burnham study with regard to the points Morgan raised. (IBC discussed its own early scepticism of the Burnham study, on grounds similar to those raised by Morgan, here: http://www.iraqbodycount.org/analysis/beyond/real…)

One problem Mr. Ball suggests may arise from this is that "large events will kill randomly whoever is present, selective killings will be predominantly adult men". This would suggest that if the incident-size bias was large then adult men should be badly underrepresented in IBC relative to their true proportions. The same IBC document evaluates this issue here:

http://www.iraqbodycount.org/analysis/qa/assessme… and:

http://www.iraqbodycount.org/analysis/qa/assessme…

We find that proportions for adult male deaths in IBC are very similar to those measured in other studies.

One of the most troubling comments from Mr. Ball however is this one:

"These problems notwithstanding, there have been several published studies of these factors using unadjusted counts from the IBC as the data source. There has not been anything like the critique of those studies that Burnham et al. have attracted, yet their data are, if anything, worse."

Despite Ball's early endorsement and promotion of the Burnham study, perhaps he has changed his views, since he now says, "the flaws in that study are now well known". First, if they are _now_ well known, this will be no thanks to Mr. Ball or his colleagues at Benetech, whose only contribution I know of on this matter has been attempts to help promote the Burnham study and squelch entirely warranted early scepticism of it, such as the points described by Morgan. Second, the above comment suggests he doesn't grasp what the flaws in the Burnham study really are. The flaws described in the Spagat article show without any doubt that these data are at strong odds not just with IBC or other so-called "convenience samples" but with other survey estimates, lack any reasonable transparency and are entirely unverifiable. They also show, arguably but without much reasonable doubt in my opinion, that the data underlying the Burnham study are fraudulent, i.e., fake, not reality.

It is strange to me that, if Mr. Ball does really understand the flaws in this study, he would still assert that the IBC data, which is fully documented, verifiable and transparent (i.e., it is real, even if possibly containing a number of biases) are "if anything, worse" than data which is unverifiable, non-transparent and likely fraudulent – but is dressed up to resemble "science".
Steve Sailer on April 28, 2010 at 9:58 PM said:

On October 12, 2006, I blogged at length about this study, eventually concluding:

"The more I think about the mechanics of carrying out the survey on the street without getting killed, the more I suspect that the Iraqi interviewers didn't actually implement the purely random survey design that the American professors from MIT and Johns Hopkins dreamed up for them. It would be nuts to let luck determine which streets you'd choose, as the report claims they did. You'd want to only go where you knew you'd be safe. Then you'd tell the Americans you did exactly what they told you to do."

I theorized:

"Maybe what happened is that the interviewers didn't actually go much door-to-door at random, but instead arrived in a neighborhood, put the word out, and then either had people who wanted to talk to them come see them or were invited to the homes of people who wanted to see them. That might account for the very high % of people with death certificates available.

"Or it could be that the interviewers got in contact ahead of time with neighborhood leaders to see if their presence would be welcome to reduce their chances of being killed. (That's not good random surveying hygiene, but are you going to blame them?) Then, in a neighborhood where the local big shot wanted their presence, he might have passed the word around to aggrieved families to get ready to tell their stories to the interviewers when they showed up. This could cause a bias upward in the number of deaths reported."

In conclusion:

"The overall point, however, is that nobody else appears to be doing this kind of study because it is so hideously dangerous, which ought to tell us something."

http://isteve.blogspot.com/2006/10/updated-depres…
Patrick Ball on April 29, 2010 at 2:16 PM said:

joshd suggests that I or my colleagues tried to squelch criticism of Burnham et al. While I was impressed by the initial presentation of the study (as cited by joshd), we have never discouraged criticism. To the contrary, my colleagues and I actively welcome and promote debate about quantitative analysis of human rights situations.

I think it is appropriate to be skeptical of studies and critiques alike until there is good evidence. Few studies in political violence or elsewhere have been subject to the scrutiny given to the Burnham et al. piece, and I think our understanding of the challenges of cluster surveys on conflict mortality has been significantly advanced by papers, blogs, and presentations by David Kane, Jana Asher, Fritz Scheuren, and many others.

I'm glad to have learned more from the debate, both about Burnham et al. specifically and about cluster studies for measuring conflict mortality in general. This is how science advances. As for joshd's note that I'm essentially agreeing with Morgan's point, sure.

Regarding event-size bias with respect to victim age, the reference joshd provides (slide 13) shows some agreement between IBC and other samples, but the other samples are very problematic. One is Burnham et al., which we've agreed has issues. The other is the 2004 UNDP survey, whose data were collected too early to be appropriately comparable.

The IBC and Ministry of Health (MoH) studies are likely subject to a shared bias, perhaps a version of what Spagat calls "main street bias," different from but related to what I've called "social visibility." Both IBC and MoH report what comes to their attention, and they cannot know what doesn't come to their attention. The observation that IBC and MoH show similar age patterns may simply mean that they share a selection bias that leads to the same age distribution.
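A small worked example may make the shared-bias point concrete. The sketch below uses entirely hypothetical numbers (the true age breakdown and the per-group "visibility" probabilities are assumptions, not IBC or MoH figures): if two lists differ in overall coverage but are both more likely to hear about adult deaths, their age distributions agree perfectly with each other while both misstate the truth.

```python
def expected_shares(true_counts, report_prob):
    """Expected age distribution of a list that records each death in
    group g independently with probability report_prob[g]."""
    expected = {g: n * report_prob[g] for g, n in true_counts.items()}
    total = sum(expected.values())
    return {g: e / total for g, e in expected.items()}

# Hypothetical true deaths by age group (shares 0.30 / 0.60 / 0.10).
true_counts = {"child": 300, "adult": 600, "elderly": 100}

# Hypothetical visibility profiles: list B captures half as much as list A
# overall, but both are four times as likely to hear about an adult death.
list_a = {"child": 0.10, "adult": 0.40, "elderly": 0.10}
list_b = {"child": 0.05, "adult": 0.20, "elderly": 0.05}

truth = expected_shares(true_counts, {g: 1.0 for g in true_counts})
shares_a = expected_shares(true_counts, list_a)
shares_b = expected_shares(true_counts, list_b)

for g in true_counts:
    print(f"{g:8s} truth={truth[g]:.3f}  list A={shares_a[g]:.3f}  list B={shares_b[g]:.3f}")
```

Under these assumed probabilities both lists place about 86% of deaths among adults, against 60% in the hypothetical truth. Their agreement says nothing about whether either one is representative; only a probabilistic model of how deaths come to each list's attention could measure the gap.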

My point about the IBC or MoH data is that without a probabilistic basis for the data (or a model that leads to such a basis), we have no meaningful idea what they're saying. We can speculate about the existence or non-existence of bias, but we can't measure it.

Let me apologize for calling IBC data worse than other data. I want to welcome all data. The problem is not the data, the problem is over-interpretation such as I believe has been done with the IBC data. In my opinion, the fact that we cannot evaluate their representativeness makes the IBC data less useful for quantitative purposes than data collected in even a dubious survey. I say that having spent most of my adult life building convenience samples of human rights violations.

With respect to the Burnham et al. citation of my Guatemala work: what I claim (in the reference and elsewhere) about the relationship of press data to reality is that in many contexts, we've found that press data show a highly biased subset of the universe of events. I was asked why the Burnham et al. estimates might differ from IBC's, and my reply is what joshd cited. I was not then, nor later, promoting Burnham et al., nor would I promote any study except those my colleagues and I conduct.

Spagat's note about Guatemala (cited by joshd) simply moves the speculation around: in the early 1980s, some international media covered a few massacres that went unreported by the Guatemalan press, but adding international press to a press-based analysis would still provide an utterly inadequate quantitative account. There's no reason why the higher level of press coverage in Iraq from 2003 to the present (relative to Guatemala in the 1980s) should make press coverage of Iraq immune to the biases I suggested in my previous post.

joshd notes that IBC has raised event size bias in their presentations, including one I attended in October. Indeed, so why not do a test of the probability of press report of an event as a function of event size to evaluate the questions I'm raising? This would require an inferential method, and it would resolve this concern empirically. I appreciate that IBC has published their data so this analysis could be done by anyone.
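One way such a test might be set up is sketched below, under loudly stated assumptions. It presumes a hypothetical reference list of events compiled independently of the press (for example from morgue or hospital records), with each event's death toll and a flag for whether a matching press report was found. It fits a logistic regression of that flag on log event size by Newton-Raphson and compares it with a size-independent model using a likelihood-ratio test. The function names and the data are invented for illustration; this is not IBC's or Ball's method.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(a, b, x, y):
    # Bernoulli log-likelihood of the logistic model; log(1 - p) = log_sigmoid(-z).
    return sum(yi * log_sigmoid(a + b * xi) + (1 - yi) * log_sigmoid(-(a + b * xi))
               for xi, yi in zip(x, y))

def fit_logistic(x, y, iters=50, tol=1e-10):
    """Fit P(reported) = sigmoid(a + b*x) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            p = math.exp(log_sigmoid(a + b * xi))
            w = p * (1.0 - p)
            ga += yi - p            # score for the intercept
            gb += (yi - p) * xi     # score for the slope
            haa += w                # observed information (negative Hessian)
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        # Newton step: solve (information) * delta = score for the 2x2 case.
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

def size_bias_test(sizes, reported):
    """Likelihood-ratio test of H0: report probability does not depend on log(size).
    Assumes the data are not perfectly separable and contain both 0s and 1s."""
    x = [math.log(s) for s in sizes]
    a, b = fit_logistic(x, reported)
    p0 = sum(reported) / len(reported)   # MLE under the intercept-only null
    ll_null = sum(math.log(p0) if r else math.log(1 - p0) for r in reported)
    lr = max(2.0 * (log_likelihood(a, b, x, reported) - ll_null), 0.0)
    p_value = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-square with 1 df
    return b, lr, p_value

# Hypothetical reference list: deaths per event, and whether a press report was matched.
sizes    = [1, 1, 1, 1, 2, 2, 2, 3, 3, 5, 5, 8, 10, 20]
reported = [0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]
slope, lr, p = size_bias_test(sizes, reported)
print(f"slope on log(size) = {slope:.3f}, LR = {lr:.3f}, p = {p:.3f}")
```

A positive slope with a small p-value would indicate that larger events are more likely to be reported. The hard part is the independent reference list: without one, the unreported events that the test depends on are, by definition, invisible.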

Despite IBC's claim that press coverage of events in Iraq differs from that of any previous conflict, the only way to test the coverage claim is with a good inferential analysis. It's very helpful that IBC is explicit about its sources and coding, and all scientific efforts should strive to be transparent. Transparency isn't the problem I'm raising, however. The problem is selection bias.
Morgan on April 30, 2010 7:01 AM said:

"In my opinion, the fact that we cannot evaluate their representativeness makes the IBC data less useful for quantitative purposes than data collected in even a dubious survey."

I disagree with this. The vast discrepancy between the "dubious" Burnham study and others (including IBC), and the absurdities that discrepancy implies, make me think that we can place no confidence in the Burnham results at all.

On the other hand, IBC is not as impossible to evaluate as you imply. It is a well-conceived count of reported deaths. I get what that means. I understand that the "reported" qualifier means that some events (the ones that go unreported) aren't included. Even if I can't directly quantify the impact of that, I do have a sense of how the process works and what factors are involved in determining whether a death is reported, and so I have a sense of how far off the estimates are likely to be. "Factor of 20" for car bombings and air strikes, "factor of 50" for shootings… they're just not within the range of plausibility.

So when Burnham says "factor of 20…50" based on standard survey methodology, I conclude that either the methodology described was not actually used, or the methodology failed catastrophically for unknown reasons. In either case, I don't know what the problem was, so I have no basis for estimating its impact (except reference to other estimates).

IBC is flawed, but in comprehensible and more-or-less estimable ways. Whatever flaws made Burnham so wildly far from the mark are far less well understood, even after Spagat's work. Or at least, that's my evaluation.
Megan Price on April 30, 2010 7:28 PM said:

Morgan describes IBC as “…flawed, but in comprehensible and more-or-less estimable ways.” Yes, estimable using inferential analysis. I very much look forward to the particular analysis suggested by Dr. Ball above: “…a test of the probability of press report of an event as a function of event size.”

Morgan also states “I do have a sense of how the process works and what factors are involved in determining whether a death is reported, and so I have a sense of how far off the estimates are likely to be.” Unfortunately, your sense is not statistically quantifiable. I cannot reproduce it on my own, nor can I assign an uncertainty measure to it.

It is good to have a well-conceived, well-documented data collection process. But that addresses transparency (which all good science should strive to achieve), not selection bias. We can make educated guesses about the cause and direction of such biases, but that’s all they are: guesses. They are not statistically defensible conclusions.
