Tex and Jimmy sent me links to this study by Gilbert Burnham, Riyadh Lafta, Shannon Doocy, and Les Roberts estimating the death rate in Iraq in recent years. (See also here and here for other versions of the report). Here’s the quick summary:
Between May and July, 2006, we did a national cross-sectional cluster sample survey of mortality in Iraq. 50 clusters were randomly selected from 16 Governorates, with every cluster consisting of 40 households. Information on deaths from these households was gathered.
Three misattributed clusters were excluded from the final analysis; data from 1849 households that contained 12 801 individuals in 47 clusters was gathered. 1474 births and 629 deaths were reported during the observation period. Pre-invasion mortality rates were 5·5 per 1000 people per year (95% CI 4·3–7·1), compared with 13·3 per 1000 people per year (10·9–16·1) in the 40 months post-invasion. We estimate that as of July, 2006, there have been 654 965 (392 979–942 636) excess Iraqi deaths as a consequence of the war, which corresponds to 2·5% of the population in the study area. Of post-invasion deaths, 601 027 (426 369–793 663) were due to violence, the most common cause being gunfire.
And here’s the key graph:
Well, they should really round these numbers to the nearest 50,000 or so. But that's not my point here. I wanted to bring up some issues related to survey sampling (a topic that is on my mind since I'm teaching it this semester):
The sampling is done by clusters. Given this, the basic method of analysis is to summarize each cluster by the number of people and the number of deaths (for each time period) and then treat the clusters as the units of analysis. The article says they use "robust variance estimation that took into account the correlation," but it's really simpler than that: basically, the clusters are the units. With that in mind, I would've liked to see the data for the 50 clusters. Strictly speaking, this isn't necessary, but it would've fit easily enough in the paper (or, certainly, in the technical report), and it would make that part of the analysis easy to replicate.
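Here's a minimal sketch of what "clusters as units" means in practice, using made-up numbers for five hypothetical clusters (not the survey's data): summarize each cluster by its people and deaths, form the ratio estimate, and get a standard error from the spread of the cluster-level residuals. The linearized variance below is the textbook formula for a ratio estimator under simple random sampling of clusters, ignoring the finite-population correction; it's my choice for illustration, not necessarily what the authors actually ran.

```python
import math

# Made-up (people, deaths) summaries for five hypothetical clusters;
# these are NOT the survey's data.
clusters = [(250, 10), (300, 15), (200, 6), (280, 12), (270, 7)]


def ratio_estimate(clusters):
    """Return (r, se): r = total deaths / total people, with a linearized
    standard error that treats the k clusters as the sampling units
    (simple random sampling of clusters, no finite-population correction)."""
    k = len(clusters)
    total_people = sum(n for n, _ in clusters)
    total_deaths = sum(d for _, d in clusters)
    r = total_deaths / total_people
    mean_people = total_people / k
    # One residual per cluster: observed deaths minus deaths predicted by r.
    resid = [d - r * n for n, d in clusters]
    var = sum(e * e for e in resid) / (k * (k - 1) * mean_people ** 2)
    return r, math.sqrt(var)


r, se = ratio_estimate(clusters)
print(f"deaths per person: {r:.4f} (se {se:.4f})")
```

Turning this into a rate per 1000 per year would mean replacing people with person-years in each cluster; the structure of the calculation stays the same.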
I couldn't find in the paper the method that was used to extrapolate to the general population, but I assume it was ratio estimation. The crude proportion of sampled people who died over the whole observation period is 629/12801 = 4.9%; if you then subtract the deaths before the invasion and multiply by 12/40 (since they're counting 40 months after the invasion), I guess you get the 1.3% per year reported in the abstract. For pedagogical purposes alone, I would've liked to see this described explicitly as a ratio estimate (especially since this information goes into the standard error).
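As a rough check, here is that back-of-envelope calculation written out. It crudely assumes that all 12,801 people were observed for the whole period and uses the 40 post-invasion months given in the abstract; $D_{\text{pre}}$, the number of pre-invasion deaths, isn't given in the summary above, so I leave it as an unknown.

$$
\frac{629}{12\,801} \approx 0.049,
\qquad
\hat{r}_{\text{post}} \approx \frac{629 - D_{\text{pre}}}{12\,801} \times \frac{12}{40}.
$$

With $D_{\text{pre}} = 0$ this gives about $0.049 \times 0.3 \approx 1.5\%$ per year, so matching the abstract's 13.3 per 1000 (1.33%) would take roughly 60 pre-invasion deaths under these crude assumptions.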
Incidentally, the sampling procedure gives an estimate of the probability that each household in the sample is selected, and from this we should be able to get an estimate of the total population and the total number of births, and compare them to other sources.
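The standard way to turn known selection probabilities into an estimated population total is the Horvitz-Thompson estimator: weight each sampled household's count by the inverse of its selection probability and add them up. In a cluster design like this one, a household's probability would typically be the product of its cluster's selection probability and its within-cluster selection probability. The survey's actual probabilities aren't given above, so the households and numbers below are hypothetical.

```python
# Hypothetical sampled households: (selection probability, people, births).
# The probabilities and counts are invented for illustration.
households = [
    (0.001, 7, 1),
    (0.0005, 5, 0),
    (0.00025, 9, 2),
]


def horvitz_thompson_total(values, probs):
    """Estimate a population total as sum(y_i / pi_i) over the sampled units."""
    return sum(y / p for y, p in zip(values, probs))


probs = [p for p, _, _ in households]
est_people = horvitz_thompson_total([n for _, n, _ in households], probs)
est_births = horvitz_thompson_total([b for _, _, b in households], probs)
print(f"estimated population: {est_people:,.0f}; estimated births: {est_births:,.0f}")
```

The resulting totals are what you'd hold up against census or other demographic estimates as a sanity check on the sampling.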
I also saw a concern that they would oversample large households, but I don’t see why that would happen from the study design; also, the ratio estimation should fix any such problem, at least to first order. The low nonresponse numbers are encouraging if they are to be believed.
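To see why ratio estimation handles size-biased selection to first order, here's a small simulation under an assumption I'm making for illustration: every person has the same death probability regardless of household size. Households are drawn with probability proportional to size (deliberately oversampling large households), so the sampled households are bigger than average, but the ratio estimate of deaths per person still lands near the true rate. If risk did vary with household size, this would no longer hold exactly. The population, sizes, and rates are all hypothetical.

```python
import random


def simulate(p_death=0.05, n_households=20_000, n_sample=2_000, seed=1):
    """Return (population mean household size, sampled mean household size,
    ratio estimate of deaths per person) under size-biased sampling."""
    rng = random.Random(seed)
    # Hypothetical population: household sizes uniform on 1..15 people.
    sizes = [rng.randint(1, 15) for _ in range(n_households)]
    # Key assumption: each person dies independently with the same probability.
    deaths = [sum(rng.random() < p_death for _ in range(s)) for s in sizes]
    # Size-biased selection: draw households with probability proportional to size.
    idx = rng.choices(range(n_households), weights=sizes, k=n_sample)
    pop_mean_size = sum(sizes) / n_households
    sampled_mean_size = sum(sizes[i] for i in idx) / n_sample
    # Ratio estimate: total sampled deaths over total sampled people.
    ratio = sum(deaths[i] for i in idx) / sum(sizes[i] for i in idx)
    return pop_mean_size, sampled_mean_size, ratio


pop, sampled, ratio = simulate()
print(f"mean household size: population {pop:.2f}, sample {sampled:.2f}")
print(f"ratio estimate of death rate: {ratio:.4f} (true value 0.05)")
```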
It’s all over but the attributin’
On an unrelated note, I think it's funny for people to refer to this as the "Lancet study" (see, for example, here for some discussion and links). Yes, the study is in a top journal, and that means it passed a referee process, but it's the authors of the paper (Burnham et al.) who are responsible for it. Let's just say that I wouldn't want my own research referred to as the "JASA study on toxicology" or the "Bayesian Analysis report on prior distributions" or the "AJPS study on incumbency advantage" or whatever.