Statistical methodology

1: How were "excess deaths" calculated?

Summary: "Excess deaths" means fatalities that would not have occurred if the invasion had not taken place.

Method for calculating the number of excess deaths

The number of excess deaths was calculated by the following method:

  1. The pre-invasion crude mortality rate was estimated, giving a figure of 5.5 deaths per thousand people per year.

  2. The mortality rate in the period since the invasion was calculated, giving a figure of 13.2 deaths per thousand people per year.

  3. The excess crude mortality rate was then calculated as the difference between the post-war and pre-war mortality rates, i.e. 7.7 deaths per thousand people per year.

  4. The total number of excess deaths was calculated by multiplying the excess mortality rate by the time period of the study (just under 40 months, or 3.28 years) and the population in the relevant survey area (26,112 thousand).

The number of excess deaths is thus the difference between the observed number of deaths and the number that would have occurred if the pre-war mortality rate of 5.5 deaths/1,000/year had continued to prevail, i.e. if the observed increase to 13.2 deaths/1,000/year had not taken place.

The precise calculation is: (13.2 - 5.5) * 3.28 * 26,112 = 659 thousand. (The number published in the report, 655 thousand, corresponds to an unrounded excess mortality rate of 7.64, which rounds to 7.7.)
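The arithmetic above can be sketched in a few lines. This is a minimal illustration using the figures quoted in the text, not the study's actual calculation (which used unrounded rates):

```python
# Excess-deaths arithmetic from the text. Rates are deaths per 1,000 people
# per year, the study period is 3.28 years, and the survey-area population
# is 26,112 thousand people.
pre_war_rate = 5.5             # deaths per 1,000 per year
post_war_rate = 13.2           # deaths per 1,000 per year
years = 3.28                   # length of the study period
population_thousands = 26_112  # population in thousands

excess_rate = post_war_rate - pre_war_rate  # 7.7 per 1,000 per year
# (per-1,000 rate) x (years) x (population in thousands) gives deaths directly
excess_deaths = excess_rate * years * population_thousands

print(round(excess_deaths))  # roughly 659 thousand
```

Using the unrounded excess rate of 7.64 instead of 7.7 reproduces the published 655 thousand.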

Assumptions underlying the calculation

The above calculation relies on two key assumptions:

The first assumption is that the estimated pre-war crude mortality rate of 5.5 was correct. If the true number is higher (lower), the study would over- (under-) estimate the number of excess deaths. 5.5/1,000 is similar to other recent estimates of Iraqi mortality and to data for other countries in the region (more details).

The second assumption is that the pre-war death rate would not have changed in the absence of the war. There are a number of reasons that mortality rates could have changed in the absence of the war, including:

  • Changes to the population structure (other things being equal, older populations generally have higher crude mortality rates than younger ones).
  • Changes to living conditions in Iraq. These could have changed (either improved or deteriorated) also in the absence of the war and its aftermath.

As the scenario is counter-factual, and reliable detailed data on trends are not available, it is difficult to make any assumptions about the likely developments if the war had not taken place. (However, the period 2003-2006 is too short for significant changes to the population structure to take place.)

In the absence of information to the contrary, the study assumes that the mortality rate would have remained constant had the war and its aftermath not occurred. The correct interpretation of the study's finding is therefore that 655 thousand additional deaths occurred in Iraq relative to a scenario in which the mortality rate had remained the same.

2: The study relies on extrapolation from a sample. Is this a valid method of estimating the total number of deaths?

One criticism of the study and its predecessor has been that it relies on using a sample rather than surveying the whole population. The UK Prime Minister's Official Spokesman (PMOS) seemed to imply that this is not a valid methodology when he said:

"The problem with this [study] was that they were using an extrapolation technique, from a relatively small sample ... We had questioned that technique from the beginning and we continued to do so."

Similarly, in response to the previous Lancet study, the PMOS said "Consequently, we did not believe that extrapolation was an appropriate technique to use."

(The PMOS also contended that the sample was not representative, and we discuss that issue here.)

The Iraq Body Count also appears to question whether sampling can establish anything about the characteristics of the population:

"All that has been firmly documented as a result of the Lancet study is that some 300 post-invasion violent deaths occurred among the members of the households interviewed."

Extrapolation from samples is standard scientific methodology.

These criticisms are somewhat puzzling, as there is nothing unusual in the study's use of "extrapolation techniques". This method is used by all surveys except censuses. As it is typically not possible to survey every member of a country's population, it is standard methodology to construct a random sample and use statistical techniques to calculate what the findings in the sample imply for the characteristics of the population as a whole. Such "extrapolation" from samples is the cornerstone of market research, opinion polls, and much scientific research. It is a universally recognised method that companies, governments and academics use to obtain data.

Example: opinion polls.

A familiar case of sampling and extrapolation is opinion polls, in which a small number (typically just over 1,000) of randomly selected respondents are asked about their opinions. For example, if 400 people out of a random sample of 1,000 indicated that they supported a particular political party, then it would be a valid inference that 40 percent of the population as a whole supported this party. (This is based on a fundamental result in statistics that says that the average of a random sample is a valid estimate of the true population average, regardless of the characteristics [distribution] of the underlying population from which the sample was constructed.)
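The opinion-poll inference above can be made concrete. This sketch uses the standard normal approximation for a sample proportion; the figures (400 supporters out of 1,000) are the hypothetical ones from the example:

```python
# Inference from a random sample of 1,000 respondents, 400 of whom
# support a particular party.
import math

supporters, n = 400, 1000
p_hat = supporters / n                   # point estimate: 0.40
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
# 1.96 standard errors on either side gives the usual 95% interval
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"estimate: {p_hat:.0%}, 95% CI: {low:.1%} to {high:.1%}")
```

The point estimate of 40% is valid regardless of sample size; the interval around it is what the sample size controls.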

A small sample does not "bias" the results

The technique relies on the important premise that the sample is representative of the total population being surveyed. This is only true if each member of the population (regardless of their characteristics) is equally likely to be included in the sample. If this is not the case, the results would be "biased", i.e., they would systematically under- or over-estimate the true value in the population as a whole. By contrast, if the sample is truly random, a small sample does not cause bias. For example, while an opinion poll of just 100 people would give highly uncertain estimates, it is no more likely to over-estimate a particular finding than to under-estimate it.
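The "noisy but unbiased" point can be checked by simulation. This sketch draws many small random samples from a hypothetical population in which 40% hold some opinion, and shows that the estimates scatter around the true value rather than drifting to one side (the population share and sample sizes are illustrative):

```python
# Monte Carlo check: small random samples are imprecise but not biased.
import random

random.seed(0)
true_share = 0.40   # hypothetical population share
estimates = []
for _ in range(10_000):
    # a random sample of only 100 people
    sample = [random.random() < true_share for _ in range(100)]
    estimates.append(sum(sample) / 100)

mean_estimate = sum(estimates) / len(estimates)
print(f"true share: {true_share}, mean of 10,000 small-sample estimates: {mean_estimate:.3f}")
```

Each individual estimate can be off by several percentage points, but the average across repeated samples sits on top of the true value, which is what "unbiased" means.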

A larger sample size gives more precise estimates.

A larger sample size is more informative as it gives more precise estimates. The degree of precision depends on the method of sampling and can be estimated using statistical theory. The typical way to report this is in the form of a "95% confidence interval", i.e., lower and upper bounds which are 95% certain to encompass the "true" value in the population. This does not mean that any number in that range is equally likely to be the true value: the true value is more likely to be close to the middle of the interval than to either of the extremes.
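The relationship between sample size and precision can be sketched directly. For a proportion, the half-width of the 95% interval shrinks with the square root of the sample size, so quadrupling the sample halves the interval (the sample sizes below are illustrative):

```python
# How the width of a 95% confidence interval shrinks with sample size,
# using the normal approximation for a proportion near 0.5.
import math

def ci_half_width(n, p=0.5):
    """Approximate 95% CI half-width for a sample proportion."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(f"n={n:5d}: +/- {ci_half_width(n):.1%}")
```

This is why a poll of 100 people is far less precise than one of 1,000, even though neither is biased.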

For more information about the application of the above concepts in the Lancet study see:

3: By surveying "clustered" households, did the study risk over-estimating the number of deaths?

Summary: Cluster sampling is a standard method for surveying unstable areas. It leads to less precision but not to any other systematic effects on the estimates. It does not "bias" the results.

Background on cluster sampling

This statistics glossary has a good definition of cluster sampling:

"Cluster sampling is a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters are selected. All observations in the selected clusters are included in the sample.

Cluster sampling is typically used when the researcher cannot get a complete list of the members of a population they wish to study but can get a complete list of groups or 'clusters' of the population. It is also used when a random sample would produce a list of subjects so widely scattered that surveying them would prove to be far too expensive, for example, people who live in different postal districts in the U.K."

Use of cluster sampling in the Lancet study

These considerations apply in Iraq, where no reliable recent census exists and where insecurity means that travelling is very dangerous. The researchers sampled 47 clusters across Iraq, each with 40 households, and surveyed a total of 1,849 households. This means that, at 47 locations around the country, around 40 households close to each other were interviewed.
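The two-stage structure described above can be sketched as follows. This is a simplified illustration (the location names and the size of the sampling frame are invented; the study's actual selection procedure was more involved): first pick locations at random, then survey a block of neighbouring households at each one.

```python
# Simplified two-stage cluster selection: 47 random locations,
# 40 neighbouring households at each.
import random

random.seed(1)
locations = [f"location_{i}" for i in range(500)]  # hypothetical sampling frame
n_clusters, households_per_cluster = 47, 40

chosen = random.sample(locations, n_clusters)      # stage 1: pick clusters
survey = {loc: [f"{loc}_household_{h}" for h in range(households_per_cluster)]
          for loc in chosen}                       # stage 2: households per cluster

total = sum(len(households) for households in survey.values())
print(f"{len(chosen)} clusters x {households_per_cluster} households = {total} interviews")
```

Note that 47 x 40 = 1,880 is slightly above the 1,849 households the study reports surveying; in practice some planned interviews cannot be completed.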

Effects of cluster sampling on estimates

Cluster sampling affects the estimate as there is a possibility that households in a particular cluster may have been affected by the same local circumstances (e.g., a car bomb, or unusual distance from a violent area). As the experiences of members of the same cluster may be correlated, the number of "independent" observations is reduced. In effect, cluster sampling means that the "effective" sample size is reduced.

The effects of small sample size are discussed in this FAQ. In brief, a smaller sample reduces the precision of the estimate. While it is not known beforehand whether the experiences of households in the same cluster are correlated, the degree of correlation can be observed once the survey has been carried out, and from this information the effect on the precision of the estimates can be calculated. This was done in the Lancet study, which found the number of excess deaths to be in the range 323-943 thousand with 95% certainty, and more likely to be near 655 thousand than either of these extremes.
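The "effective sample size" idea can be sketched with the standard design-effect formula. The design-effect value below is illustrative only, not the study's actual figure: a design effect (DEFF) greater than 1, caused by within-cluster correlation, means a cluster sample of n households carries as much information as a simple random sample of n / DEFF.

```python
# Effective sample size under a design effect, with an illustrative DEFF.
import math

n_households = 1849  # households surveyed in the study
deff = 2.0           # hypothetical design effect, for illustration only

effective_n = n_households / deff
# Confidence intervals widen by sqrt(DEFF) relative to simple random sampling
ci_widening = math.sqrt(deff)

print(f"nominal n = {n_households}, effective n = {effective_n:.0f}, "
      f"CI wider by factor {ci_widening:.2f}")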

By contrast, there is no reason to believe that clusters in different locations would have been affected in the same systematic way. While one cluster may have unusually high mortality because of common experiences, another may have unusually low mortality. As long as both the selection of clusters and the selection of households within each cluster are random, the sampling technique therefore does not "bias" (systematically over- or under-estimate) the results.

4: Did the survey collect demographic data?

Some commentators have suggested that the 2006 MIT/Bloomberg study results are undermined because the survey did not collect sufficient demographic information. For example, Steven Moore wrote in a Wall Street Journal op-ed:

"Without demographic information to assure a representative sample, there is no way anyone can prove--or disprove--that the Johns Hopkins estimate of Iraqi civilian deaths is accurate. "

Surveys often ask for demographic information about respondents so that the sample can be compared against a known source of information for the population as a whole. In a truly random sample, one would expect respondents to have a similar demographic profile to the population as a whole. Where reliable information about the population as a whole is available (typically, from a recent census), the results can sometimes be "adjusted" to address any discrepancies between the sample and the population (although this adjustment itself carries a risk of introducing bias to the results).

In the case of the 2006 MIT/Bloomberg study, a number of issues arise:

First, it is not correct that the absence of demographic data would automatically make the sample unrepresentative. The criterion for a representative sample is that it is "random", i.e., that all households have an equal chance of being included in the sample. If this criterion is fulfilled, the sample will be representative regardless of whatever auxiliary demographic information is collected.

Second, the MIT/Bloomberg study did in fact collect demographic information. This included the age and sex of decedents. It also gathered data on the number and sex of household members (it is not clear from the published article whether their ages were collected), the number of births in the study period, and the extent of in- and out-migration of household members.

These data were compared against other known variables, for example:

  • 49% of household members in the sample were male, which is similar to the population as a whole.
  • The pattern of deaths among women followed a J-shaped demographic curve: high mortality among children, lower in adulthood, and rising sharply among elderly people.
  • The mean household size was 6.9 people.
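The kind of check behind the first bullet can be sketched as follows. Both the 50% population benchmark and the household-member count (1,849 households times a mean size of 6.9) are approximations for illustration, and the standard error below is the naive one that treats every household member as an independent observation. Because cluster sampling reduces the effective sample size, this naive calculation overstates the precision of the comparison, so the z-score it produces exaggerates any apparent discrepancy.

```python
# Naive comparison of the sample's male share against a population benchmark.
import math

members = round(1849 * 6.9)  # approximate number of household members surveyed
sample_share = 0.49          # share of household members who were male
population_share = 0.50      # assumed benchmark, for illustration

# Naive standard error, ignoring the clustering of the sample; a
# cluster-adjusted standard error would be larger and the z-score smaller.
se = math.sqrt(population_share * (1 - population_share) / members)
z = (sample_share - population_share) / se
print(f"members ~ {members}, naive z-score for 49% vs 50%: {z:.1f}")
```

The point is not the particular number but the method: with the full household roster, such comparisons are possible at all, which is why collecting household-level demographics matters.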

Third, demographic data are only useful if an accurate point of comparison exists. This is not readily available for Iraq, where the last census was conducted in 1997 and did not cover all of the country. In the intervening period Iraq experienced significant emigration as well as exposure to economic sanctions and subsequently war, all of which could have had a substantial impact on the demographic profile of the population. In this situation, collecting demographic data is of limited usefulness, and adjusting the results to "correct" for deviations from census information would risk introducing biases.

Fourth, unlike most surveys, the unit for each interview was a household rather than an individual, which also makes comparisons to demographic characteristics more difficult. As one commenter points out,

"So, for example, if 70% of the people interviewed were women but women only comprise 55% of the adult population, you still have no evidence of bias -- perhaps women are more likely to be at home. So you would need to get demographics on the entire household."

Fifth, the survey was carried out in very difficult circumstances, and the researchers faced a trade-off between the number of interviews that could be carried out and the detail and time required for each interview. In these circumstances, the researchers appear to have made the judgement that increasing the sample size by focusing on the key question at hand -- viz., the number of deaths in the household -- would provide a more reliable estimate than additional demographic information.

5: Who else uses the methodology used by the Lancet study?

The cluster survey methodology used for the Lancet study is the standard technique for measuring mortality in conflict situations. (We discuss cluster sampling more generally in this FAQ.)

A prominent example of the use of cluster sampling to measure mortality and other humanitarian indicators is the Standard Monitoring and Assessment of Relief and Transitions (SMART) program. This project aims to develop a standard methodology to assess humanitarian crises, and uses cluster surveys (see the SMART methodology document). SMART is funded by the Canadian International Development Agency, with the development work coordinated by the United Nations Children's Fund and the United States Agency for International Development (USAID).

A key point of the SMART project is that the crude death/mortality rate is often a good indicator of the severity of a humanitarian crisis:

The SMART methodology is based on Crude Death Rate (CDR) and Nutritional Status of Children Under-Five. These are the most vital, basic public health indicators of the severity of a humanitarian crisis. They monitor the extent to which the relief system is meeting the needs of the population and the overall impact and performance of the humanitarian response. link

The use of cluster sampling to measure mortality in difficult conditions was discussed at a recent conference at the London School of Hygiene and Tropical Medicine. This included a number of papers discussing the use of cluster surveys to measure humanitarian indicators, including mortality rates.

Similarly, a recent bulletin from the World Health Organisation discusses the use of cluster sampling to measure crude mortality rates. In surveys of mortality in Darfur, Sudan, it found that the cluster method gave similar results to other methods.

6: Why not simply count death certificates issued?

Summary: The death certificate record in Iraq is incomplete and not centrally collated, and such central data as exist are not readily available. Counting all death certificates is not logistically feasible, and even if it were it would produce a biased estimate of the number of deaths.

Counting death certificates to estimate mortality would be a valid approach if

  • a) all deaths were recorded by a death certificate,
  • b) no certificates were issued for deaths that did not occur, and
  • c) all death certificates could be obtained.

These conditions do not necessarily obtain in all countries even in peacetime, but they are particularly unlikely to hold in unstable countries like Iraq with weak administrative capacity and poor security. For this reason, estimates of mortality in conflict situations typically rely on surveying people directly rather than attempting to use likely incomplete official records. As discussed in this FAQ, the methodology used by the MIT/Bloomberg study is the standard approach to measuring mortality in insecure or unstable conditions.

In Iraq, the study's findings suggest that hospitals have continued to issue death certificates for a high proportion of deaths. This is likely to follow from a tradition in which death certificates have been required for insurance and compensation claims, the payment of benefits, and for burial (see this FAQ). The interviewers carrying out the survey for the Lancet study were shown death certificates in 92% of the cases where they asked for them (covering 80% of all recorded deaths). This implies that a large number of death certificates have in fact been issued locally.
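The two percentages above also imply how often interviewers asked for a certificate in the first place. This is a back-of-the-envelope inference, assuming the two figures refer to the same set of recorded deaths:

```python
# If 92% of requests produced a certificate, and those confirmations cover
# 80% of all recorded deaths, the implied request rate is 80/92.
confirmed_when_asked = 0.92  # certificates shown when interviewers asked
confirmed_of_all = 0.80      # certificate-confirmed share of all deaths

asked_share = confirmed_of_all / confirmed_when_asked
print(f"implied share of deaths where a certificate was requested: {asked_share:.0%}")
```

In other words, interviewers appear to have asked for a certificate in the large majority of cases, which supports treating the 92% confirmation rate as broadly representative.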

However, there is no reliable information indicating that locally issued death certificates are collected in central records. The capacity of the Iraqi state was weak prior to the invasion, and the war is likely to have further weakened official functions to gather demographic statistics. Without reliable information to the contrary, any method that relied on the completeness of official records and administrative procedures would therefore risk underestimating mortality.

The risk of under-reporting when using health facilities, morgues, or similar sources is well documented in previous studies. The authors note this in the article in the Lancet:

"Our estimate of excess deaths is far higher than those reported in Iraq through passive surveillance measures. This discrepancy is not unexpected. Data from passive surveillance are rarely complete, even in stable circumstances, and are even less complete during conflict, when access is restricted and fatal events could be intentionally hidden. Aside from Bosnia, we can find no conflict situation where passive surveillance recorded more than 20% of the deaths measured by population-based methods. In several outbreaks, disease and death recorded by facility-based methods underestimated events by a factor of ten or more when compared with population-based methods. In several outbreaks, disease and death recorded by facility-based methods underestimated events by a factor of ten or more when compared with population-based estimates."

Reliance on official statistics is further complicated in Iraq by political considerations. Information about deaths is highly politically sensitive and may be difficult to obtain. One indication of this is reports that Iraqi Prime Minister Nuri al-Maliki ordered the Health Ministry in September 2006 not to disclose more figures about death rates in Iraq to the United Nations (see NY Times reporting).