Shiga toxin-producing Escherichia coli (STEC) O157 is estimated to cause 96 534 illnesses each year in the USA; domestically acquired, foodborne infections comprise 63 000 of these, with 2000 hospitalizations and 20 deaths annually [Reference Scallan1]. Estimating the number of illnesses caused by pathogens commonly transmitted by food is an important step in determining public health priorities. Assessing the proportions of these illnesses due to specific exposures contributes to the development of targeted disease prevention strategies and helps to measure progress towards food safety goals.
Estimating the proportion of illnesses that can be attributed to contaminated foods is complicated by the fact that foodborne pathogens can also be transmitted through a variety of other exposure pathways, such as direct contact with animals or with other infected persons, and by exposure to contaminated water. Outbreak investigations represent a unique opportunity to directly attribute illnesses to specific exposure sources. Many sources of STEC O157 infection have been identified in outbreak investigations, including ground beef, lettuce, juice, sprouts, spinach, and contact with animal manure or contaminated water [2–Reference Yoder4]. Information from outbreak surveillance was used to estimate that 68% of domestically acquired STEC O157 infections are foodborne [Reference Scallan1]. However, <30% of STEC O157 infections are associated with recognized outbreaks [5, 6]. To determine sources of STEC O157 infection in cases that are not epidemiologically linked to outbreaks (i.e. sporadic infections), FoodNet has conducted case-control studies; these studies have been used to calculate the population attributable fractions (PAFs) associated with specific exposures [Reference Kassenborg7, Reference Voetsch8].
The use of attributable fractions as estimated by case-control studies to determine the proportions of infections that could be prevented if specific exposures were removed is well described [Reference Benichou9]. The FoodNet surveillance population is particularly well suited for calculating PAFs because active laboratory surveillance to identify incident infections supports the assumption that the source population accurately represents the target FoodNet catchment population [Reference Bruzzi10, Reference Greenland and Robins11]. Therefore, the FoodNet surveillance system provides a unique opportunity to attribute both outbreak-associated and sporadic infections to specific exposure sources and to develop a more complete picture of attribution within a population. This study uses the results of two case-control studies of sporadic STEC O157 infections conducted in FoodNet in 1996 and 1999 and the outbreak investigations that occurred during the two study periods to attribute infections to specific exposure sources.
FoodNet is a collaborative programme among CDC, 10 state health departments, the U.S. Department of Agriculture's Food Safety and Inspection Service, and the Food and Drug Administration. It has conducted active, population-based surveillance for laboratory-confirmed cases of infection caused by STEC O157 since 1996, and is part of CDC's Emerging Infections Program. FoodNet personnel regularly contact clinical laboratories to ascertain laboratory-confirmed cases of infection occurring within the surveillance sites [Reference Scallan12]. In 1996 and 1999, FoodNet sites conducted year-long case-control studies of sporadic STEC O157 infections [Reference Kassenborg7, Reference Voetsch8]. Data were obtained from these studies and from outbreak investigations conducted in FoodNet sites during these study years.
Outbreaks are defined as incidents in which two or more persons experience a similar illness resulting from exposure to a common source. Personnel detect outbreaks in FoodNet sites in a variety of ways, most commonly by reports from private citizens and medical professionals, and by reports of notifiable diseases from clinical laboratories [Reference Murphree13]. Investigation of outbreaks often includes active case-finding and the collection of stool specimens for identification of the aetiological agent. Outbreak cases can be defined by a common exposure setting even when a specific contaminated source cannot be implicated by outbreak investigators. Summary information regarding the exposure setting and source of infections for reported outbreak cases was obtained directly from the FoodNet sites for this study. Only laboratory-confirmed cases were included in the analysis.
Case-control studies for both years used the same patient and control enrolment methodologies and the same criteria for eligibility, exclusion, and matching [Reference Voetsch8]. All laboratory-confirmed STEC O157 infections in the FoodNet catchment populations were ascertained through active surveillance, and cases were interviewed within 21 days of their stool sample collection date about exposures in the 5 days [Reference Kassenborg7] or 7 days [Reference Voetsch8] before illness onset. Controls, matched to cases by age and telephone area code, were identified by sequential digit dialling and from birth registries, and were interviewed within 7 days of the case-patient interview. Potential controls were excluded if they reported having diarrhoea within 28 days before the case-patient's illness onset. In both case-control studies, infections associated with confirmed outbreaks (i.e. infections that were epidemiologically linked to exposure sources) were excluded and described separately [Reference Kassenborg7, Reference Voetsch8]. During both studies, interviews with cases and controls included questions about travel, child daycare, exposure to farms and farm animals, use of appropriate food handling practices, exposure to untreated water sources, settings of food consumption, and consumption of meats (including different cuts of beef), fruits, and vegetables.
For each study year, sporadic STEC O157 infections were attributed only to those exposures that were significantly and positively associated with infection in the final multivariable model of the published paper [Reference Kassenborg7, Reference Voetsch8]. When the exposure variable assessed did not specify a contaminated source, the most likely source associated with that exposure was assumed (e.g. contaminated food for the exposure ‘ate at a table-service restaurant’, and animal contact for the exposure ‘living on, working, or visiting a cattle farm’). Sporadic cases were not attributed to exposures that were associated with susceptibility to infection (e.g. use of immunosuppressive medication). For each outbreak, all associated infections were attributed to the specific source implicated by outbreak investigators. Three cases were identified as outbreak-related, but no common exposure source was reported, so these cases were excluded from the analysis.
Both case-control studies were conducted in dynamic populations, and controls were enrolled simultaneously with cases, so they were at risk of infection during the same period as cases. Consequently, the calculated odds ratios are considered adequate estimators of the rate ratios or relative risks in the target population [Reference Knol14], and these were used to calculate the attributable fraction for each exposure source using the following formula [Reference Benichou9–Reference Greenland and Robins11]:
$$AF_{pop} = pd_{exp} \times \frac{OR_{exp} - 1}{OR_{exp}}$$
where the PAF (AFpop) for a specific exposure is a function of the proportion of cases exposed to that source (pdexp) and the adjusted odds ratio for that exposure (ORexp). The number of sporadic infections in the FoodNet study population attributable to each exposure during the study year was calculated by the following formula:
$$AN_{exp} = AF_{pop} \times N$$
where the number of sporadic cases attributable to a specific exposure (ANexp) is determined by the AFpop and the total number of sporadic cases included in the case-control study (N) [Reference Benichou9–Reference Greenland and Robins11]. After calculating the number of sporadic infections attributable to each exposure source, the number of unattributed infections was determined by subtracting the sum of all attributed cases from the total number of cases included in the case-control study. Sporadic infections not included in the case-control studies were attributed by applying the same AFpop values to the total number of unenrolled sporadic cases.
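The attributable-fraction and attributable-number calculations described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the formula takes Miettinen's case-based form, AFpop = pdexp × (ORexp − 1)/ORexp, with the adjusted odds ratio standing in for the rate ratio; the exposure names, proportions, odds ratios and case counts are hypothetical and are not taken from the FoodNet studies.

```python
# Sketch of the PAF and attributable-number calculations.
# Assumption: Miettinen's case-based PAF, AF = pd * (OR - 1) / OR.
# All inputs below are hypothetical, not FoodNet values.


def attributable_fraction(pd_exp: float, or_exp: float) -> float:
    """PAF for one exposure from the proportion of cases exposed and the adjusted OR."""
    if not 0.0 <= pd_exp <= 1.0:
        raise ValueError("pd_exp must be a proportion between 0 and 1")
    if or_exp <= 0.0:
        raise ValueError("or_exp must be positive")
    return pd_exp * (or_exp - 1.0) / or_exp


def attributable_number(af_pop: float, n_cases: float) -> float:
    """Cases attributable to an exposure: AN = AF * N."""
    return af_pop * n_cases


def unattributed(n_cases: float, attributable_numbers: list[float]) -> float:
    """Cases left after subtracting the sum of all attributed cases."""
    return n_cases - sum(attributable_numbers)


# Hypothetical exposures: name -> (proportion of cases exposed, adjusted OR).
exposures = {"exposure_A": (0.40, 2.5), "exposure_B": (0.10, 4.0)}
n_enrolled = 200      # hypothetical sporadic cases enrolled in the study
n_not_enrolled = 100  # hypothetical sporadic cases not enrolled

afs = {name: attributable_fraction(pd, or_) for name, (pd, or_) in exposures.items()}
an_enrolled = {name: attributable_number(af, n_enrolled) for name, af in afs.items()}
# The same AF values are applied to the sporadic cases that were not enrolled.
an_not_enrolled = {name: attributable_number(af, n_not_enrolled) for name, af in afs.items()}

print(afs)
print(an_enrolled, unattributed(n_enrolled, list(an_enrolled.values())))
print(an_not_enrolled)
```

With these inputs exposure_A has an AF of about 0.24 (48 of 200 enrolled cases) and exposure_B about 0.075 (15 cases), leaving 137 cases unattributed. Only exposures positively associated with infection (OR > 1) were used for attribution in the study; for OR < 1 this formula returns a negative fraction.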
Sources of uncertainty associated with the attribution method
To generalize the estimated attributable fractions to the target population (the FoodNet surveillance population), two important assumptions were needed, and their validity is uncertain. First, the source population of laboratory-confirmed STEC O157 infections was assumed to be representative of the target population (FoodNet catchment) so that the attributed sources of infection in the source population of cases reflected those in the target population [Reference Greenland and Robins11]. For this to be valid, it was assumed that there were no underlying differences in the distribution of causal transmission pathways in the three subpopulations of STEC O157 cases: (1) sporadic cases enrolled in each case-control study, (2) sporadic cases not enrolled in the case-control studies, and (3) outbreak-associated cases. It was also assumed that all laboratory-confirmed sporadic and outbreak-associated infections occurring in the surveillance population during the two study years were ascertained (i.e. sampling fraction = 1). In addition, all outbreak-associated illnesses were assumed to be due to the sources implicated in the outbreak investigations. Likewise, it was assumed that all ascertained infections were correctly classified as sporadic or outbreak-associated, such that the distribution of exposures in the source population of sporadic cases was not influenced by unrecognized outbreaks among cases included in the case-control studies.
Second, the exposure risks identified in the study population and used to calculate the attributable fractions were assumed to reflect the effect of exposure in the target population [Reference Greenland and Robins11]. This requires that the survey questions and exposure categories used in both the case-control studies and the outbreak investigations captured the exposure definitions relevant to infection risk [Reference Levine15, Reference Greenland16]. Likewise, it was assumed that confounding was adequately controlled in the multivariable models [Reference Bruzzi10, Reference Greenland and Robins11, Reference Rockhill, Newman and Weinberg17], and that there was no interaction between exposure factors [Reference Benichou9, Reference Bruzzi10]. Note that the sum of ANexp in a given study year could exceed the total number of sporadic cases ascertained, because a case exposed to more than one significant exposure could be counted more than once [Reference Bruzzi10, Reference Steenland and Armstrong18].
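A small numerical illustration of why the attributable numbers can sum to more than the number of cases follows. It is a hypothetical sketch, not FoodNet data, assuming the case-based formula pd × (OR − 1)/OR; because both made-up exposures are reported by 70% of cases, at least 40% of cases must report both, and those cases contribute to both attributable numbers.

```python
# Hypothetical illustration (not FoodNet data): overlapping exposures can
# make the attributable numbers for different exposures sum to more than
# the number of cases.


def attributable_fraction(pd_exp: float, or_exp: float) -> float:
    """Case-based PAF (Miettinen's formula): pd * (OR - 1) / OR."""
    return pd_exp * (or_exp - 1.0) / or_exp


n_cases = 100
# Two hypothetical exposures, each reported by 70% of cases with an adjusted
# OR of 5. At least 40% of cases must therefore report both exposures.
exposures = {"exposure_A": (0.70, 5.0), "exposure_B": (0.70, 5.0)}

attributable_numbers = {
    name: attributable_fraction(pd, or_) * n_cases for name, (pd, or_) in exposures.items()
}
total_attributed = sum(attributable_numbers.values())
print(attributable_numbers)
print(f"sum of AN = {total_attributed:.0f} for {n_cases} cases")
```

Each exposure accounts for about 56 cases, so the sum (about 112) exceeds the 100 cases; the per-exposure estimates are therefore not additive shares of a single total.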
FoodNet sites ascertained 396 and 531 STEC O157 cases in 1996 and 1999, respectively. One hundred and five cases were associated with seven outbreaks in 1996, and 79 cases were associated with four outbreaks in 1999. Table 1 provides the attributable numbers for the 740 sporadic cases and 184 outbreak-associated cases ascertained in the two study years. Of the 479 cases enrolled in the two case-control studies (68% and 63% of reported sporadic cases in 1996 and 1999, respectively), 53% (1996) and 23% (1999) were attributed to a source. When the attributable fractions derived from enrolled cases were applied to sporadic cases not enrolled, and these estimates were combined with the case-control study results and the outbreak investigations, the proportion of all ascertained cases attributed to an exposure source increased to 65% in 1996 and 34% in 1999.
a Estimated by exposures ‘visited farm with cows’ (1996), ‘lived on farm or visited farm’ (1996), and ‘living on, working, or visiting a cattle farm’ (1999).
b Estimated by exposures ‘eating pink hamburger at home’ and ‘eating pink hamburger away from home’.
c Estimated by exposure ‘table-service restaurant’.
d Estimated by exposure ‘child less than 2 years of age in household’.
e Estimated by exposure ‘pink hamburger’.
f Estimated by exposure ‘drinking untreated surface water’.
g It was assumed that the exposure distribution in unenrolled cases was the same as in enrolled cases.
h Due to apple cider (10 cases) and lettuce (8 cases).
i Due to swimming in a lake.
j Due to transmission in a childcare centre.
k Due to romaine lettuce.
l Due to consumption of well water contaminated by cattle faeces.
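The blended proportions of 65% (1996) and 34% (1999) can be approximately reproduced from the other figures reported in the results. The sketch below assumes that sporadic cases per year equal total minus outbreak-associated cases (the three excluded cases are ignored because their split by year is not reported) and that every outbreak case retained in the analysis counts as attributed; it is a consistency check, not the authors' calculation.

```python
# Sketch: approximately reproducing the blended attribution proportions.
# Assumptions: sporadic cases per year = total - outbreak cases (the three
# excluded cases are ignored); all retained outbreak cases are attributed.


def blended_attributed_share(
    total_cases: int,
    outbreak_cases: int,
    sporadic_attributed_share: float,
) -> float:
    """Proportion of all ascertained cases attributed to any source."""
    sporadic_cases = total_cases - outbreak_cases
    # The share attributed among enrolled cases is applied to all sporadic
    # cases, enrolled or not.
    attributed = sporadic_attributed_share * sporadic_cases + outbreak_cases
    return attributed / total_cases


share_1996 = blended_attributed_share(396, 105, 0.53)
share_1999 = blended_attributed_share(531, 79, 0.23)
print(f"1996: {share_1996:.0%}, 1999: {share_1999:.0%}")
```

This prints `1996: 65%, 1999: 34%`, matching the reported blended proportions to the nearest percentage point.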
Thirty-four per cent of sporadic cases were attributed to foodborne exposures in 1996 and 10% in 1999. Foodborne exposures accounted for 26% of outbreak cases in 1996 and 18% in 1999. Foodborne exposures in 1996 included exposure at restaurants or other food service settings to an undetermined source, presumably food (20% of sporadic and 7% of outbreak cases), hamburger (14% of sporadic and 2% of outbreak cases), and produce (17% of outbreak cases, specifically apple cider and lettuce). Foodborne exposures in 1999 were hamburger (10% of sporadic and 14% of outbreak cases) and produce (4% of outbreak cases, all linked to romaine lettuce).
In 1996, attributed non-foodborne exposure sources were contact with animals (13% of sporadic cases), exposure to untreated water (4% of outbreak cases), and person-to-person transmission (6% of sporadic cases and 70% of outbreak cases). In 1999, attributed non-foodborne exposures were contact with animals (8% of sporadic cases), exposure to untreated water (5% of sporadic and 73% of outbreak cases), and person-to-person transmission (9% of outbreak cases).
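The category-level percentages in the two paragraphs above can be tallied to confirm that they reproduce the foodborne totals and the overall attributed proportions of enrolled sporadic cases (53% and 23%). The sketch below transcribes the rounded values from the text; the category labels are shorthand introduced here, and "restaurant" stands for restaurants or other food service settings with an undetermined, presumably food, source, counted as foodborne as in the text.

```python
# Percentages of cases attributed to each source category, transcribed from
# the results text (rounded values as reported). "restaurant" = restaurant or
# other food service setting, source undetermined but presumed to be food.
FOODBORNE = {"restaurant", "hamburger", "produce"}

attribution = {
    (1996, "sporadic"): {"restaurant": 20, "hamburger": 14, "animal contact": 13, "person-to-person": 6},
    (1996, "outbreak"): {"restaurant": 7, "hamburger": 2, "produce": 17, "untreated water": 4, "person-to-person": 70},
    (1999, "sporadic"): {"hamburger": 10, "animal contact": 8, "untreated water": 5},
    (1999, "outbreak"): {"hamburger": 14, "produce": 4, "untreated water": 73, "person-to-person": 9},
}


def summarize(shares: dict[str, int]) -> tuple[int, int]:
    """Return (foodborne percentage, percentage attributed to any source)."""
    foodborne = sum(pct for source, pct in shares.items() if source in FOODBORNE)
    return foodborne, sum(shares.values())


for (year, group), shares in attribution.items():
    foodborne, total = summarize(shares)
    print(f"{year} {group}: foodborne {foodborne}%, any source {total}%")
```

The sporadic totals come to 53% (1996) and 23% (1999), and the outbreak categories each sum to 100%, since every retained outbreak case was assigned a source.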
This study shows that applying the information obtained from case-control studies to other sporadic infections, and adding these to data obtained from outbreak investigations, can increase the number of infections that are attributed to a source. Source attribution using data from outbreak surveillance provides the most definitive links to sources because an outbreak is, by definition, characterized by cases that share a causal pathway. However, most infections are sporadic; case-control studies are used to identify differences between the exposure profiles of infected persons and a control population to determine the attributable fractions associated with exposures found to be statistically significant in a multivariable model. Sources of sporadic infection include a wide variety of transmission pathways, each of which may have particular factors (e.g. food storage or preparation practices) contributing to individual illnesses that cannot be quantitatively captured by the study [Reference Mølbak and Neimann19]. Consequently, the number of infections attributed to specific sources by case-control studies is typically small. The attribution of a higher proportion of ascertained cases in the population when data from both sporadic and outbreak-associated infections are used shows the value of combining two types of information to estimate sources of illness in a population. However, interpretation of the differences in the proportions of STEC O157 infections attributable to specific exposures when using blended estimates compared to those derived from only one data source, as well as the differences in attribution between the two study years, is complicated by two important sources of uncertainty: (a) the validity of the assumptions associated with the method used to attribute infections to exposure sources, and (b) the inherent variability in source attribution over time.
The assumption that there are no underlying differences in the distribution of causal transmission pathways of infection among the three subpopulations of laboratory-confirmed STEC O157 cases is difficult to evaluate, because exposure sources are determined indirectly for sporadic infections but directly for outbreak-associated infections. Use of active surveillance to increase the sampling fraction of source cases (laboratory-confirmed STEC O157 infections) helps to reduce this uncertainty by increasing the probability that the study population reflects the source population [Reference Greenland and Robins11]. However, laboratory-confirmed infections are known to represent a subset of the total number of illnesses in the population [Reference Scallan1], and the degree to which confirmed infections represent the distribution of exposures and illness in the target population is unknown. In addition, sporadic cases who reported an ill household member within 28 days before submission of their stool sample were excluded from the case-control studies. A higher proportion of these excluded cases probably acquired infection through person-to-person transmission than those enrolled in the studies, but we assumed that both groups of sporadic cases had the same distribution of risky exposures. Furthermore, during at least one outbreak investigation included in this analysis (Minnesota, 1996), investigators conducted additional laboratory testing that identified laboratory-confirmed infections in persons who had not sought healthcare. As a result, outbreak-associated cases were over-sampled relative to sporadic cases.
These observed underlying differences in the ascertainment of infections from sporadic and outbreak-associated subpopulations illustrate key differences in the surveillance populations that probably influence the observed distributions of exposures and resulting attribution estimates, and may limit the generalizability of results to the target population.
Another assumption used to attribute infections to sources is that cases were correctly classified as sporadic or outbreak-associated, such that PAFs obtained from the case-control studies represent valid estimates of attribution. For example, in the 1996 study, the independent exposures of eating at a table-service restaurant and eating ‘pink’ hamburger away from home were associated with PAFs of 20% and 7%, respectively. However, neither of these exposures was significantly associated with infection in the 1999 study. If all infections were correctly classified as sporadic or outbreak-associated during both study years, these differences in PAF reflect true changes in the attributable fractions associated with these exposures between the two study years. However, if some of the sporadic cases reporting these two exposures in 1996 were associated with an undetected outbreak caused by contaminated ground beef (or by ingredients cross-contaminated by ground beef during food preparation) distributed to restaurants, this outbreak may have inflated the frequency of these two exposures among study cases. In this scenario, detection of the outbreak would have led to the exclusion of these cases from the study; the two exposures might then not have been retained in the 1996 multivariable model, resulting in more similar final models and cumulative attribution estimates across the two case-control studies.
The exclusion of three cases from this analysis highlights the difficulty associated with assigning all cases as either sporadic or outbreak-associated. Although the reporting state designated these three cases as outbreak-associated, no common exposure was reported. We were unable to determine whether a common source was identified but not reported, or whether the cases had a common venue, e.g. eating at the same restaurant, but the specific exposure was not known. In the past decade, the Molecular Subtyping Network for Foodborne Disease Surveillance (PulseNet) has improved our ability to identify small numbers of possibly related cases, e.g. with the same rare subtype pattern, clustered in time and space. It is likely that many of these clusters represent common source outbreaks, but successful identification of a single exposure source in small numbers of clustered infections is difficult and relatively uncommon [Reference Rounds20]. Therefore, these clustered cases are classified as sporadic and potentially influence PAFs estimated by case-control studies.
The validity of the assumption that exposure risks identified in the study population reflect the effect of exposures in the target population is highly uncertain in case-control studies. Ascertained exposures in case-control studies are assumed to describe the relevant categories of infection risk [Reference Levine15, Reference Greenland16]. In both case-control studies, the risk of exposure to the pathogen in contaminated ground beef was estimated by the variable ‘eating pink hamburger’. Although participants were also asked about eating hamburgers in general, that variable was not associated with a significantly elevated risk of infection. Eating pink hamburger that is not contaminated with STEC O157 will not cause infection, whereas contaminated, undercooked ground beef may not appear pink to the consumer. Consequently, the PAF estimated by the case-control studies probably underestimates the proportion of cases associated with hamburger consumption [Reference Levine15].
Other factors limiting the number of infections attributed to sources in this study were the modest odds ratios estimated in the case-control studies for common exposures and the low prevalence among cases of higher-risk exposures. For example, only 38% of cases were exposed to sources associated with estimated odds ratios exceeding 2·0 in the 1996 study [Reference Kassenborg7], and in 1999 this proportion was only 18% [Reference Voetsch8]. These low cumulative attribution proportions suggest that not all relevant exposure sources were found to be statistically significant in the multivariable models, that the questionnaires did not adequately capture the relevant categories of risk, or both. In addition, not all cases attributed to exposures in the 1996 case-control study were attributed in the blended estimates, because some exposure variables in the final model, e.g. use of immunosuppressive medication, reflect susceptibility to infection rather than a source. Susceptibility to infection (as estimated by use of immunosuppressive medication) is known to modify the risk of foodborne infection following exposure to contaminated sources; thus, there is evidence of biological interaction between variables included in the final multivariable model. This indicates that the assumption of no interaction between exposures (i.e. contaminated sources of infection) and adjustment factors (e.g. susceptibility) does not hold for the 1996 study, adding uncertainty to the estimated attributable numbers for sporadic infections [Reference Benichou9].
Our finding of marked variability in exposure sources between the two years may reflect true differences, variations in study design, variation in the data resulting from small sample sizes, or a combination of these. The impact of small sample sizes is highlighted by the considerable year-to-year variability in source attribution estimates caused by large individual outbreaks. For example, in 1996, outbreaks resulting from person-to-person transmission increased the proportion of infections attributable to this route nearly fourfold over the proportion observed in the case-control study of sporadic infections. Similarly, in 1999 a large outbreak associated with contaminated well water tripled the proportion of infections attributed to untreated water. Although only one outbreak associated with hamburgers was reported in each study year, the difference in outbreak size between the two years meant that contaminated hamburgers accounted for 2% of outbreak-associated infections in 1996 compared to 14% in 1999. In contrast, the proportion of sporadic infections attributable to hamburgers (11%) did not change between 1996 and 1999, and these findings are consistent with the results of ground beef testing by the Food Safety and Inspection Service, which showed no significant change in the contamination rate of ground beef during this time [Reference Naugle22]. It is likely that the contaminated source of exposure influences the number of resulting infections. For example, a contaminated recreational water body is likely to cause more infections than a single undercooked hamburger; using the number of illnesses associated with each outbreak therefore incorporates a source-specific weight into blended source attribution estimates. The stability of this weight can be improved by increasing the sample size, which reduces the impact of year-to-year variability in outbreak size on the estimate.
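The "nearly fourfold" and "tripled" statements can be checked with simple blending arithmetic. This sketch uses the rounded percentages from the results and, as an assumption, approximates the number of sporadic cases in each year as total minus outbreak-associated cases.

```python
# Sketch: effect of large outbreaks on blended, source-specific proportions.
# Uses rounded percentages from the results; sporadic counts are approximated
# as total minus outbreak-associated cases (an assumption).


def combined_share(p_sporadic: float, n_sporadic: int, p_outbreak: float, n_outbreak: int) -> float:
    """Share of all cases attributed to one source after blending."""
    attributed = p_sporadic * n_sporadic + p_outbreak * n_outbreak
    return attributed / (n_sporadic + n_outbreak)


# 1996 person-to-person: 6% of sporadic cases, 70% of outbreak cases.
p2p_1996 = combined_share(0.06, 396 - 105, 0.70, 105)
# 1999 untreated water: 5% of sporadic cases, 73% of outbreak cases.
water_1999 = combined_share(0.05, 531 - 79, 0.73, 79)

print(f"person-to-person 1996: {p2p_1996:.0%} ({p2p_1996 / 0.06:.1f}x sporadic share)")
print(f"untreated water 1999: {water_1999:.0%} ({water_1999 / 0.05:.1f}x sporadic share)")
```

Under these approximations the blended person-to-person share in 1996 is about 23% (roughly 3.8 times the 6% sporadic share), and the blended untreated-water share in 1999 is about 15% (roughly 3.0 times the 5% sporadic share).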
If the sources of infection can be assumed to be similar over a defined time period, blended source attribution estimates may be improved by combining the results of surveillance studies across multiple years. However, such designs would need to consider whether regulations or consumer habits changed during the study period.
Estimates of source attribution can aid in determining appropriate public health interventions and measuring progress toward food safety goals. Data from outbreak surveillance provide the best opportunity to directly attribute illnesses to sources, but most infections are classified as sporadic, which limits the generalizability of attribution estimates derived from outbreak data. Including data from both sporadic and outbreak-associated infections can increase the total proportion of infections attributed to an exposure source and improve our ability to generalize estimates to the target population. However, even after combining attribution estimates, a relatively small number of cases were attributed to sources, in part due to methodological challenges. Nonetheless, this study identified several opportunities to improve source attribution using methods that blend information from multiple surveillance systems. For example, models can be developed to estimate differences in ascertainment probabilities in the different surveillance subpopulations so that source attribution estimates may be adjusted before blending, removing the need for the assumption that the exposure distributions are equal in all source subpopulations. Likewise, new approaches that are better suited to analysing possibly correlated exposures in sporadic infections may be applied to case-control data, and information from outbreak investigations may be used to estimate causation probabilities associated with exposure categories used in case-control studies. These approaches may help to reduce the proportion of unattributed infections, provide estimates of uncertainty for relevant time periods of interest, and contribute to improved source attribution estimation.
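One way the proposed ascertainment adjustment might look is sketched below. It is a hypothetical illustration only: it assumes the ascertainment probability of each surveillance subpopulation is already known and uses inverse-probability weighting, one possible choice that the study does not specify. All counts, probabilities and category names are invented.

```python
# Hypothetical sketch: adjusting for unequal ascertainment before blending.
# Assumes known ascertainment probabilities per subpopulation and uses
# inverse-probability weighting; all numbers are invented for illustration.


def weighted_source_shares(subpops: list[tuple[dict[str, float], float]]) -> dict[str, float]:
    """Pool source counts across subpopulations, each scaled by 1 / ascertainment probability.

    subpops: list of (counts by source, including "unattributed", ascertainment probability).
    Returns each source's share of the weighted total.
    """
    pooled: dict[str, float] = {}
    for counts, prob in subpops:
        if not 0.0 < prob <= 1.0:
            raise ValueError("ascertainment probability must be in (0, 1]")
        for source, n in counts.items():
            pooled[source] = pooled.get(source, 0.0) + n / prob
    total = sum(pooled.values())
    return {source: n / total for source, n in pooled.items()}


# Hypothetical: sporadic cases ascertained with probability 0.5, outbreak
# cases (e.g. after extra laboratory testing) with probability 1.0.
sporadic = ({"food": 30, "animal contact": 10, "unattributed": 60}, 0.5)
outbreak = ({"food": 20, "untreated water": 30}, 1.0)

adjusted = weighted_source_shares([sporadic, outbreak])
unadjusted = weighted_source_shares([(sporadic[0], 1.0), (outbreak[0], 1.0)])
print("adjusted:", adjusted)
print("unadjusted:", unadjusted)
```

In this made-up example, down-weighting the over-sampled outbreak subpopulation lowers the untreated-water share from 20% to 12% of cases, illustrating how unequal ascertainment can shift blended estimates.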
The authors thank George Maldonado for early discussions related to this work.
DECLARATION OF INTEREST