Syndromic surveillance: sensitivity and positive predictive value of the case definitions

Summary

The aim of the study was to measure the positive predictive value (PPV) and sensitivity of operational case definitions of 13 syndromes in a surveillance system based on the Emergency online database of the Lazio region. The PPVs were calculated using electronic emergency department (ED) medical records and subsequent hospitalizations to ascertain the cases. Sensitivity was calculated using a modified capture–recapture method. The number of cases that fulfilled the case definition criteria in the 2004 database ranged from 27 320 for gastroenteritis to three for haemorrhagic diarrhoea. The PPVs ranged from 99·3% to 20%; sepsis, meningitis-like and coma were below 50%. The estimated sensitivity ranged from 90% for coma to 22% for haemorrhagic diarrhoea. Syndromes such as gastroenteritis, where the signs, symptoms, and exposure history provide immediate diagnostic implications, fit this surveillance system better than others such as haemorrhagic diarrhoea, where symptoms are not evident and a more precise diagnosis is needed.

INTRODUCTION

Syndromic surveillance systems arose from the need to immediately identify unexpected clusters of disease; the main impetus behind building this kind of surveillance has been the threat of bioterrorism [1–5]. The SARS epidemic in 2003 and the pressing threat of an influenza pandemic are now providing more relevant uses for syndromic surveillance in a wider public health spectrum [2, 6–9].

The rationale behind monitoring syndromes instead of diseases is to identify the occurrence of clusters as quickly as possible [3, 10].

To be useful and efficient, a syndromic surveillance system must be sensitive, i.e. it should recognize real clusters; specific, i.e. it should have a high positive predictive value (PPV), in other words it should produce very few false positives; and timely, i.e. the cluster should be identified early enough for an effective response [11, 12].

One of the factors influencing both the specificity and the sensitivity of a surveillance system is the operational case definition adopted [13–15]. Several syndromes have been under surveillance in order to monitor different hypothetical disease clusters [16–19]. In the Lazio region there is an Emergency Information System (EIS) [20] that collects all the daily admissions to the region's 61 emergency departments (EDs); 34 of these EDs transmit their data in real time.

In this paper we developed operational case definitions for the 13 syndromes included in the syndromic surveillance [19] of an Italian region, which can automatically be detected from the electronic emergency-room visit report, and we estimated their sensitivity and their PPV.

MATERIALS AND METHODS

The setting

The setting was Lazio, a region of central Italy with about 5·5 million inhabitants that includes Rome (3 million inhabitants).

Data source

The syndromic surveillance system of the Lazio region is based on the EIS [20]. Since 2000, it has recorded all emergency ward admissions in Lazio from all 61 EDs in the region.

For each ED admission the EIS reports:

  • personal data (the name, the date and place of birth, the gender of the patient);

  • information collected at triage:

    • the triage code (an operative scale of urgency used to establish treatment priority);

    • the chief complaint [grouped into 15 categories (coma, fever, convulsions, other nervous system symptoms, dyspnoea, trauma, chest pain, precordial pain, vomiting, abdominal pain, intoxication, haemorrhage without trauma, other symptoms, other pain, fixed appointment)];

    • symptom onset (in hours before the visit);

    • some vital parameters (body temperature, blood pressure, respiratory rate, cardiac frequency, Glasgow Coma Scale);

  • up to five diagnoses and up to five therapeutic procedures (both diagnoses and procedures coded according to ICD-9-CM);

  • the outcome of the admission (hospitalization, death, transfer or discharge).

The data from 34 EDs are transferred from the hospital to the regional system in real time. These EDs are included in the syndromic surveillance (Fig. 1).

Fig. 1. Flow-chart of the surveillance system. The Emergency Information System collects the data gathered at triage and at the end of the emergency visit and transmits it to the region, where the data are automatically analysed in real-time for clustering by the syndromic surveillance. Then the identified clusters are manually screened by the epidemiology team to detect putative outbreaks.

About 40% of the records included free-text diagnoses, which were reported directly by the emergency physician at the end of the visit to supplement the ICD-9-CM codes.

Study design (Fig. 2)

The operational case definitions were tested on the 2004 database. To ascertain the cases identified, we used a re-abstract study based on the electronic ED medical records and, for the cases identified, on the Hospitalization Information System, which records all the hospitalizations that occur in our region [21]. Two examples illustrate the false-positive records identified. Many cases of unspecified shock (ICD-9-CM 785.50; 785.59), which the operational definition assigned to the ‘Sepsis or unexplained shock’ syndrome, were re-classified as lipothymia during the re-abstract study. Likewise, many ED visits reporting central nervous system ICD-9-CM codes were classified as ‘Meningitis, encephalitis, or unexplained acute encephalopathy’ syndrome by the operational definition, but the re-abstract study showed that they were recurrences of pre-existing psychoses.

Fig. 2. Study design. All the emergency department visits reported to the system are divided into three subsets: true syndromes (solid line circle), the cases fulfilling the operational definition (dashed line circle), and the cases fulfilling the free text-based definition (dotted line circle). The formulas used to calculate sensitivity, related to the figure by letters, are at the top left. At the bottom left we reported an example with the numbers for respiratory symptoms with fever.

To measure the PPV, we randomly sampled, for each of the 13 identified syndromes and among cases with electronic medical records available, 300 cases that matched the operational definitions. If there were fewer than 300 cases that fulfilled the definition, we checked the entire population. We measured the percentage of true positives and false positives in the sample.
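The sampling rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the seed, and the toy review outcome are hypothetical, and the manual re-abstraction is replaced by a stand-in dictionary.

```python
import random


def sample_for_review(case_ids, sample_size=300, seed=0):
    """Return up to `sample_size` case ids drawn without replacement.

    If fewer than `sample_size` cases match the definition, the whole
    population is returned, mirroring the rule described in the text.
    """
    case_ids = list(case_ids)
    if len(case_ids) <= sample_size:
        return case_ids
    return random.Random(seed).sample(case_ids, sample_size)


def ppv(reviewed):
    """PPV = confirmed cases / all reviewed cases.

    `reviewed` maps a case id to True (confirmed) or False (false positive).
    """
    if not reviewed:
        raise ValueError("no reviewed cases")
    return sum(reviewed.values()) / len(reviewed)


# Hypothetical data: 1000 matched records; the review outcome below is a
# stand-in for the manual re-abstraction, not real results.
matched = range(1000)
sample = sample_for_review(matched)
reviewed = {case_id: case_id % 4 != 0 for case_id in sample}
print(len(sample), round(ppv(reviewed), 3))
```

Fixing the seed makes the sample reproducible, which matters when the reviewed records have to be traced back later.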

To measure the sensitivity we needed to know how many cases were not captured by our operational case definition. The total number of records that did not fit any of our case definitions was very high and we expected a very low prevalence of false negatives, so the estimate would have been very imprecise unless we studied a very large sample. To minimize this problem we created a second case definition, designed to be as sensitive as possible and based on a different source of information from that used for the operational case definition, namely the free-text diagnosis of the electronic medical record. We call this second definition the ‘free text-based definition’.

We applied this free text-based definition on the same dataset. We calculated the sensitivity of the free text-based definition on the subgroup of the true positives that we had already captured with the operational definition.
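A free text-based definition of this kind could look like the sketch below. The keyword stems are hypothetical placeholders (the real lists are in the paper's Appendix), and case-insensitive substring matching is an assumed technique, chosen here because it favours sensitivity over specificity.

```python
# Hypothetical keyword stems for illustration only; they are NOT the
# paper's free text-based definitions.
FREE_TEXT_STEMS = {
    "gastroenteritis": ["gastroenter", "diarr", "vomit"],
    "respiratory_with_fever": ["pneumon", "bronch"],
}


def matches_free_text(diagnosis_text, syndrome, stems=FREE_TEXT_STEMS):
    """True if any stem for `syndrome` occurs in the free-text diagnosis."""
    text = (diagnosis_text or "").lower()
    return any(stem in text for stem in stems[syndrome])


def free_text_sensitivity(true_positive_texts, syndrome):
    """Share of confirmed operational-definition cases that the free-text
    definition also captures (Sens_text in the text)."""
    if not true_positive_texts:
        raise ValueError("no confirmed cases to evaluate")
    hits = sum(matches_free_text(t, syndrome) for t in true_positive_texts)
    return hits / len(true_positive_texts)


confirmed = ["acute gastroenteritis", "vomiting", "headache", "diarrhoea"]
print(free_text_sensitivity(confirmed, "gastroenteritis"))
```

Records without a free-text diagnosis never match, so such records can only lower the measured sensitivity of the free-text definition.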

We quantified the PPV of the free text-based definition on a random sample of 300 cases for each syndrome. For this sampling we excluded all the records already captured by the operational definition; consequently the estimated PPV refers only to the population not captured by the operational definition. This choice allowed us to have a higher precision in the estimation of the entire population. If there were fewer than 300 cases that fulfilled the definition, we checked the entire population.

Applying the sample estimate of the PPV of the two definitions to the entire captured population and the observed sensitivity of the free text-based definition to the subset of the operational definition, we estimated the number of true cases captured by both definitions, those captured by the operational definition only, and those by the free-text definition only. These three quantities were then used to calculate the true cases not captured by either definition through a capture–recapture method assuming the two sources were independent.
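The steps in this paragraph can be sketched as follows. The function and parameter names are hypothetical and the example numbers are invented; the only modelling assumption is the one stated in the text, independence of the two definitions, which gives the usual two-source estimate for the cases missed by both.

```python
def capture_recapture(n_op, ppv_op, n_text, ppv_text, sens_text):
    """Estimate true-case counts and the operational definition's sensitivity.

    n_op      : records captured by the operational definition
    ppv_op    : sample PPV of the operational definition
    n_text    : records captured by the free-text definition only
    ppv_text  : sample PPV of the free-text definition among those records
    sens_text : sensitivity of the free-text definition, measured on the
                true positives of the operational definition
    """
    true_op = ppv_op * n_op                  # true cases found by operational
    both = sens_text * true_op               # also found by free text
    op_only = true_op - both                 # found by operational only
    text_only = ppv_text * n_text            # found by free text only
    if both == 0:
        raise ValueError("no overlap between the two definitions")
    missed = text_only * op_only / both      # two-source estimate of uncaptured cases
    total = both + op_only + text_only + missed
    return {
        "both": both,
        "op_only": op_only,
        "text_only": text_only,
        "missed": missed,
        "sensitivity_op": true_op / total,
    }


# Invented numbers, for illustration only.
result = capture_recapture(n_op=1000, ppv_op=0.9, n_text=2000,
                           ppv_text=0.1, sens_text=0.9)
print(round(result["missed"], 1), round(result["sensitivity_op"], 3))
```

Under independence the final ratio reduces to both / (both + text_only), which gives a quick consistency check on the arithmetic.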

The following formula was adopted to estimate the entire population and consequently the sensitivity of the operational definition (Fig. 2):

where a represents the true cases captured by both definitions; d the true cases captured only by the free-text definition; e the true cases captured only by the operational definition; ? the true cases not captured by any definition; PPVop is the sample estimate of the PPV of the operational definition; Nop is the number of cases captured by the operational definition; PPVtext is the sample estimate of the PPV of the free-text definition among the population not captured by the operational definition; Ntext is the number of cases captured by the free-text definition among the population not captured by the operational definition; Senstext is the sample estimate of the free-text definition sensitivity.
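Under the independence assumption, the quantities just defined combine as in the standard two-source capture–recapture estimator. A reconstruction consistent with these definitions, writing x for the true cases not captured by any definition and Sens_op for the sensitivity of the operational definition (both labels are ours), would be:

```latex
\begin{aligned}
a &= \mathrm{Sens}_{\mathrm{text}} \times \mathrm{PPV}_{\mathrm{op}} \times N_{\mathrm{op}}, \\
e &= (1 - \mathrm{Sens}_{\mathrm{text}}) \times \mathrm{PPV}_{\mathrm{op}} \times N_{\mathrm{op}}, \\
d &= \mathrm{PPV}_{\mathrm{text}} \times N_{\mathrm{text}}, \\
x &= \frac{d \times e}{a}, \\
\mathrm{Sens}_{\mathrm{op}} &= \frac{a + e}{a + d + e + x}.
\end{aligned}
```

Substituting x shows that the last expression simplifies to a / (a + d), i.e. the share of the free-text definition's true cases that the operational definition also captures.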

Estimate of uncertainty due to the sampling procedures

We calculated 95% confidence intervals (95% CI) for the PPV sample estimates according to a binomial distribution. The 95% CI for sensitivity rate was calculated using Monte Carlo [22] simulations assuming binomial distributions of the sample estimates for the PPVop, PPVtext and Senstext.
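A minimal sketch of such a simulation is given below, using only the Python standard library rather than the commercial package cited. The function names, sample counts, seed, number of simulations and the percentile rule are all assumptions made for illustration; the sensitivity uses the compact form a / (a + d) that follows from the independence assumption.

```python
import random


def draw_proportion(successes, n, rng):
    """Redraw a sample proportion from Binomial(n, successes / n)."""
    p = successes / n
    return sum(rng.random() < p for _ in range(n)) / n


def sensitivity_op(n_op, ppv_op, n_text, ppv_text, sens_text):
    """Operational-definition sensitivity under two-source independence."""
    both = sens_text * ppv_op * n_op      # true cases captured by both
    text_only = ppv_text * n_text         # true cases captured by free text only
    if both + text_only == 0:
        raise ValueError("no true cases captured by the free-text definition")
    return both / (both + text_only)


def monte_carlo_ci(n_op, n_text, op_sample, text_sample, sens_sample,
                   sims=2000, seed=1):
    """Percentile 95% interval; each *_sample is (successes, sample_size)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(sims):
        ppv_op = draw_proportion(op_sample[0], op_sample[1], rng)
        ppv_text = draw_proportion(text_sample[0], text_sample[1], rng)
        sens_text = draw_proportion(sens_sample[0], sens_sample[1], rng)
        try:
            draws.append(sensitivity_op(n_op, ppv_op, n_text, ppv_text, sens_text))
        except ValueError:
            continue  # skip degenerate draws with no true cases
    draws.sort()
    lower = draws[int(0.025 * len(draws))]
    upper = draws[int(0.975 * len(draws)) - 1]
    return lower, upper


# Invented counts: 270/300 confirmed (operational), 30/300 confirmed
# (free text only), 243 of the 270 confirmed cases matched by free text.
interval = monte_carlo_ci(1000, 2000, (270, 300), (30, 300), (243, 270), sims=300)
print(interval)
```

Each simulation redraws the three sample proportions independently, so the resulting interval reflects sampling error only, not any violation of the independence assumption.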

Power of the study

The 300-case samples provided a precision of about ±3% (one standard error, corresponding to roughly ±6% for a 95% CI) in the case of a 50% PPV, i.e. the case that maximizes the variance and hence the uncertainty.
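The arithmetic behind this figure can be checked with a short calculation. The sketch below assumes a normal approximation to the binomial, under which a sample of 300 at a 50% PPV has a standard error close to 3% and a 95% half-width of nearly twice that; the function name is hypothetical.

```python
import math


def half_width(p, n, z=1.0):
    """Half-width z * sqrt(p * (1 - p) / n) under a normal approximation."""
    return z * math.sqrt(p * (1 - p) / n)


se = half_width(0.5, 300)              # one standard error
ci95 = half_width(0.5, 300, z=1.96)    # 95% half-width
print(f"{se:.3f} {ci95:.3f}")          # prints "0.029 0.057"
```

Because p(1 − p) peaks at p = 0.5, every other PPV gives a narrower interval for the same sample size.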

RESULTS

The EIS collects records from about 2·2 million ED visits per year, of which 1·5 million are from hospitals that participate in the syndromic surveillance system.

Table 1 presents the syndromes and their operational case definitions. Most of the definitions are based on the information gathered at triage, the first contact between the patient and the emergency personnel (e.g. fever, absence of trauma or other chronic conditions), on the diagnosis given at the end of the visit, and on the outcome of the visit (e.g. for ‘unexplained death’). Only one syndrome, haemorrhagic diarrhoea, requires the presence of two diagnostic codes at the same visit.
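To make the structure of such definitions concrete, the sketch below encodes two rules as predicates over a simplified visit record. The record layout and function names are hypothetical. The shock codes 785.50 and 785.59 are the ones mentioned in the study design; the diarrhoea and haemorrhage code sets are illustrative placeholders, not the paper's lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EDVisit:
    """Simplified, hypothetical view of one EIS record."""
    chief_complaint: str
    diagnoses: List[str] = field(default_factory=list)  # up to five ICD-9-CM codes
    outcome: str = "discharge"
    body_temperature: Optional[float] = None


SHOCK_CODES = {"785.50", "785.59"}   # codes named in the text
DIARRHOEA_CODES = {"787.91"}         # placeholder, not the paper's list
HAEMORRHAGE_CODES = {"578.9"}        # placeholder, not the paper's list


def is_sepsis_or_shock(visit):
    """Single-code rule: any listed diagnosis is a shock code."""
    return any(code in SHOCK_CODES for code in visit.diagnoses)


def is_haemorrhagic_diarrhoea(visit):
    """Two-code rule: both conditions must be coded at the same visit."""
    codes = set(visit.diagnoses)
    return bool(codes & DIARRHOEA_CODES) and bool(codes & HAEMORRHAGE_CODES)


one_code = EDVisit("abdominal pain", ["787.91"])
two_codes = EDVisit("abdominal pain", ["787.91", "578.9"])
print(is_haemorrhagic_diarrhoea(one_code), is_haemorrhagic_diarrhoea(two_codes))
```

A visit that lists only its principal diagnosis can never satisfy a two-code rule, which is one way such a definition can lose sensitivity.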

Table 1. Syndrome definition and putative diseases or aetiological agents

Neurological syndromes include the largest number of codes from different chapters of the ICD-9-CM: infectious diseases, neurological disorders, symptoms and trauma.

The number of cases that fulfilled the operational case definition in 2004 ranged from 27 320 for gastroenteric syndrome to three for haemorrhagic diarrhoea (Table 2). The PPVs ranged from 99·3% to 20%; half of the definitions had a PPV over 90%, while sepsis, neurological/meningitis and coma were below 50%.

Table 2. Number of cases captured by the case definitions and positive predictive values (PPV)

* Number of cases for which the free-text definition was available.

† Two of the three cases without a free-text diagnosis were ascertained by directly checking the medical records; for the third, the medical record was not available.

‡ Since fewer than 300 cases were captured, we checked all the available records.

§ For 25 cases the free-text diagnosis was not informative and was not included in the PPV proportion.

To calculate the sensitivity of our operational definitions we used a second definition, based on the free-text diagnosis, which was developed to be as sensitive as possible despite its low specificity. The free text-based definitions are given in the Appendix (available in the online version of the paper), together with a complete list of the codes used in the operational definitions. Table 3 presents the sensitivity of the free text-based definitions, measured on the subset of true positives of the operational case definitions: the objective of high sensitivity was reached since all values are close to or over 90%. On the other hand, the PPVs range from 2% to 78%.

Table 3. Sensitivity of the case definitions. The table reports the values used to estimate the sensitivity of each operational case definition: the sensitivity of the free-text definition, the positive predictive value (PPV) and the estimated number of missed cases

* The free-text definition has been applied only to the records that have not been captured by the operational definition.

† The 95% confidence intervals were calculated using Monte Carlo simulation, assuming binomial distributions for the sensitivity and the PPVs.

‡ For these syndromes we did not estimate the 95% CI: because of the small numbers, the results are subject to extreme variability.

The sensitivity of the operational definitions, obtained through the modified capture–recapture model, ranges from 90% for coma to 22% for haemorrhagic diarrhoea.

DISCUSSION

The formation of new public health goals has freed syndromic surveillance from its original objective of being a tool to prevent bioterrorism, and given it new relevance and applicability [2, 9].

The syndromes to be monitored were selected by a collaborative panel drawn from the Ministries of Health and Defence [19]. The goal of this surveillance is to detect unexpected clusters of existing diseases as well as putative bioterrorism attacks. Bearing in mind the objectives of the surveillance system, we evaluated how the case definitions worked in practice.

The syndrome with the lowest PPV was sepsis or shock. This is an obvious consequence of the low prevalence of this syndrome [23, 24]. Furthermore, the a priori knowledge of the small number of cases and the syndrome's severity led us to use a non-specific definition to maximize sensitivity. Unfortunately, despite the low PPV, the sensitivity is still not high.

The neurological syndromes, central (associated with meningitis) and peripheral (associated with botulism), performed worst (low sensitivity and low specificity). This reflects the multitude of symptoms that an acute neurological disorder may produce; these are not syndromes but systemic diseases that include several syndromes [24–26].

The operational definition used for haemorrhagic diarrhoea syndrome is not well adapted to this information system because it needs the presence of two diagnoses, while more than 90% of the cases in our dataset list only the principal diagnosis. All other syndromes that require the presence of two conditions can use at least one from triage (chief complaint or vital parameters).

The low sensitivity for unexplained death was unexpected. If the sensitivity of the automatic surveillance of ED visits is less than 60%, such an important syndrome should be monitored through several sources of information, supported by specific training for ED personnel.

Limits and methodological remarks

We calculated the sensitivity with a simple two-source capture–recapture method. This model assumes that the two capturing methods are independent (not a very reasonable assumption). We modified the capture–recapture model to take into account captured cases that were false positives, i.e. we adjusted for poor specificity, but we estimated the proportion of false positives only on a sample of the captured cases.

Another limit of the study is our gold standard: we used all the information available in the electronic ED medical records and hospital admission databases to identify cases, but the accuracy of this information is often poor [21]; on the other hand, this approach permitted us to evaluate 13 syndromes relatively quickly without site visits or having to re-abstract paper medical records.

To our knowledge, there are few studies that apply a capture–recapture model to adjust for the PPV of the data sources; van Hest and colleagues [27] accounted for the non-ascertainment of cases and for imperfect record-linkage in their estimate of tuberculosis under-notification, but their results are not comparable with ours since they were interested in laboratory-confirmed cases while our aim was to capture syndromes.

CONCLUSIONS

EDs are widely considered one of the best sources for syndromic surveillance [12, 28]. The present study confirms that an online emergency information system can be efficiently used to automatically monitor several syndromes [7, 25, 29–33].

The sensitivity and PPV estimates we propose are context-specific and cannot be applied to other surveillance systems, because the operational definitions must be established with local emergency physicians and tested in the area under consideration. Nevertheless, the method we propose can be used in any automated surveillance system and some general findings about the syndromes to be monitored can be made. Some syndromes, such as gastroenteritis, where the exposure history [34] and symptoms are immediately clear, fit this surveillance better than others, such as haemorrhagic diarrhoea, where the symptoms are not evident and a more precise diagnosis, often based on a simple laboratory test, is needed.

ACKNOWLEDGEMENTS

We thank all the staff from the Emergency Information System who make the surveillance possible every day. In particular, we thank Assunta De Luca, coordinator of the Emergency network of the Lazio region, and Luisa Sodano, Centre for Disease Control and Prevention (CCM) of the Italian Ministry of Health. We also thank Margaret Becker for the English editing, and the reviewers for their invaluable contribution to the final version of this paper. The project was financed by the Italian Ministry of Health Centro per il Controllo e la Prevenzione delle Malattie (CCM).

DECLARATION OF INTEREST

None.

NOTE

Supplementary material accompanies this paper on the Journal's website (http://journals.cambridge.org).

REFERENCES

1. Agency for Healthcare Research and Quality. Bioterrorism preparedness and response: use of information technologies and decision support systems. Summary, evidence report/technology assessment, Number 59, July 2002. Agency for Healthcare Research and Quality, Rockville, MD.
2. Reingold, A. If syndromic surveillance is the answer, what is the question? Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science 2003; 1: 15.
3. Buehler, JW, et al. Syndromic surveillance and bioterrorism-related epidemics. Emerging Infectious Diseases 2003; 9: 1197–1204.
4. CDC. Syndromic Surveillance: Reports from a National Conference, 2003. Morbidity and Mortality Weekly Report 2004; 53 (Suppl.): 1–261.
5. CDC. Syndromic Surveillance: Reports from a National Conference, 2004. Morbidity and Mortality Weekly Report 2005; 54 (Suppl.): 1–212.
6. Petrosillo, N, Puro, V, Ippolito, G. Border screening for SARS. Medical Journal of Australia 2004; 180: 597.
7. Foldy, SL, et al. SARS surveillance project – internet-enabled multiregion surveillance for rapidly emerging disease. Morbidity and Mortality Weekly Report 2004; 53 (Suppl): 215–220.
8. CDC. Outbreak of severe acute respiratory syndrome – worldwide, 2003. Morbidity and Mortality Weekly Report 2003; 52: 226–228.
9. Cooper, DL, et al. Can syndromic surveillance data detect local outbreaks of communicable disease? A model using a historical cryptosporidiosis outbreak. Epidemiology and Infection 2006; 134: 13–20.
10. Dembek, ZF, Cochrane, DG, Pavlin, JA. Syndromic surveillance. Emerging Infectious Diseases 2004; 10: 1333–1334.
11. CDC. Framework for evaluating public health surveillance systems for early detection of outbreaks; recommendations from the CDC Working Group. Morbidity and Mortality Weekly Report 2004; 53 (No. RR-5).
12. Wagner, MM, et al. Availability and comparative value of data elements required for an effective bioterrorism detection system, 184 pp. Report commissioned by AHRQ. Delivered 28 November 2001 (http://rods.health.pitt.edu/LIBRARY/dato2AHRQInterimRpt112801.pdf).
13. Espino, JU, Wagner, MM. Accuracy of ICD-9-coded chief complaints and diagnoses for the detection of acute respiratory illness. Proceedings of AMIA Symposium, 2001, pp. 164–168 (http://rods.health.pitt.edu/LIBRARY/amia2001_final_reviedEspino.pdf).
14. Ivanov, O, et al. Accuracy of three classifiers of acute gastrointestinal syndrome for syndromic surveillance. Proceedings of AMIA Symposium 2002, pp. 345–349.
15. Greenko, J, et al. Clinical evaluation of the emergency medical services (EMS) ambulance dispatch-based syndromic surveillance system, New York City. Journal of Urban Health 2003; 80 (Suppl. 1): I50–I56.
16. CDC. Syndrome definitions for diseases associated with critical bioterrorism-associated agents. Atlanta, CDC, 2003.
17. CDC. Recognition of illness associated with exposure to chemical agents – United States, 2003. Morbidity and Mortality Weekly Report 2003; 52: 938–940.
18. Dafni, UG, et al. Algorithm for statistical detection of peaks – syndromic surveillance system for the Athens 2004 Olympic Games. Morbidity and Mortality Weekly Report 2004; 53 (Suppl.): 86–94.
19. Epidemiological Consultation Team. Surveillance system in place for the 2006 Winter Olympic Games, Torino, Italy, 2006. Eurosurveillance 2006; 11(2): E060209.4 (http://www.eurosurveillance.org/ew/2006/060209.asp#4).
20. Giorgi Rossi, P, et al. Road traffic injuries in Lazio, Italy: a descriptive analysis from an emergency department-based surveillance system. Annals of Emergency Medicine 2005; 46: 152–157.
21. Cardo, S, et al. The quality of medical records: a retrospective study in Lazio Region, Italy [in Italian]. Annali di Igiene 2003; 15: 433–442.
22. RISKview Version 4. April, 2000. Palisade Corporation, Newfield, NY, USA.
23. Altman, DG, Bland, JM. Diagnostic tests 2: predictive values. British Medical Journal 1994; 309: 102.
24. Lombardo, JS, Burkom, H, Pavlin, J. ESSENCE II and the framework for evaluating syndromic surveillance systems. In: Syndromic Surveillance: Reports from a National Conference, 2003. Morbidity and Mortality Weekly Report 2004; 53 (Suppl): 159–165.
25. Wagner, MM, et al. Syndrome and outbreak detection using chief-complaint data – experience of the Real-Time Outbreak and Disease Surveillance (RODS) project. In: Syndromic Surveillance: Reports from a National Conference, 2003. Morbidity and Mortality Weekly Report 2004; 53 (Suppl): 28–31.
26. Henry, JV, Magruder, S, Snyder, M. Comparison of office visit and nurse advice hotline data for syndromic surveillance – Baltimore – Washington, D.C., Metropolitan Area, 2002. In: Syndromic Surveillance: Reports from a National Conference, 2003. Morbidity and Mortality Weekly Report 2004; 53: 112–116.
27. van Hest, NA, et al. Completeness of notification of tuberculosis in The Netherlands: how reliable is record-linkage and capture-recapture analysis? Epidemiology and Infection 2007; 135: 1021–1029.
28. Hirshon, JM. The rationale for developing public health surveillance systems based on emergency department data. Academic Emergency Medicine 2000; 7: 1428–1432.
29. Irvin, CB, Nouhan, PP, Rice, K. Syndromic analysis of computerized emergency department patients' chief complaints: an opportunity of bioterrorism and influenza surveillance. Annals of Emergency Medicine 2003; 41: 447–452.
30. Lewis, M, et al. Disease outbreak detection system using syndromic data in the greater Washington DC area. American Journal of Preventive Medicine 2002; 23: 180–186.
31. Lober, WB, et al. Collection and integration of clinical data for surveillance. Studies in Health Technology and Informatics 2004; 107: 1211–1215.
32. Paladini, M. Daily emergency department surveillance system – Bergen County, New Jersey. In: Syndromic Surveillance: Reports from a National Conference, 2003. Morbidity and Mortality Weekly Report 2004; 53 (Suppl): 47–49.
33. Yuan, CM, Love, S, Wilson, M. Syndromic surveillance at hospital emergency departments – southeastern Virginia. In: Syndromic Surveillance: Reports from a National Conference, 2003. Morbidity and Mortality Weekly Report 2004; 53 (Suppl): 56–58.
34. Giorgi Rossi, P, Borgia, P. Trying to improve syndromic surveillance: the history of exposure. Epidemiology and Infection 2006; 134: 902–903.