Seeking out SARI: an automated search of electronic health records


The definition of severe acute respiratory infection (SARI) – a respiratory illness with fever and cough, occurring within the past 10 days and requiring hospital admission – has not been evaluated for critically ill patients. Using integrated electronic health records data, we developed an automated search algorithm to identify SARI cases in a large cohort of critical care patients and evaluate patient outcomes. We conducted a retrospective cohort study of all admissions to a medical intensive care unit from August 2009 through March 2016. Subsets were randomly selected for deriving and validating a search algorithm, which was compared with temporal trends in laboratory-confirmed influenza to ensure that SARI was correlated with influenza. The algorithm was applied to the cohort to identify clinical differences for patients with and without SARI. For identifying SARI, the algorithm (sensitivity, 86.9%; specificity, 95.6%) outperformed billing-based searching (sensitivity, 73.8%; specificity, 78.8%). Automated searching correlated with peaks in laboratory-confirmed influenza. Adjusted for severity of illness, SARI was associated with more hospital, intensive care unit and ventilator days but not with death or dismissal to home. The search algorithm accurately identified SARI for epidemiologic study and surveillance.


  • SARI is a public health concept being adopted worldwide.

  • We made a search algorithm that correctly identified large SARI patient cohorts.

  • The algorithm is useful for epidemiologic research in intensive care settings.

  • Outcomes between SARI and non-SARI patients were different.

  • Further differentiation among critically ill patients may be feasible.


Severe acute respiratory infections (SARIs) are the third leading cause of death worldwide [1]. Epidemics of SARIs, including the Middle East respiratory syndrome, severe acute respiratory syndrome and pandemic influenza, have shown the damage that SARIs can inflict. Unfortunately, SARI therapeutics and surveillance infrastructure are relatively underfunded and SARIs continue to be a worldwide threat [2].

A first step in improving the care of patients with SARIs is recognition. Knowing that a novel influenzalike illness (ILI) or pneumonia is present in the community can influence practice. Practitioners may use isolation more consistently and order targeted testing and interventions. Administrators can change staffing and stocking models to prepare intensive care units (ICUs) and emergency departments. Government and public health officials can allocate resources and personnel and update media. However, current public health recognition systems are inadequate in scope and are often too slow to be clinically useful.

The World Health Organization (WHO) developed a surveillance definition for SARI to improve comparability and consistency in SARI reporting. This definition starts with the case definition of ILI as an acute respiratory infection in a person with a fever (⩾38 °C), cough and onset within the past 10 days. A SARI is defined as an ILI requiring hospitalisation [3]. This definition was chosen because of its feasibility in worldwide implementation. However, the correlation between SARI and patient outcomes has not been evaluated.

As a first step toward improving the recognition of SARI, we sought to identify a computable phenotype for SARI in patients admitted to an ICU and to evaluate the association between SARI and patient outcomes. Such a computable phenotype could be used to develop practical real-time detection tools for SARI.


We conducted a retrospective cohort study to derive and validate a computable phenotype for SARI. We obtained Mayo Clinic Institutional Review Board approval for this minimal-risk study.

Data sources

Data sources included manual review of medical records for validation and development of a gold standard, the Mayo Clinic ICU Data Mart [4] and the Mayo Clinic Advanced Cohort Explorer. The Mayo Clinic ICU Data Mart is a repository of all physiologic, laboratory and clinical data for ICU admissions at Mayo Clinic. The Advanced Cohort Explorer is a tool that allows for searching records, diagnoses and billing data, laboratory results and imaging and flow sheet data for all patients.


Patients were eligible if they had been admitted to the medical ICU at our institution from August 2009 through March 2016. Only index admissions were included. This period was chosen to encompass several flu seasons, including one pandemic season. Flu was chosen as the disease for our case study for derivation and validation because it is a SARI with reliable annual activity and because confirmatory testing is readily available. Patients were excluded if they were younger than 18 years, did not have research authorisation on file, or were admitted to a nonparticipating ICU.

Approach to the computable phenotype

Patients admitted in the peak influenza years of 2009 and 2010 (September–December in both years) were chosen as derivation and validation cohorts. Computable phenotypes were initially derived from the 2009 data and were then applied to the 2010 data. These 2 years were chosen to ensure that the algorithm could perform for pandemic and nonpandemic years.

We designed a series of text-based note searches and algorithms examining laboratory and vital parameters to identify SARI. Simultaneously, independent investigators manually reviewed the charts of patients to classify them as patients with SARI and to identify which components of the definition were met. The electronic and manual searches were reconciled and disagreements were resolved by a reviewer for whom the initial assessments were masked. The adjudicated outcomes became our gold standard for evaluating the computable phenotypes. Our a priori target for performance was sensitivity and specificity of at least 90%.
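As an illustration of how a search result can be scored against the adjudicated gold standard, the following minimal sketch computes sensitivity and specificity from paired labels and checks them against the 90% target. The function names and the toy labels are hypothetical and are not taken from the study data.

```python
def sensitivity_specificity(predicted, gold):
    """Return (sensitivity, specificity) for paired boolean labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)          # flagged, truly SARI
    fn = sum(1 for p, g in pairs if not p and g)      # missed SARI
    tn = sum(1 for p, g in pairs if not p and not g)  # correctly not flagged
    fp = sum(1 for p, g in pairs if p and not g)      # flagged, not SARI
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative gold label")
    return tp / (tp + fn), tn / (tn + fp)


def meets_target(sensitivity, specificity, target=0.90):
    """True when both measures reach the prespecified target."""
    return sensitivity >= target and specificity >= target


# Hypothetical toy labels (not study data): 4 gold-standard SARI cases, 6 non-cases.
gold = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, False, True]

sens, spec = sensitivity_specificity(predicted, gold)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} target met={meets_target(sens, spec)}")
```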

We further compared our computable phenotype to billing data. We used a search of discharge codes (International Classification of Diseases, Ninth Revision [ICD-9]) for pneumonia and influenza to identify probable SARIs (Supplemental Box). We sought to have our text-based search outperform billing-based approaches for identifying ILI.

To examine construct validity, we also plotted the percentage of admissions with SARI over time against the percentage of positive influenza swabs in a given season. Although SARI is not limited to influenza, a peak of SARI activity near the peak of flu activity would support the performance of the computable phenotype. To examine the validity of this construct across different patterns of disease, two time periods were used: the consecutive 2014 and 2015 nonpandemic influenza seasons and the pandemic 2009 season.
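The comparison of the two time series can be sketched as follows. This is a minimal example that assumes weekly bins, so that a 2-week moving average (as in the Fig. 1 caption) corresponds to a trailing window of two points. The counts are hypothetical and the helper names are illustrative.

```python
def percent(numerator, denominator):
    """Percentage, returning 0.0 for an empty denominator."""
    return 100.0 * numerator / denominator if denominator else 0.0


def trailing_moving_average(values, window=2):
    """Average each point with up to (window - 1) preceding points."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


def peak_index(values):
    """Index of the (first) maximum value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical weekly counts (not study data).
sari_admissions = [2, 4, 9, 6, 3]
all_admissions = [40, 40, 45, 40, 30]
positive_swabs = [1, 5, 12, 8, 2]
swabs_tested = [20, 25, 30, 32, 20]

sari_pct = [percent(s, n) for s, n in zip(sari_admissions, all_admissions)]
flu_pct = [percent(p, n) for p, n in zip(positive_swabs, swabs_tested)]

sari_smooth = trailing_moving_average(sari_pct)
flu_smooth = trailing_moving_average(flu_pct)

print("SARI peak week:", peak_index(sari_smooth), "flu peak week:", peak_index(flu_smooth))
```

Coinciding peak weeks in the smoothed series would be consistent with the construct. A formal comparison would need more than a peak check.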

Approach to evaluating SARI

Validated Data Mart tables reporting on ICU use (ventilator days and length of stay) and outcomes (mortality and discharge to home) were compared between SARI patients and non-SARI patients in the larger cohort of all patients admitted from August 2009 through March 2016 with the computable phenotype evaluated above. Differences were analysed with the Student t test for continuous data and the χ² test for categorical data. Adjusted models for Acute Physiology and Chronic Health Evaluation (APACHE) were based on standard least squares and nominal logistic regression models. Model comparison was performed with the model comparison platform with the hypothesis test that matched model values for area under the curve (AUC) were equivalent. P values <0.05 were considered significant. We performed all statistical analyses with JMP 12 software (SAS Institute Inc).
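For readers without access to JMP, the χ² comparison of a categorical outcome between two groups can be illustrated with a short, dependency-free sketch. It computes the Pearson χ² statistic for a 2×2 table without continuity correction and derives the P value for 1 degree of freedom from the complementary error function. The counts are hypothetical, and this is not a reproduction of the study's analysis.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be nonzero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts (not study data):
# rows = SARI / non-SARI, columns = outcome present / outcome absent.
stat, p = chi_square_2x2(30, 70, 20, 80)
print(f"chi-square={stat:.3f}, P={p:.3f}, significant at 0.05: {p < 0.05}")
```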


Computable phenotype derivation

During the peak flu activity of the 2009 season, 618 patients meeting the inclusion criteria were admitted to our medical ICU. Of these, 87 patients (14.1%) met the definition for SARI.

We first designed a search for each of the components of the SARI definition. Fever was classified as objective or subjective. Objective fever was defined as a recorded fever (⩾38 °C) in the first 24 h after admission. Subjective fever was defined as a fever at home, during transit, or under other circumstance where the temperature was not directly measured and reported. In the derivation cohort, objective fever achieved 100% sensitivity and specificity with a search of the vital signs flow chart.
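The objective fever rule lends itself to a simple check over vital-sign readings. The sketch below assumes readings are available as (timestamp, temperature in °C) pairs; the function name and data layout are hypothetical and do not describe the ICU Data Mart schema.

```python
from datetime import datetime, timedelta

FEVER_THRESHOLD_C = 38.0      # threshold from the WHO ILI definition
WINDOW = timedelta(hours=24)  # first 24 h after admission


def has_objective_fever(admitted_at, temperature_readings):
    """True if any reading within 24 h of admission is at or above 38 °C.

    temperature_readings: iterable of (timestamp, celsius) pairs.
    """
    window_end = admitted_at + WINDOW
    return any(
        admitted_at <= taken_at <= window_end and celsius >= FEVER_THRESHOLD_C
        for taken_at, celsius in temperature_readings
    )


# Hypothetical readings (not study data).
admitted = datetime(2009, 10, 1, 8, 0)
readings = [
    (datetime(2009, 10, 1, 9, 0), 37.2),
    (datetime(2009, 10, 1, 21, 30), 38.4),  # inside the window, febrile
    (datetime(2009, 10, 3, 6, 0), 39.0),    # febrile, but after the window
]

fever_in_window = has_objective_fever(admitted, readings)
late_only = has_objective_fever(admitted, readings[2:])
print(fever_in_window, late_only)
```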

Our ultimate search strategy for subjective fever was to search the admission notes for chief complaint, diagnosis and history of present illness and the impression/report/plan portion of the notes with the terms fever, febrile, elevated temperature, or fevers and then to exclude with negation terms such as no, denies, uncertain, not, or if spike(s) that appeared in the same sentence as the original search terms. This strategy had 94.7% sensitivity and 94.9% specificity.
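A search of this kind can be approximated with regular expressions. The sketch below takes the text of the relevant note sections as a single string, splits it into sentences naively on terminal punctuation, and flags a fever mention only when no negation term appears in the same sentence. The term lists follow the description above; the function name, the sentence splitter and the example notes are assumptions, and abbreviations such as "Dr." would need better handling in practice.

```python
import re

FEVER_TERMS = re.compile(r"\b(fevers?|febrile|elevated temperature)\b", re.IGNORECASE)
NEGATION_TERMS = re.compile(r"\b(no|denies|uncertain|not|if spikes?)\b", re.IGNORECASE)
# Naive splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def mentions_subjective_fever(note_text):
    """True if any sentence has a fever term and no negation term."""
    for sentence in SENTENCE_SPLIT.split(note_text):
        if FEVER_TERMS.search(sentence) and not NEGATION_TERMS.search(sentence):
            return True
    return False


# Hypothetical note fragments (not study data).
examples = {
    "Patient reports fever and chills at home.": True,
    "Denies fever. Has productive cough.": False,
    "Afebrile on arrival.": False,  # word boundary prevents matching "febrile"
    "No chest pain. Febrile at home per family.": True,
    "Will obtain cultures if spikes fever.": False,
}

for text, expected in examples.items():
    print(f"{mentions_subjective_fever(text)!s:5} (expected {expected}) {text}")
```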

The search for cough was more challenging because of variable reporting and negation terms in different sections of the notes. After several iterations, we could not improve on the simple search for the term cough in the admission chief complaint, diagnoses, history of present illness, or the impression/report/plan portion by excluding with the negation terms denies, not and no. The final sensitivity was 89.6% and the specificity was 97.4%.

After several iterations of adding terms for acuity to the search, we found that this only decreased sensitivity. Ultimately, it appeared that by restricting our search to ICU patients with the above terms in portions of the note reporting on acute illness, it was not necessary to specify the 10-day acuity period. Likewise, since our inclusion criteria required hospital admission, we did not need to specify any additional terms describing admission.
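One way to read the resulting rule is that, within an ICU admission cohort, a patient is flagged when fever (objective or subjective) and cough are both found, with acuity and hospital admission implied by the cohort and the note sections searched. The sketch below encodes that reading. The combination logic is an assumption inferred from the WHO definition and the description above, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentFlags:
    """Results of the per-component searches for one ICU admission."""
    objective_fever: bool
    subjective_fever: bool
    cough: bool


def meets_sari_phenotype(flags):
    """Assumed aggregation: (objective OR subjective fever) AND cough.

    The 10-day onset window and hospital admission are not checked here;
    they are treated as implied by the ICU cohort and admission-note sections.
    """
    fever = flags.objective_fever or flags.subjective_fever
    return fever and flags.cough


# Hypothetical component results (not study data).
cases = [
    ComponentFlags(objective_fever=True, subjective_fever=False, cough=True),
    ComponentFlags(objective_fever=False, subjective_fever=True, cough=True),
    ComponentFlags(objective_fever=True, subjective_fever=True, cough=False),
    ComponentFlags(objective_fever=False, subjective_fever=False, cough=True),
]
flagged = [meets_sari_phenotype(c) for c in cases]
print(flagged)
```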

Aggregating our search resulted in an overall sensitivity of 83.9% and specificity of 94.2% in the derivation cohort. Further subset analysis suggested that some of the factors complicating the computable phenotype came from nonspecific terms found in notes for patients with chronic obstructive pulmonary disease (COPD). When those 148 patients were excluded, the sensitivity improved to 86.9% and the specificity improved to 95.6%. The billing search had a sensitivity of 73.8% and a specificity of 78.8%. Given the high specificity of our search and our inability to improve it further without excluding important populations, we applied the computable phenotype to the validation cohort without excluding COPD patients.

Computable phenotype validation

In the validation period, 198 patients were admitted to the medical ICU. Of these, 14 were adjudicated as SARI (7.1%). Overall, the computable phenotype performed well, with 88.9% sensitivity and 96.7% specificity, outperforming manual searches, which were 57.1% sensitive and 95.9% specific. This was also better than the ICD-9 searches, which were 78.8% sensitive and 71.1% specific. Individual manual and computable phenotype item performance is summarised in Table 1.

Table 1. Sensitivity and specificity of queries^a

SARI, severe acute respiratory infection.

a Values are presented as mean (95% CI).

Overall, we did not achieve our prespecified goal of sensitivity and specificity of at least 90%. However, even though we could not improve the values, the specificity was high and the computable phenotype outperformed both manual and ICD-9–based searches. Therefore, we continued to assess SARI outcomes despite not meeting our prespecified goal.

SARI and patient outcomes

From August 2009 through March 2016, a total of 13 689 patients had an index admission in the ICU. Of these, 1269 were classified as having SARI. Characteristics of these patients are summarised in Table 2.

Table 2. Characteristics of patients admitted to ICU

APACHE, Acute Physiology and Chronic Health Evaluation; ICU, intensive care unit; LOS, length of stay; SARI, severe acute respiratory infection; SOFA, Sequential Organ Failure Assessment.

a Detailed discharge status was available for 7766 patients. In a comparison of patients discharged to home and all others, 4533 did not have SARI and 3233 did.

Overall, SARI patients tended to be older and sicker and to require longer ICU and hospital stays. Despite this, rates of survival to discharge and discharge to home were comparable between SARI and non-SARI patients. When adjusted for APACHE score, SARI was still not associated with differences in the rate of survival to discharge or discharge to home (Table 3).

Table 3. SARI patient outcomes

APACHE, Acute Physiology and Chronic Health Evaluation; ICU, intensive care unit; LOS, length of stay; SARI, severe acute respiratory infection.

a Analysis was restricted to patients whose treatment included the use of a ventilator.


Because SARI appeared to be more closely linked to ventilator use than to other outcomes, we conducted an exploratory analysis of ‘SARI + ’ patients—that is, patients with SARI who received advanced ventilatory support with invasive or noninvasive mechanical ventilation or with a high-flow nasal cannula. This is summarised in Table 4.

Table 4. Diagnostic performance of SARI in predicting poor outcome

APACHE, Acute Physiology and Chronic Health Evaluation; AUC, area under the curve; ICU, intensive care unit; SARI, severe acute respiratory infection.

a SARI + patients received advanced ventilatory support with invasive or noninvasive mechanical ventilation or with a high-flow nasal cannula.

This approach did not improve the AUC for SARI with relation to patient outcomes. We additionally explored a partitioning approach with SARI variables (objective or subjective fever and cough) and oxygen support in the first 24 h. In this model, including minimum temperatures <35 °C did improve the AUC for mortality prediction. However, this did not improve with SARI patients overall, because all but 20 patients who were relatively hypothermic also reported having subjective fever symptoms. Overall, more complicated formulations for SARI did not significantly strengthen the link between SARI and outcomes.

SARI over time

When the pandemic season of 2009 was examined, the peak of SARI activity coincided with the peak of influenza activity (Fig. 1). This was consistent with the construct. SARI activity, however, seemed to have a second increase as the severity of the flu season decreased.

Fig. 1. Reported Flu Activity Based on Local Cultures Compared with Local Severe Acute Respiratory Infection (SARI) Activity Across All Age Groups. Blue diamonds indicate the percentage of specimens positive for flu (dashed line, 2-week moving average). Red squares indicate percentage of patients with SARI at admission (solid line, 2-week moving average).

When the 2014 and 2015 seasons were examined, similar trends were present (Tables S1 and S2 and Supplementary Figure). Overall, SARI peaks were associated with peaks in flu activity. This was most notable in 2014 and 2015, although considerable noise was observed. Predictably, the summer activity of SARI reached its nadir around the same time as influenza. However, there appeared to be a larger second peak of SARI activity after the flu season each year. This may have indicated pneumonia or other respiratory infections, consistent with the SARI model.


We aimed to identify a computable phenotype that could identify SARI activity. According to the performance of the computable phenotype, it can identify SARI with better accuracy than procedures that use either billing data or manual chart review. Moreover, the SARI activity identified by the computable phenotype shows a predictable correlation to flu activity, consistent with our construct. A valid computable phenotype for SARI opens the door to big-data research into this important area. The strategy for identifying SARI can be adapted to develop and explore large cohorts for further research.

Components of the SARI definition are available in medical charts within the first days after admission. The phenotype could thus be detected early and used for epidemiologic monitoring and study. More importantly, it opens the door to big-data tools being used to improve the quality of care for SARI patients. Our group has applied similar computable phenotypes to function as early detectors of sepsis and acute respiratory distress syndrome and it has developed intervention alerts to improve care delivery [5, 6]. SARI care models need further development and big data may help in developing this lagging field.

One area that may be further refined is the end points and definitions for SARI. We did find that, in aggregate, SARI is associated with more days on the ventilator, in the ICU and in the hospital. Therefore, SARI is a valid patient-oriented concept tied to meaningful end points. However, we also found that many cases of clear clinical SARI (eg, H1N1 influenza virus infection in patients using a ventilator) were missed by the definition. A big-data approach is only as good as its inputs, and thus its definitions; further consideration should be given to whether the standing SARI definition is adequate for critical illness research and quality improvement.

Despite the increase in ventilator use and length of stay, the ultimate end point of death was no different between SARI patients and non-SARI critically ill patients. This is a novel finding because most SARI mortality research to date has examined subsets of the SARI population, such as children and adults and human immunodeficiency virus–positive or negative patients [7–9]. Two studies reported high rates of mortality among SARI patients but did not compare them to a larger cohort [10, 11]. Another study by Sakr et al. [11] did find a correlation with SARI and mortality; however, those authors also identified several other independent risk factors that could aid risk stratification in an ICU.

Part of the observed lack of correlation with mortality may reflect selection bias, because we studied only ICU patients. The patients with SARIs received comprehensive treatment and support in critical care settings, and outcomes may differ if this approach were applied to a larger cohort of hospitalised patients. A more appropriate comparator among critically ill patients may be an age- and comorbidity-matched cohort of non-SARI patients. Nonetheless, if SARI activity does not correlate with mortality end points in the ICU, it may be worthwhile to consider a different definition for ICU outcomes that further differentiates the most severely ill patients. For example, a patient with SARI who requires ventilator care, vasopressors, or other organ support in the first 24 h after admission would likely be part of another tier of outcomes among the critically ill. For SARI to be the most useful for critical care research, the criteria probably need to be developed further beyond cough, fever, acuity, respiratory illness and hospital admission. A potential template for this may come from the European Adaptive Randomised Controlled Trial to Improve Survival in Hospitalised Patients With Severe Acute Respiratory Infection. This ongoing study enrolls adult patients who are highly suspected of having community-acquired pneumonia according to at least two clinical criteria and radiologic confirmation and who require invasive mechanical ventilation in the first 48 h [12]. This definition is more restrictive than the WHO surveillance definition of SARI, but it may be worth additional evaluation as a potential definition for identifying the most vulnerable, at-risk SARI patients and designing improved care process models. In our present study, we did not directly compare this definition with the WHO definition of SARI because of known issues with electronic surveillance and interrater reproducibility of radiographic findings well described in the ventilator-associated–event literature [13]. This may be a topic for future studies.

A limitation of our temporal correlation is that the comparison was restricted to laboratory-confirmed influenza. Other SARIs most certainly contributed to the activity seen and may be responsible for much of it. However, influenza, as a disease with well-documented and well-followed seasonality, provides a good natural experiment and has been used in other studies to evaluate the utility of the SARI definition [14].

Another limitation of this study is the patient population. Medical ICU patients were selected because they are an enriched population for SARI with diverse causes and presentations; however, computable phenotype elements may not apply to general medical ward patients or other types of ICU patients. Further validation will be needed to apply this search to other groups.

Overall, the SARI computable phenotype was successful in accurately and efficiently identifying large numbers of patients with SARIs. SARI is associated with some clinically meaningful end points, although further differentiation of degrees of illness may improve its utility in identifying patients at risk for poor outcomes. This SARI computable phenotype will allow for initial evaluation and monitoring of the epidemiology of SARI in large, electronic patient cohorts.


This project is, in part, supported by Grant Number UL1 TR000135 from the National Center for Advancing Translational Sciences (NCATS). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. This publication was made possible by funding from the Mayo Clinic Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery. The sponsors had no role in the study design; in the collection, analysis, and interpretation of data; in writing the report; and in the decision to submit the article for publication.

Declaration of Interest


Supplementary Material

The supplementary material for this article can be found at


1. Mayor, S (2010) Acute respiratory infections are world's third leading cause of death. British Medical Journal 341, c6360.
2. Schluger, N (2010) The Acute Respiratory Infections Atlas. New York, NY: World Lung Foundation.
3. World Health Organization (2016) WHO surveillance case definitions for ILI and SARI: case definitions for influenza surveillance as of January 2014 [Internet]. 2016 [cited 2016 August 16]. Available at
4. Herasevich, V, et al. (2010) Informatics infrastructure for syndrome surveillance, decision support, reporting, and modeling of critical illness. Mayo Clinic Proceedings 85, 247–254.
5. Harrison, AM, et al. (2015) Developing the surveillance algorithm for detection of failure to recognize and treat severe sepsis. Mayo Clinic Proceedings 90(2), 166–175.
6. Herasevich, V, et al. (2009) Validation of an electronic surveillance system for acute lung injury. Intensive Care Medicine 35(6), 1018–1023.
7. Cohen, C, et al. (2015) Epidemiology of severe acute respiratory illness (SARI) among adults and children aged ⩾5 years in a high HIV-prevalence setting, 2009–2012. PLoS ONE 10(2), e0117716.
8. Gessner, BD (2015) Severe acute respiratory illness in Sub-Saharan Africa. Journal of Infectious Diseases 212(6), 843–844.
9. El Kholy, AA, et al. (2014) Risk factors of prolonged hospital stay in children with viral severe acute respiratory infections. Journal of Infection in Developing Countries 8(10), 1285–1293.
10. Remolina, YA, et al. (2015) Viral infection in adults with severe acute respiratory infection in Colombia. PLoS ONE 10(11), e0143152.
11. Sakr, Y, et al. IC-GLOSSARI Investigators; ESICM Trials Group (2016) The Intensive Care Global Study on Severe Acute Respiratory Infection (IC-GLOSSARI): a multicenter, multinational, 14-day inception cohort study. Intensive Care Medicine 42(5), 817–828. Epub 2016 Feb 15. Erratum in: Intensive Care Medicine 2016 May;42(5):953.
12. Platform for European Preparedness Against (Re)-emerging Epidemics (PREPARE) (2016) Practice C: adaptive randomised trial SARI: Workpackage 5 [Internet]. [cited 2016 August 22]. Available at
13. Stevens, JP, et al. (2014) Automated surveillance for ventilator-associated events. Chest 146(6), 1612–1618.
14. Makokha, C, et al. (2016) Comparison of Severe Acute Respiratory Illness (SARI) and clinical pneumonia case definitions for the detection of influenza virus infections among hospitalized patients, western Kenya, 2009–2013. Influenza and Other Respiratory Viruses 10(4), 333–339.