SHEA position statement on pandemic preparedness for policymakers: pandemic data collection, maintenance, and release

Westyn Branch-Elliman; David B. Banach; Lynne J. Batshon; Ghinwa Dumyati; Sarah Haessler; Vincent P. Hsu; Robin L.P. Jump; Anurag N. Malani; Trini A. Mathew; Rekha K. Murthy; Steven A. Pergam; Erica S. Shenoy; David J. Weber

doi:10.1017/ice.2024.65

Published online by Cambridge University Press: 05 June 2024

Westyn Branch-Elliman*: Affiliation:
Veterans Affairs Boston Healthcare System, Boston, MA, USA VA National Artificial Intelligence Institute (NAII), Washington, DC, USA Harvard Medical School, Boston, MA, USA
David B. Banach: Affiliation:
University of Connecticut School of Medicine, Farmington, CT, USA Yale School of Public Health, New Haven, CT, USA
Lynne J. Batshon: Affiliation:
Society for Healthcare Epidemiology of America (SHEA), Arlington, VA, USA
Ghinwa Dumyati: Affiliation:
University of Rochester Medical Center, Rochester, NY, USA Center for Community Health, Rochester, NY, USA
Sarah Haessler: Affiliation:
Baystate Medical Center, Springfield, MA, USA University of Massachusetts Chan Medical School – Baystate, Springfield, MA, USA
Vincent P. Hsu: Affiliation:
AdventHealth, Altamonte Springs, FL, USA Loma Linda University School of Medicine, Loma Linda, CA, USA
Robin L.P. Jump: Affiliation:
Geriatric Research Education and Clinical Center (GRECC), Veterans Affairs Pittsburgh Healthcare System, Pittsburgh, PA, USA University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
Anurag N. Malani: Affiliation:
Trinity Health Michigan, Ann Arbor, MI, USA
Trini A. Mathew: Affiliation:
HealthTAMCycle3, PLLC, Troy, MI, USA Corewell Health, Taylor, MI, USA School of Medicine, Wayne State University, Detroit, MI, USA Oakland University William Beaumont, Rochester, MI, USA
Rekha K. Murthy: Affiliation:
Cedars-Sinai, Los Angeles, CA, USA David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
Steven A. Pergam: Affiliation:
Fred Hutchinson Cancer Research Center, Seattle, WA, USA University of Washington, Seattle, WA, USA Seattle Cancer Care Alliance, Seattle, WA, USA
Erica S. Shenoy: Affiliation:
Harvard Medical School, Boston, MA, USA Massachusetts General Hospital, Boston, MA, USA Mass General Brigham, Boston, MA, USA
David J. Weber: Affiliation:
University of North Carolina, Chapel Hill, NC, USA
*: Corresponding author: Westyn Branch-Elliman; Email: wbranche@bidmc.harvard.edu

Abstract

The Society for Healthcare Epidemiology of America (SHEA) strongly supports modernization of data collection processes and the creation of publicly available data repositories that include a wide variety of data elements and mechanisms for securely storing both cleaned and uncleaned data sets that can be curated as clinical and research needs arise. These elements can be used for clinical research and quality monitoring and to evaluate the impacts of different policies on different outcomes. Achieving these goals will require dedicated, sustained, and long-term funding to support data science teams and the creation of central data repositories that include data sets that can be “linked” via a variety of different mechanisms and also data sets that include institutional and state and local policies and procedures. A team-based approach to data science is strongly encouraged and supported to achieve the goal of a sustainable, adaptable national shared data resource.

Type: SHEA Position Paper
Information: Infection Control & Hospital Epidemiology, Volume 45, Issue 7, July 2024, pp. 821–825

DOI: https://doi.org/10.1017/ice.2024.65
Creative Commons: This is a work of the US Government and is not subject to copyright protection within the United States. Published by Cambridge University Press on behalf of The Society for Healthcare Epidemiology of America.
Copyright: © VA Healthcare System, 2024

Background

Without a centralized, national data system, the public health community persistently encountered gaps in data availability during the COVID-19 pandemic, delaying analysis and scientific advancements essential for informing and updating pandemic policy response recommendations. This commentary provides recommendations for the creation and maintenance of such a centralized data system to improve the nation’s ability to detect and iteratively respond to matters important to public health, including future pandemics, in a data-informed manner.

Rationale

Moving forward, the creation of a national data repository with a wide variety of data elements and types is essential to develop and inform future practice and policy. Efforts to modernize data management and sharing should balance speed, accuracy, reliability, transparency, and feasibility (Figure 1). They should include strategies to “future proof” data, as forecasting the necessary data elements is not always possible. Agile data systems can help answer research and policy questions. Thus, data collection systems must be forward-thinking and continuously adaptable, expandable, and linkable to meet current and future needs.

Figure 1. Key elements of a modernized public health data system to support future infectious diseases public health challenges.

Recommendations

Modernizing data management and access is a top priority for the Society for Healthcare Epidemiology of America (SHEA). A centralized, national data system should focus on secure data collection, harmonization, and curation with dedicated teams of data scientists and clinical experts. Ideally, the system should be searchable with rapid access to data that can be de-identified. The ability to retain identifiers in a secure national repository is necessary for future linkage to other data sets or other data elements. Historically, most national, centralized data systems, such as the Centers for Disease Control and Prevention’s (CDC) National Healthcare Safety Network, have focused primarily on the collection of quantitative variables. However, an advanced data sharing and management system should encompass a wide variety of data types in anticipation of technological advancements capable of extracting data from sources that are currently inaccessible. The data repository should include both qualitative and quantitative data elements to promote innovation and policy evaluation. Considerations of different features and principles of a modernized data management and sharing system follow. Specific recommendations to policymakers are presented in Table 1.

Table 1. Pandemic Data Management: Challenges, Recommendations to Policymakers, and Examples

Data collection and harmonization

Modernizing data collection with the creation of a shared national data resource will allow for near-real time data analysis and provide a platform for future technological advancements and improvement in the practice of early detection and infection prevention.

Data platforms must have advanced mechanisms for securing and storing data to protect individually identifiable health information, similar to protections employed in other sectors, such as the banking industry.

Data organization

Recognizing that many scientific questions are not clear in advance, national data sets with detailed cross-referencing ontology should be created. These data sets must clearly explain how different data elements relate to each other and are organized, such that different data sources can be accurately and efficiently connected.

Data will come from a variety of sources, including state and local health departments and individual healthcare facilities, and identifiers need to be available for cross-linkage. Curated data elements should be organized. A data dictionary, with clear and reproducible definitions, should be available for review.

Challenges with linking different data sets and data elements have limited advancements during the pandemic. The removal of identifiers from data elements is permanent, meaning that de-identified data elements in one data set can no longer be linked to data elements in another data set to allow for accurate, granular analysis. In the absence of rich data sets, data collected at the individual, facility/healthcare system, community, state, or national level are often augmented with other data sets. Aggregated data are often insufficiently precise to allow for the granular, detailed analysis needed to inform clinical care and infection prevention decisions. Careful data management planning is required to ensure that relevant identifiers that can be used to cross-reference and link different data sets and data sources are maintained with strong cybersecurity measures, as exist in other sectors. Failure to accurately and precisely match various data elements can lead to challenges with data analysis and interpretation or, under the worst circumstances, uninterpretable findings. To facilitate innovation and accurate and actionable analysis, national data sets should be created that can interface with different electronic medical records and include multiple identifiers and the most granular unit of analysis possible that also maintains anonymity and Health Insurance Portability and Accountability Act (HIPAA) protections.
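One way to picture how an identifier can be retained for linkage without appearing in the linked output is a keyed hash. The sketch below is illustrative only and is not a method described in this statement: the key, the field names (`patient_id`, `facility`), and the records are all made up, and a real system would need governance of the key, handling of imperfect identifiers, and legal review.

```python
import hashlib
import hmac

# Hypothetical secret held only by the repository's data stewards.
LINKAGE_KEY = b"example-key-held-by-repository-stewards"


def linkage_token(identifier: str, key: bytes = LINKAGE_KEY) -> str:
    """Return a keyed hash (pseudonymous token) of a normalized identifier."""
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def link(left: list[dict], right: list[dict], id_field: str = "patient_id") -> list[dict]:
    """Join two record lists on the derived token and drop the raw identifier."""
    index = {linkage_token(rec[id_field]): rec for rec in right}
    linked = []
    for rec in left:
        token = linkage_token(rec[id_field])
        match = index.get(token)
        if match is not None:
            # Merge both records, excluding the raw identifier from the output.
            merged = {k: v for k, v in {**match, **rec}.items() if k != id_field}
            merged["token"] = token
            linked.append(merged)
    return linked


# Illustrative, made-up records from two hypothetical sources.
lab_results = [
    {"patient_id": "A-001", "test": "SARS-CoV-2 PCR", "result": "positive"},
    {"patient_id": "A-002", "test": "SARS-CoV-2 PCR", "result": "negative"},
]
admissions = [
    {"patient_id": "a-001 ", "facility": "Facility X", "icu": True},
    {"patient_id": "A-003", "facility": "Facility Y", "icu": False},
]

linked_records = link(lab_results, admissions)
```

Had either source discarded `patient_id` before submission, no token could be derived and the two records for the same person could not be matched, which is the permanence problem described above.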

Data availability and sharing

In line with the Open Science Framework,¹ the modernized data resource should include mechanisms for broad access to shared national resources to democratize data availability.

To the extent feasible, data sets should be made publicly available. When person-level or facility/healthcare system-level identifiers are required, or complete de-identification is not possible, a simple data use agreement process with protections in place could be used to protect individuals and facilities, similar to the process used for the CDC restricted access COVID-19 database.² This process balances access and data safety and provides a mechanism for transferring and sharing large data sets. Once a sufficient number of cases have accrued to protect patient and/or facility/healthcare system privacy, anonymized data should be made widely available with minimal access restrictions.
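The idea of releasing data only once enough cases have accrued can be sketched as a minimum cell size rule. This is a minimal illustration under stated assumptions: the threshold value, the `county` field, and the case records are hypothetical, and the statement does not specify any particular number or suppression method.

```python
from collections import Counter

# Hypothetical threshold; the statement does not specify a number.
MIN_CELL_SIZE = 11


def releasable_counts(records, group_field, min_cell_size=MIN_CELL_SIZE):
    """Count records per group; withhold (None) any group below the threshold."""
    counts = Counter(rec[group_field] for rec in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


# Made-up case records: 12 from one county, 3 from another.
cases = [{"county": "County A"}] * 12 + [{"county": "County B"}] * 3
public_table = releasable_counts(cases, "county")
```

Under this rule a group's count is withheld until it reaches the threshold, after which it can be published without further restriction.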

Balancing speed of data availability and accuracy and reproducibility

Mechanisms for improving the speed and accuracy of release of centrally collected data must be supported and advanced.

Creation of a national database with consistent and comparable data elements is challenging, as every healthcare system and every state and local government agency collects and transmits data in a different way. Thus, under current processes, collected data require substantial cleaning before they can be analyzed or published for review. The time-consuming nature of data cleaning requires a balance between substance and efficiency. In some cases, data cleaning to ensure accuracy and reproducibility can lead to years-long delays in data availability, as is the case for the US Renal Data System, which tracks nationwide kidney disease and dialysis use.³ To address the inherent tension between the speed of data availability (and apparent government transparency) and accuracy and reproducibility, a centralized data collection and management service could prioritize organization and release of a prespecified set of objective, quantitative data elements with a focus on early data release. Other data elements that are more difficult to clean and classify could be collected but not prioritized for cleaning, organization, and release. If future needs for these data elements arise, then the information could be organized by data scientists with relevant expertise depending on the data type and made available. Such a system might balance future-proofing the database with speed of data availability.
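The two-track approach described above, in which a prespecified set of elements is cleaned and released early while everything else is stored as received, could be sketched as follows. The field names and the submission are hypothetical and chosen only for illustration; they are not elements proposed in this statement.

```python
# Hypothetical prespecified set of objective, quantitative elements.
PRIORITY_FIELDS = {"report_date", "facility_id", "case_count"}


def split_submission(submission: dict) -> tuple[dict, dict]:
    """Separate a raw submission into an early-release part and a retained part."""
    early_release = {k: v for k, v in submission.items() if k in PRIORITY_FIELDS}
    retained_raw = {k: v for k, v in submission.items() if k not in PRIORITY_FIELDS}
    return early_release, retained_raw


# Made-up submission from a hypothetical facility.
submission = {
    "report_date": "2024-01-15",
    "facility_id": "F-17",
    "case_count": 4,
    "visitor_policy_text": "Visitors limited to one per patient; masks required.",
}
early, retained = split_submission(submission)
```

Nothing is discarded in this sketch: the retained part stays in the repository uncleaned and can be organized later if a question arises that needs it.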

Multidisciplinary data management team

To achieve these data collection and management goals, we encourage the development of multidisciplinary teams with expertise in computer science, informatics, mixed methods, and qualitative data analysis, as well as clinical expertise in areas including infectious diseases and infection prevention. These teams should have dedicated support to manage and continually adapt and improve national data sources.

Multidisciplinary data management teams, with dedicated funding and support, should focus on data procurement, storage, management, and organization to create a national data resource that can be broadly used.

Forward-facing, not backward-looking, data collection

A future-proof national data repository with a variety of different data types and elements should be created and centrally managed and funded.

Accurately forecasting data elements that will be needed and useful in the future is not possible and complicates the creation of a national data repository. Observational research investigations are often plagued by missing data. To address this perennial research challenge, which limits the quality and interpretability of observational data sets, a variety of data sources should be collected and centrally stored. However, not all data elements need to be cleaned and then made publicly accessible. For example, facility-wide methicillin-resistant Staphylococcus aureus (MRSA) control policies could be collected and stored in a centralized data repository but not organized or categorically coded. If questions arise about the effectiveness of different bundles, unstructured data collected in the centralized data repository could be organized and analyzed at a future time point to resolve clinically important questions about policy real-world effectiveness. These structured and unstructured data elements should be available via a relatively streamlined data use agreement process to balance access with protection of participating parties. If access is granted to these centrally collected materials, coders should then provide their coding scheme, definitions, and output to the centralized data repository for future use and validation.

To achieve these goals, the national data repository should actively seek facility/healthcare system and state and local data and facilitate data collection, rather than continuing to pursue a passive, voluntary data collection process. Recognizing that many institutions and state and local governments have limited resources to support data sharing, national resources including federal funding and personnel support should be dedicated to assisting with these efforts. For state-level data, participation should be strongly encouraged and supported. Common and consistent definitions and data elements, including elements that can be used to evaluate health equity outcomes, should be applied and collected to allow comparisons. To encourage participation, data included in the data repository should be readily accessible to those willing to share with others.

Transparency

Lowering of public trust fueled by limited transparency is an ongoing public health crisis that must be addressed and alleviated through increasing data access and transparency of analysis.

Lack of trust has complicated public health efforts during the pandemic.⁴,⁵ Coupled with advancements in health communications,⁶ more transparent data access may help to improve trust in governmental institutions and in scientific expertise and improve uptake of public health policy recommendations.⁷ To promote trust, reliability, and reproducibility, code underlying database creation, linkage, and any analysis should be available for review and comment with feedback.

Technological advancements and innovations

To improve early warning systems and to facilitate future pandemic responses, data repositories must be modernized and standardized.

Modernized data repositories are an essential first step toward leveraging emerging technologies, such as machine learning and other artificial intelligence-based tools, to improve early warning systems and to facilitate pandemic responses. To support future advancement, databases should be adaptable to many common programming languages and in many common file formats.

Sustainability

To achieve innovation in infection detection and pandemic preparedness, centralized data management systems need to be sustainable for the long term, with a dedicated and nationally funded and supported data management team.

The data management team should have dedicated funding and multidisciplinary expertise and should focus on creation and maintenance of a national repository as its primary goal and purpose. Specific assignments of the data management team should include data collection, quality control and review, data harmonization and curation, and steps to de-identify and release data. The team should also ensure that data are easily organized and available to programmers and analysts for use and review. Sustainability can also be attained by integrating and supporting an interface with healthcare system, state, and local data collection systems, to reduce the need for data cleaning, and by providing external facilitation to smaller organizations to help them make data available. The dedicated data science team should be flexible and able to pivot as needed to mitigate an evolving public health emergency.

Summary

In summary, SHEA strongly supports modernization of data collection processes and the creation of publicly available data repositories that include a wide variety of data elements and mechanisms for securely storing both cleaned and uncleaned data sets that can be curated as clinical and research needs arise. These elements can be used for clinical research and quality monitoring and to evaluate the impacts of different policies on different outcomes. Achieving these goals will require dedicated, sustained, and long-term funding to support data science teams and the creation of central data repositories that include data sets that can be “linked” via a variety of different mechanisms and also data sets that include institutional and state and local policies and procedures. A team-based approach to data science is strongly encouraged and supported to achieve the goal of a sustainable, adaptable national shared data resource.

Acknowledgments

The authors would like to thank the VA National Artificial Intelligence Institute for assistance with graphic design.

Financial support

None.

Competing interests

WBE reports research funding from the Health Services Research and Development Service (US Department of Veterans Affairs) and research support from Gilead Sciences (Funds to Institution). She also reports salary support from the VA National Artificial Intelligence Institute.

Disclaimer

The views presented are those of the authors on behalf of the Society for Healthcare Epidemiology of America and do not necessarily represent those of the US Department of Veterans Affairs or the US Federal Government.

References

1. Foster, ED, Deardorff, A. Open science framework (OSF). Journal of the Medical Library Association: JMLA 2017;105:203. doi:10.5195/jmla.2017.88

2. Centers for Disease Control and Prevention. COVID-19 Case Surveillance Restricted Access Detailed Data. Atlanta, GA: Centers for Disease Control and Prevention; 2023.

3. National Institute of Diabetes and Digestive and Kidney Diseases. United States Renal Data System Progress through Research. Bethesda, MD: National Institute of Diabetes and Digestive and Kidney Diseases; 2023.

4. Bollyky, TJ, Hulland, EN, Barber, RM, et al. Pandemic preparedness and COVID-19: an exploratory analysis of infection and fatality rates, and contextual factors associated with preparedness in 177 countries, from Jan 1, 2020, to Sept 30, 2021. The Lancet 2022;399:1489–1512. doi:10.1016/S0140-6736(22)00172-6

5. Perry, J. Trust in Public Institutions: Trends and Implications for Economic Security. New York: United Nations Department of Economic and Social Affairs; 2021.

6. Steel-Fisher, GK, Findling, MG, Caporello, HL, et al. Trust In US Federal, State, and Local Public Health Agencies during COVID-19: responses and policy implications: study reports the results of a survey of public trust in US federal, state, and local public health agencies’ performance during the COVID-19 pandemic. Health Affairs 2023;42:328–337. doi:10.1377/hlthaff.2022.01204

7. Kennedy, B, Tyson, A, Funk, C. Americans’ Trust in Scientists, Other Groups Declines. Washington, DC: Pew Research Center; 2022.