Maintaining patient privacy is a common challenge faced by researchers seeking to understand the relationship between place and health [Reference Ajayakumar, Curtis and Curtis1–Reference Zandbergen4]. This issue can be especially problematic in multisite studies, where study protocols and confidentiality concerns limit the sharing of geographic data. Existing approaches to geospatial analyses in multisite studies include conducting analyses at a central site or applying spatial techniques that protect patient privacy, such as altering geographic coordinates (i.e. “geomasking”) [Reference Zandbergen4]. However, the former method is limited by data use agreements (DUAs), and the latter may cause exposure misclassification when geomasked locations are spatially misaligned with participants’ true locations.
Alternatively, study sites may perform geocoding and geospatial linkages independently before removing identifiable information for joint analyses. This decentralized approach, however, faces challenges in reproducibility and standardization because geocoding methods, geographic information systems (GIS) software, and expertise vary across study sites. Here, we describe the application of a novel method to perform reproducible, standardized, and confidential geospatial analyses for multisite studies. Our approach extends a previously developed Decentralized Geomarker Assessment for Multi-site Studies (DeGAUSS) containerization platform to perform geocoding and extraction of polygon feature geospatial data over multiple time periods and large geographic areas [Reference Brokamp, Wolfe, Lingren, Harley and Ryan5]. As an example case, we use our approach to ascertain US census tract-level information for participants enrolled in the Children’s Respiratory and Environmental Workgroup (CREW), a network of 12 birth cohorts, each studying the development of allergy and asthma in childhood [Reference Gern, Jackson and Lemanske6].
Materials and Methods
Our approach was motivated by the CREW consortium, a network of birth cohorts recruited from 1980 to 2020. Information regarding the participating cohorts, including eligibility criteria, study recruitment, and other methods, has been published [Reference Gern, Jackson and Lemanske6] and is summarized in Supplementary Table 1. Because of its large sample size and broad geographic distribution, CREW provides a unique platform to examine the environmental and community factors that contribute to the disproportionately higher burden of asthma-related morbidity and mortality among disadvantaged communities [Reference Beck, Moncrief and Huang7–Reference Akinbami, Moorman, Simon and Schoendorf10]. A CREW data sharing protocol and DUA were approved by the local IRB for each cohort. However, the DUA permitted only limited datasets to be shared and prohibited the distribution of identifying information, including addresses or geocodes.
Distributed Geospatial Analysis
We extended our DeGAUSS software to enable all CREW sites, including those with limited geospatial expertise, to derive spatio-temporal US census tract-level information for their participants at birth. A key advantage of DeGAUSS is its use of a software containerization platform to wrap the necessary software, system dependencies, and geospatial data in a stand-alone package that works the same regardless of its host environment [Reference Brokamp11]. We previously created DeGAUSS and applied it in a proof-of-concept study within the Electronic Medical Records and Genomics (eMERGE) network [Reference Brokamp, Wolfe, Lingren, Harley and Ryan5]. Whereas our prior DeGAUSS software included only a geocoder, code to link geocoded coordinates to nearby roadways, and a single census tract variable, the CREW consortium required substantially expanded geographic and temporal data. We therefore created a new custom DeGAUSS container that includes decennial US Census data (described below), census tract polygon boundary files for the 1980, 1990, 2000, and 2010 censuses, and R code (R Foundation for Statistical Computing: Vienna, Austria; 2014) that merges census tract-level data with the geocoded locations of CREW birth addresses. Additional details regarding DeGAUSS and the CREW container are provided in the supplementary materials and online [12,13].
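To illustrate the core linkage step, the following Python sketch assigns a geocoded point to the census tract polygon that contains it, using an even-odd (ray-casting) point-in-polygon test, and then appends that tract’s census variables keyed by FIPS code. It is a hypothetical, simplified illustration rather than the container’s R code: the square tracts, the FIPS codes (state code 99 does not exist), and the variable name pct_poverty are made up, coordinates are treated as planar, and points lying exactly on a shared boundary are assigned arbitrarily.

```python
# Hypothetical sketch: point-in-polygon tract assignment plus census variable join.
# Not the DeGAUSS R code; tracts, FIPS codes, and variables are made up.
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) treated as planar


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Even-odd ray casting: toggle for each edge crossed by a ray cast to the right of pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_census(pt: Point,
                tracts: Dict[str, List[Point]],
                census_vars: Dict[str, Dict[str, float]]) -> Dict[str, object]:
    """Return the containing tract's FIPS code (None if no tract contains pt)
    together with that tract's census variables."""
    for fips, polygon in tracts.items():
        if point_in_polygon(pt, polygon):
            return {"fips": fips, **census_vars.get(fips, {})}
    return {"fips": None}


# Two adjacent unit-square tracts with hypothetical FIPS codes and values.
TRACTS = {
    "99999000100": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "99999000200": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
CENSUS_VARS = {
    "99999000100": {"pct_poverty": 12.0},
    "99999000200": {"pct_poverty": 31.5},
}

print(link_census((1.5, 0.5), TRACTS, CENSUS_VARS))
# {'fips': '99999000200', 'pct_poverty': 31.5}
```

A production implementation would more likely rely on established GIS libraries with spatial indexing, so that tens of thousands of tract polygons can be searched efficiently.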
A flow diagram depicting the distributed approach to geospatial analyses for the CREW consortium is provided in Figure 1. C.B. created the DeGAUSS container image at a single location and distributed it to each cohort. The DeGAUSS software required cohort users to provide an input.csv file containing the geocoded coordinates of each participant’s birth record address. Site end users also specified the census year, based on participants’ year of birth, so that the appropriate tract boundary file and census data were assigned. The output file from the DeGAUSS container contained the original input data, including geocoded locations, with appended census tract information: the FIPS code of the census tract in which the birth record address was located and the census variables for that tract. Site end users manually removed identifying information, including the geographic coordinates and census tract FIPS code, before returning the de-identified dataset to a central coordinating center.
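The de-identification step that site end users performed manually could also be scripted. The sketch below, which is hypothetical and not part of the described workflow, drops identifying columns from DeGAUSS-style output before the file leaves the site. The column names (lat, lon, fips, birth_year, pct_poverty) and the sample row are assumptions for illustration; the actual DeGAUSS output columns may differ.

```python
# Hypothetical sketch: scripted removal of identifying columns before data
# leave a site. Column names are assumptions, not the actual DeGAUSS output.
import csv
import io

# Columns treated as identifying (a deny-list; an assumption for illustration).
IDENTIFYING_COLUMNS = {"lat", "lon", "fips"}


def deidentify(csv_text: str) -> str:
    """Return the CSV text with identifying columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    keep = [col for col in reader.fieldnames if col not in IDENTIFYING_COLUMNS]
    out = io.StringIO()
    # extrasaction="ignore" skips the dropped columns present in each row dict
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Made-up DeGAUSS-style output row: input data plus appended tract information.
sample_output = (
    "id,lat,lon,birth_year,fips,pct_poverty\n"
    "1,39.10,-84.51,1998,99999000100,12.0\n"
)
print(deidentify(sample_output), end="")
# id,birth_year,pct_poverty
# 1,1998,12.0
```

A deny-list keeps the sketch short; an allow-list of approved columns would fail safe if an unexpected identifying column appeared in the output.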
US Census/American Community Survey Data
Longitudinal US Census data and boundary files for the years 1980, 1990, 2000, and 2010 were downloaded for the entire USA from Social Explorer (www.socialexplorer.com, New York City, NY: Social Explorer 2017; accessed 12/17–1/18) by census year and requested variable at the census tract level. For 1980, certain census variables were available only from the National Historical Geographic Information System (NHGIS) data service (Minneapolis, MN: NHGIS; accessed 11/17–1/18); these were recalculated to match the variable definitions used in later censuses. Because the 2010 census did not include median household income, median gross rent, or median housing value, these variables were downloaded from the 2008–2012 American Community Survey (ACS, https://www.census.gov/programs-surveys/acs/data.html; accessed 12/17–1/18). Additional information regarding the census and ACS variables downloaded and included in the DeGAUSS container for linkage to birth record addresses is available in the supplementary material.
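Assembling the 2010 tract table amounts to a left join on tract FIPS code, with the three income and housing variables taken from the ACS. The Python sketch below shows this under stated assumptions: both sources are assumed to use 2010 tract boundaries and 11-digit FIPS codes, and the variable names and values are hypothetical. Tracts absent from the ACS extract receive None rather than being dropped, so missing estimates remain visible downstream.

```python
# Hypothetical sketch: combine 2010 decennial tract variables with ACS 2008-2012
# income/housing estimates by tract FIPS code. Names and values are made up.
from typing import Dict, Optional

Table = Dict[str, Dict[str, Optional[float]]]

# Variables not collected by the 2010 decennial census, taken from the ACS instead.
ACS_VARS = ("median_hh_income", "median_gross_rent", "median_home_value")


def build_2010_table(decennial: Table, acs: Table) -> Table:
    """Left join on FIPS: every decennial tract is kept; missing ACS values become None."""
    merged: Table = {}
    for fips, dec_vars in decennial.items():
        row = dict(dec_vars)
        acs_row = acs.get(fips, {})
        for var in ACS_VARS:
            row[var] = acs_row.get(var)
        merged[fips] = row
    return merged


decennial_2010 = {
    "99999000100": {"total_pop": 4200.0, "pct_black": 23.0},
    "99999000200": {"total_pop": 3100.0, "pct_black": 8.0},
}
acs_2008_2012 = {
    "99999000100": {"median_hh_income": 51000.0, "median_gross_rent": 780.0,
                    "median_home_value": 132000.0},
}
table_2010 = build_2010_table(decennial_2010, acs_2008_2012)
print(table_2010["99999000200"]["median_hh_income"])  # None (no ACS row)
```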
After de-identified data were shared with the central site, descriptive statistics and box-and-whisker plots of all census tract variables were generated for the combined CREW consortium and for each cohort individually. Census tract-level race, ethnicity, and household income were compared with self-reported race, ethnicity, and household income by plotting the distribution of each census tract-level variable according to the corresponding self-reported variable. Self-reported household income was compared with census tract median household income using each cohort’s income categories as collected by questionnaire. Additional details regarding self-reported race, ethnicity, and income information are available in the supplementary material.
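Box-and-whisker plots are drawn from per-group quartiles. The sketch below computes five-number summaries (minimum, first quartile, median, third quartile, maximum) of a tract-level variable for each cohort and for all cohorts combined. The cohort labels and values are hypothetical, and the quartile convention (Python’s statistics.quantiles with method="inclusive") is an assumption; statistical packages differ in how they interpolate quartiles, so summaries may differ slightly from those behind the published figures.

```python
# Hypothetical sketch: five-number summaries behind box-and-whisker plots,
# by cohort and overall. Cohort labels and values are made up.
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def five_number_summary(values: List[float]) -> Dict[str, float]:
    """Min, quartiles, and max (statistics.quantiles may require at least
    two values, depending on the Python version)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


def summarize_by_cohort(records: List[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group (cohort, value) pairs by cohort and summarize each group plus all values."""
    by_cohort: Dict[str, List[float]] = defaultdict(list)
    for cohort, value in records:
        by_cohort[cohort].append(value)
    summaries = {cohort: five_number_summary(vals) for cohort, vals in by_cohort.items()}
    summaries["ALL"] = five_number_summary([value for _, value in records])
    return summaries


# (cohort, % of tract households in poverty) pairs; hypothetical values.
records = [("cohort_A", 5.0), ("cohort_A", 10.0), ("cohort_A", 15.0),
           ("cohort_A", 20.0), ("cohort_A", 25.0),
           ("cohort_B", 30.0), ("cohort_B", 40.0)]
summaries = summarize_by_cohort(records)
print(summaries["cohort_A"])
# {'min': 5.0, 'q1': 10.0, 'median': 15.0, 'q3': 20.0, 'max': 25.0}
```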
All cohorts (n = 12) completed the distributed analysis and returned de-identified data to the coordinating center. Collectively, 8997 participants were enrolled in CREW cohorts, and 98% (n = 8810) of these had birth record addresses geocoded with sufficient precision for linkage to a census tract.
A summary of population, race, ethnicity, and income data for the census tracts in which participants resided at birth is provided in Table 1. CREW participants resided in both low- and high-population-density regions, as reflected in average tract population densities (persons per km²) ranging from 148 for Wisconsin Infant Study Cohort (WISC) participants in rural Wisconsin to 45,772 for Columbia Center for Children’s Environmental Health (CCCEH) participants in New York City (Table 1). Overall, CREW participants lived in tracts that were 67% White, 23% Black, 2% Asian, and 9% Other race. There was, however, variability in tract racial distribution both within and across cohorts (Table 1): eight cohorts enrolled participants from tracts where more than 75% of the population was White, while participants enrolled in URECA, WHEALS, and CCCEH resided in census tracts with greater proportions of Black or Other race residents. Most cohorts enrolled participants from tracts whose populations were less than 10% Hispanic, though the mean Hispanic population share in tracts of CCCEH, IIS, TCRS, and URECA (Boston, MA and New York, NY sites) participants ranged from 21% to 57%. The overall mean percentage of households in poverty in the census tracts where CREW participants resided at birth was 15%, but, as shown in Figure 2, it varied substantially within and across cohorts. Additional information on census tract-level median household income, percentage of households in poverty, percentage of occupied housing, and median housing value as indicators of neighborhood income and housing is provided in the supplementary materials.
CREW, Children’s Respiratory and Environmental Workgroup; CAS, Childhood Allergy and Asthma Study; COAST, Childhood Origins of Asthma Study; CCAAPS, Cincinnati Childhood Allergy and Air Pollution Study; CCCEH, Columbia Center for Children’s Environmental Health; EHAAS, Epidemiology of Home Allergens and Asthma Study; INSPIRE, Infant Susceptibility to Pulmonary Infections and Asthma Following RSV Exposure; IIS, Infant Immune Study; MAAP, Microbes, Allergy, Asthma, and Pets; TCRS, Tucson Children’s Respiratory Study; URECA, Urban Environment and Childhood Asthma; WHEALS, Wayne County Health, Environment, Allergy, and Asthma Longitudinal Study; WISC, Wisconsin Infant Study Cohort.
The distribution of tract-level race data (%White, %Black, %Asian, %Other) according to participants’ self-reported race is presented in Figure 3. Overall, participants who reported White or Asian race lived in census tracts with majority White populations, whereas participants who reported Black race resided in census tracts with more variable shares of White and Black residents (Figure 3). Additional comparisons of individual-level with neighborhood-level income and ethnicity are provided in the supplementary material.
Environmental exposures and the community in which they occur are significant causes of human disease, including asthma [Reference Beck, Moncrief and Huang7,Reference Burbank, Sood, Kesic, Peden and Hernandez14–Reference Gupta, Zhang and Springston17]. Disentangling the environmental and social exposures that contribute to health disparities requires geographically and culturally diverse studies. The methods described here make it more feasible for researchers and policy-makers to characterize communities confidentially, yet in a standardized and reproducible manner. Importantly, our method is generalizable to additional types of geographic data, including other polygon and point data, allowing other studies to customize geocoding and geospatial analyses and incorporate them into their own designs.
Our distributed geospatial approach offers important advantages over existing methods, including reduced exposure misclassification, preserved participant confidentiality, and a reduced need for geospatial expertise at each study site [Reference Ajayakumar, Curtis and Curtis1]. One alternative method for maintaining subject confidentiality is to alter participants’ geographic coordinates. Referred to as “geomasking” or “jittering,” this approach involves either a random shift in subjects’ locations or a systematic transformation of the locations known only to the researchers [Reference Zandbergen4]. However, displacing participants from their actual locations may introduce errors or biases, particularly for analyses that require exact locations. For example, the amount of geomasking required to de-identify geographic datasets according to HIPAA standards may cause individual subjects to be linked to incorrect census tracts, resulting in exposure misclassification. Another approach to incorporating geospatial information into multisite studies is to obtain IRB approval and DUAs to share subjects’ identifiable information with a central site for geocoding and analysis. A challenge with this strategy is institutions’ hesitation to share identifiable information (e.g. addresses). Performing geospatial linkages at individual sites is a further option but may produce non-standardized and non-reproducible results because of varying geocoding platforms, software, and datasets.
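The exposure misclassification described above can be illustrated with a toy simulation. In the hypothetical Python sketch below, census tracts are stand-in 1 km by 1 km grid cells, true locations are uniform over a 20 km by 20 km area, and each location is geomasked by “donut” displacement in a random direction by a distance drawn uniformly between r_min and r_max (km). The function reports the share of masked points that fall in a different cell than the original. None of these parameters correspond to HIPAA requirements or to real tract geometry.

```python
# Toy simulation (hypothetical parameters): share of geomasked points that land
# in a different "tract", here 1 km x 1 km grid cells.
import math
import random
from typing import Tuple


def tract_of(x: float, y: float) -> Tuple[int, int]:
    """Grid-cell index standing in for a census tract ID."""
    return (math.floor(x), math.floor(y))


def donut_mask(x: float, y: float, r_min: float, r_max: float,
               rng: random.Random) -> Tuple[float, float]:
    """Displace a point in a random direction by a distance uniform on [r_min, r_max]."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist = rng.uniform(r_min, r_max)
    return x + dist * math.cos(angle), y + dist * math.sin(angle)


def misassignment_rate(r_min: float, r_max: float, n_points: int = 10_000,
                       extent_km: float = 20.0, seed: int = 1) -> float:
    """Fraction of points whose masked location falls in a different grid cell."""
    rng = random.Random(seed)
    moved = 0
    for _ in range(n_points):
        x, y = rng.uniform(0.0, extent_km), rng.uniform(0.0, extent_km)
        mx, my = donut_mask(x, y, r_min, r_max, rng)
        if tract_of(mx, my) != tract_of(x, y):
            moved += 1
    return moved / n_points


for r_min, r_max in [(0.0, 0.1), (0.1, 0.5), (0.5, 1.0)]:
    rate = misassignment_rate(r_min, r_max)
    print(f"displacement {r_min}-{r_max} km: {rate:.1%} misassigned")
```

Because any displacement of at least √2 km must leave a 1 km cell, the rate reaches 1 for such radii; in this toy model, smaller cells would be crossed at smaller displacements.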
Our DeGAUSS method overcomes these limitations because the geospatial data and software are developed at a central site, ensuring that all study sites run the same software on identically constructed datasets. Importantly, our method also accounts for spatial, temporal, and informational changes in census tract boundaries and census data over the study period. Participants in CREW cohorts were born over nearly four decades, and, as detailed in the accompanying supplement, linking geocoded addresses with the appropriate census tract at the time of each participant’s birth was critical to our approach. By including census tract boundaries and accompanying data from 1980, 1990, 2000, and 2010 in our DeGAUSS software, we were able to link each study participant to the census tract and data appropriate for their date of birth. Of note, the DeGAUSS platform is flexible and can accommodate additional geographic data and analyses, including derivation of nearby greenspace, distances to roadways and other locations, gridded air pollution exposure estimates, area-based indices of neighborhood deprivation, and others. Specific details and example uses are available on the DeGAUSS website [12,13].
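Temporal matching can be sketched as choosing a census vintage from the year of birth and then searching only that vintage’s boundaries. The Python sketch below is hypothetical: the text states only that site users specified the census year appropriate to participants’ year of birth, so the nearest-census rule (ties going to the earlier census) is an assumption made for illustration. The rectangular tracts, in which one 1990 tract is split into two by 2010, use made-up FIPS codes.

```python
# Hypothetical sketch: pick the census vintage for a birth year, then look up
# the tract in that vintage's boundaries. Rule, tracts, and FIPS codes are assumptions.
from typing import Dict, Optional, Tuple

CENSUS_YEARS = (1980, 1990, 2000, 2010)

# Rectangles (xmin, ymin, xmax, ymax): one 1990 tract split into two by 2010.
BOUNDARIES: Dict[int, Dict[str, Tuple[float, float, float, float]]] = {
    1990: {"99999000100": (0.0, 0.0, 2.0, 1.0)},
    2010: {"99999000101": (0.0, 0.0, 1.0, 1.0),
           "99999000102": (1.0, 0.0, 2.0, 1.0)},
}


def census_vintage(birth_year: int) -> int:
    """Nearest decennial census year; ties (e.g. 1985) go to the earlier census."""
    return min(CENSUS_YEARS, key=lambda year: (abs(year - birth_year), year))


def tract_for(x: float, y: float, birth_year: int) -> Optional[str]:
    """FIPS code of the tract containing (x, y) in the birth year's census vintage."""
    vintage = census_vintage(birth_year)
    for fips, (xmin, ymin, xmax, ymax) in BOUNDARIES.get(vintage, {}).items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return fips
    return None


print(tract_for(1.5, 0.5, 1991))  # 99999000100 (1990 boundaries)
print(tract_for(1.5, 0.5, 2012))  # 99999000102 (2010 boundaries)
```

The same coordinates therefore receive different tract codes, and hence different census data, depending on the participant’s year of birth.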
As part of the NIH Environmental influences on Child Health Outcomes (ECHO) program, the objective of CREW is to understand the etiology of childhood asthma and to determine its environmental and genetic contributors. As such, CREW offers a unique opportunity to examine the significantly higher rates of asthma prevalence, hospitalization, and morbidity among children residing in households and neighborhoods with lower socioeconomic status (SES), as compared to White children residing in higher SES households and communities [Reference Gold and Wright18–Reference Sullivan, Ghushchyan, Kavati, Navaratnam, Friedman and Ortiz20]. Multiple influences contribute to these disparities, including disproportionately higher exposure to air pollution, poor housing, limited access to care, and other adverse physical and toxicant exposures; inherited factors may also increase individuals’ sensitivity to these exposures [Reference Beck, Huang, Chundur and Kahn21–Reference Rauh, Landrigan and Claudio25]. However, environmental exposures and inherited factors alone do not fully explain the observed health disparities. For that reason, hardships linked to poverty, including discrimination, stress, family turmoil, violence, and instability, have been posited to play an important role in the social patterning of disease [Reference Williams, Sternthal and Wright26]. These social determinants of health may act independently of access to medical care and are important drivers not only of asthma morbidity but also of a wide range of adult and pediatric health outcomes, including mortality [27,Reference Braveman and Gottlieb28]. The importance of socioeconomic factors is highlighted by observations that disparities in health outcomes across SES strata persist within racial/ethnic groups [Reference Braveman and Gottlieb28].
Our methods and results should, however, be considered in the context of some limitations. Census tract boundaries alone may not accurately describe individuals’ neighborhood experience. In addition, using decennial census data rather than ACS data reduced temporal precision but allowed us to extend our historical reach to 1980. US Census data also lack specificity in certain variables, such as ethnicity. For example, Hispanic participants in the TCRS and IIS in Tucson, AZ, likely differ in ancestry from Hispanic participants in cohorts in New York City, NY. Finally, our approach requires expertise at individual institutions to obtain patient records or electronic health records.
In conclusion, we demonstrated a distributed approach to conducting geospatial analyses across the 12 CREW cohorts; this approach is also applicable to other multisite studies. Future applications of our method will incorporate additional gridded data, including land cover, air pollution models, and meteorological information. Future research on the etiology of childhood asthma will incorporate longitudinal residential locations throughout childhood and multilevel analyses of individual and neighborhood environments.
To view supplementary material for this article, please visit https://doi.org/10.1017/cts.2021.7.
The Children’s Respiratory and Environmental Workgroup (CREW) is a collaboration of the following cohorts, institutions, investigators, and staff (principal investigators are indicated by an asterisk). The authors would particularly like to thank Michael Evans, Andrew Moseby, Zhouwen Liu, Brent Olson, Suzanne Havstad, Soma Datta, Howard Andrews, and Agustin Calatroni for running the DeGAUSS software to link census and address data at their cohort sites.
Columbia Center for Children’s Environmental Health (CCCEH): Rachel Miller*, Howard Andrews, Julie Herbstman, Lori Hoepner, Frederica Perera, Matthew Perzanowski, Xinhua Liu, Judyth Ramirez, Janelle Rivera, Deliang Tang, Kylie Riley, and Jacqueline Jezioro.
Microbes, Allergy, Asthma and Pets (MAAP): Henry Ford Health System, Detroit, MI: E Zoratti*, CC Johnson, A Sitarik, S Havstad, K Woodcroft, A Levin, G Wegienka, B Davidson, S Finazzo, K Bobbitt, E Mann, S Bellemore, S Zhang, A Wahlman, and K Jones; Augusta University, Augusta, GA: D Ownby; University of Michigan, Ann Arbor, MI: N Lukacs; University of California, San Francisco, CA: S Lynch, H Boushey.
Tucson Children’s Respiratory Study (TCRS): Anne L. Wright*, Fernando D. Martinez*, Wayne Morgan, Debra A. Stern, Dean Billheimer, Brian Hallmark, Paloma Beamer, Nathan Lothrop, Lydia De La Ossa, Silvia Lopez, Marilyn Halonen, Amber Spangenberg, David Spies, and Lynn Taussig.
Infant Immune Study (IIS): Anne L. Wright*, Fernando D. Martinez*, Wayne Morgan, Debra A. Stern, Dean Billheimer, Brian Hallmark, Paloma Beamer, Nathan Lothrop, Lynn Taussig, Heidi Erickson, Marilyn Halonen, Amber Spangenberg, David Spies.
Wisconsin Infant Study Cohort (WISC): Marshfield Clinic Research Institute: Casper Bendixsen, Kathrine Barnes, Tara Johnson, Terry Foss, Elizabeth Armagost, Vicki Moon, Tammy Koepel, Erin Donnerbauer, Wayne Frome, and Brent Olson.
University of Wisconsin-Madison: Christine M. Seroogy*, James E. Gern*, Samantha Fye, Irene Ong, Deborah Chasman, Rose Vrtis, Heather Floerke, Amy Dresen, Yury Bochkov, Tressa Pappas, Kristine Grindle, Michael Evans, and Ronald Gangnon.
Childhood Origins of Asthma Study (COAST): Robert F. Lemanske, Jr.*, Daniel J. Jackson*, James E. Gern, Carole Ober, Anne Marie Singh, Ronald E. Gangnon, Michael D. Evans, Victoria Rajamanickam, Christopher Tisler, Lisa Salazar, Susan Doyle, Yury Bochkov, Rebecca Brockman-Schneider, Rose Vrtis, Kristine Grindle, Tressa Pappas, Elizabeth Anderson, Kathy Roberg, Kirsten Carlson-Dakes, Mark DeVries, Douglas DaSilva, Ronald Sorkness, Lance Mikus, and Julia Bach.
Urban Environment and Childhood Asthma Study (URECA): Johns Hopkins University, Baltimore, MD: R Wood*, E Matsui, H Lederman, F Witter, S Leimenstoll, D Scott, M Cootauco, and P Jones; Boston University School of Medicine, Boston, MA: G O’Connor*, W Cruikshank, M Sandel, A Lee-Parritz, C Jordan, E Gjerasi, P Price-Johnson, L Gagalis, L Wang, N Gonzalez, and M Tuzova; Harvard Medical School, Boston, MA: D Gold, R Wright; Columbia University, New York, NY: M Kattan*, C Lamm, N Whitney, P Yaniv, M Pierce, and Jacqueline Jezioro; Mount Sinai School of Medicine, New York, NY: H Sampson, R Sperling, and N Rivers; Washington University School of Medicine, St Louis, MO: G Bloomberg*, L Bacharier*, Y Sadovsky, E Tesson, C Koerkenmeier, R Sharp, K Ray, J Durrange, I Bauer, A Freie, and V Morgan; Statistical and Clinical Coordinating Center – Rho, Inc, Chapel Hill, NC: C Visness*, P Zook, M Yaeger, J Martin, A Calatroni, K Jaffee, W Taylor, R Budrevich, and H Mitchell; Scientific Coordination and Administrative Center, University of Wisconsin, Madison, WI: W Busse*, J Gern*, P Heinritz, C Sorkness, K. Hernandez, Y. Bochkov, K Grindle, A Dresen, T Pappas, M. Renneberg, and B. Stoffel; National Institute of Allergy and Infectious Diseases, Bethesda, MD: P Gergen, A Togias, E Smartt, and K Thompson.
Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS): Gurjit K. Khurana Hershey*, Patrick H. Ryan*, Jocelyn M. Biagini Myers*, Grace K. LeMasters*, Kristi Curtsinger, Liza Murrison*, Jeffrey W. Burkle, Christopher Wolfe, Zachary Flege, David Morgan, Kristina Keidel, Krista Tensing, and Taylor Groeschen.
The Infant Susceptibility to Pulmonary Infections and Asthma Following RSV Exposure (INSPIRE): Vanderbilt University Medical Center, Nashville, TN: Tina V. Hartert MD, MPH**, Andrew Abreo MD+, Niek Achten MD+, Alyssa Bednarek BS, Steven M. Brunwasser PhD*, Teresa M. Chipps BS, Alexandra Connolly BS, Kaitlin Costello BA, Marian Dorst BA+, William D. Dupont PhD*, Roxanne Filardo-Collins RN, BSN, Rebecca Gammell BA+, Tebeb Gebretsadik MPH*, Kayla Goodman, Emma Larkin PhD+, Jessica Levine RN, Zhouwen Liu MS, Christian Lynch MPH, Megan Mccollum MS, Patricia Minton RN, AE-C+, Paul E. Moore MD, Sarah Osmundson MD*, R. Stokes Peebles MD, Christian Rosas-Salazar MD, MPH, Cosby Stone Jr. MD, MPH+, Theresa Rogers RN+, Pat Russell RN, BSN, Kedir Turi PhD*, Kim B. Woodward MSN, APRN, CPNP-PC, Pingsheng Wu PhD, Vanderbilt Technologies for Advanced Genomics, Nashville, TN: Suman R. Das PhD, Seesandra Rajagopala PhD, Meghan H. Shilts MHS, MS, Vanderbilt Infectious Diseases, Nashville, TN: James D. Chappell MD, Emory University, Atlanta, GA: Larry J. Anderson MD, Tatiana Chirkova PhD, Martin L. Moore PhD.
The Epidemiology of Home Allergens and Asthma Study (EHAAS): Diane R. Gold, Soma Datta, Sharon O’Toole, Conner Fleurat, Leanna Farnham.
Wayne County Health, Environment, Allergy and Asthma Longitudinal Study (WHEALS): CC Johnson*, G Wegienka, S Havstad, E Zoratti, A Cassidy-Bushrow, A Levin, H Kim, K Woodcroft, A Sitarik, C Joseph, LK Williams, C Barone, K Bobbitt, S Zhang, J Campbell, K Bourgeois, M Aubuchon, J Ezell, K Jones, D Ownby.
Childhood Allergy Study (CAS): D Ownby*, CC Johnson*, C Joseph, E Zoratti*, G Wegienka, S Havstad, K Woodcroft, E Peterson, S Hensley Alford, J McCullough, C Strauchman Boyer, S Blocki, G Birg, N Akkerman, K Wells, S Zhang, C Nicholas, A Jones, and G Stouffer.
Administrative Center, University of Wisconsin-Madison: James E. Gern*, Gina Crisafi, Dorothy Floerke, and Rick Kelley.
Coordinating Center, Rho, Inc. Federal Systems Division: Cynthia Visness*, Melissa Yaeger, Samara Dixon, and Kathy Collier.
Biomedical Informatics and Biostatistical Core (BBC), University of Wisconsin-Madison: Umberto Tachinardi*, Mark Craven*, Eneida Mendonca, Lisa Gress, Adam Nunez, and Laura Ladick; University of Manchester: Philip Couch, Camille Johnson, John Ainsworth, Victoria Turner; Imperial College, London, UK: Adnan Custovic.
Genetics Core, University of Chicago: Carole Ober*, Daniel Nicolae.
Geospatial Core, Harvard School of Public Health, Boston, MA: Diane Gold*, Heike Gibson, Brent Coull, Antonella Zanobetti, Weeberb Requia, Joel Schwartz, Itai Kloog, Qian Di, Peter James, Marcia Jimenez Pescador, and Jaime Hart.
Microbiome Core, University of California-San Francisco: Susan Lynch* and Kathrine McCauley.
This manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
CREW is funded by HHS/NIH grant 5UG3OD023282. Additional support was provided by individual cohorts’ grants/contracts:
Columbia University: P01 ES09600, R01 ES008977, P30 ES09089, R01 ES013163, R827027.
Tucson Children’s Respiratory Study (TCRS): R01HL132523, R01HL56177, K25HL103970.
Infant Immune Study (IIS): R01AI42268, K25HL103970.
Childhood Origins of Asthma Study (COAST): P01 HL070831, U10 HL064305, R01 HL061879.
Urban Environment and Childhood Asthma Study (URECA): NO1-AI-25496, NO1-AI-25482, HHSN272200900052C, HHSN272201000052I, NCRR/NIH RR00052, M01RR00533, 1UL1RR025771, M01RR00071, 1UL1RR024156, UL1TR001079, 5UL1RR024992-02, NCATS/NIH UL1TR000040.
Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS): R01 ES11170, R01 ES019890.
The Epidemiology of Home Allergens and Asthma Study (EHAAS): R01 AI035786.
Wayne County Health, Environment, Allergy and Asthma Longitudinal Study (WHEALS): R01 AI050681, R56 AI050681, R01 AI061774, R21 AI059415, K01 AI070606, R21 AI069271, R01 HL113010, R21 ES022321, P01 AI089473, R21 AI080066, R01 AI110450, R01 HD082147 and the Fund for Henry Ford Health System.
Childhood Allergy Study (CAS): R01 AI024156, R03 HL067427, R01 AI051598, Blue Cross Foundation Johnson, and the Fund for Henry Ford Hospital.
Microbes, Allergy, Asthma and Pets (MAAP): P01 AI089473 and Fund for Henry Ford Hospital.
Infant Susceptibility to Pulmonary Infections and Asthma following RSV Exposure (INSPIRE): NIH/NIAID U19 AI 095227, NIH/NCATS UL1 TR 002243, NIH/NIAID K24 AI 077930, NIH/NHLBI R21 HD 087864, NIH/NHLBI X01 HL 134583.
Wisconsin Infant Study Cohort (WISC): U19 AI104317, NCATS UL1TR000427, the charitable donors to the Marshfield Clinic Health System Foundation, and the Upper Midwest Agricultural Safety and Health Center (UMASH) U54 OH010170.
The authors have no conflicts of interest to declare.