
Misclassification and Bias in Predictions of Individual Ethnicity from Administrative Records

Published online by Cambridge University Press: 15 May 2023

Lisa P. Argyle, Assistant Professor, Department of Political Science, Brigham Young University, United States
Michael Barber, Associate Professor, Department of Political Science, Brigham Young University, United States


We show that a common method of predicting individuals’ race in administrative records, Bayesian Improved Surname Geocoding (BISG), produces misclassification errors that are strongly correlated with demographic and socioeconomic factors. In addition to the high error rates for some racial subgroups, the misclassification rates are correlated with the political and economic characteristics of a voter’s neighborhood. Racial and ethnic minorities who live in wealthy, highly educated, and politically active areas are most likely to be misclassified as white by BISG. Inferences about the relationship between sociodemographic factors and political outcomes, like voting, are likely to be biased in models using BISG to infer race. We develop an improved method in which the BISG estimates are incorporated into a machine learning model that accounts for class imbalance and incorporates individual and neighborhood characteristics. Our model decreases the misclassification rates among non-white individuals, in some cases by as much as 50%.
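For readers unfamiliar with BISG, the following minimal sketch shows the Bayes-rule update that the method is generally understood to rest on: a surname-based probability of each racial group is combined with the share of each group living in the person's geographic unit, assuming surname and location are independent given race. It is an illustration only, not the authors' implementation; the function name `bisg_posterior`, the four group labels, and all probabilities are hypothetical values chosen for the example.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    p_race_given_surname: dict mapping group -> P(group | surname)
    p_geo_given_race:     dict mapping group -> P(this location | group)
    Returns a dict mapping group -> P(group | surname, location).
    Assumes surname and location are independent given group.
    """
    # Unnormalized posterior: P(group | surname) * P(location | group)
    unnormalized = {
        group: p_race_given_surname[group] * p_geo_given_race[group]
        for group in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("No group has positive probability for this input.")
    # Normalize so the probabilities sum to one
    return {group: value / total for group, value in unnormalized.items()}


# Hypothetical inputs (not taken from the article or from Census data):
# a surname held mostly by white individuals, and a neighborhood that
# holds a larger share of the white population than of other groups.
surname_probs = {"white": 0.60, "black": 0.30, "hispanic": 0.05, "asian": 0.05}
geo_probs = {"white": 0.0004, "black": 0.0001, "hispanic": 0.0001, "asian": 0.0001}

posterior = bisg_posterior(surname_probs, geo_probs)
predicted = max(posterior, key=posterior.get)  # highest-probability group
for group, prob in posterior.items():
    print(f"{group}: {prob:.3f}")
print("predicted:", predicted)
```

In this made-up case the neighborhood composition raises the white probability above its surname-only value, which gives some intuition for how a non-white resident of a predominantly white area could be assigned to the wrong group when the highest-probability category is taken as the prediction.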

© The Author(s), 2023. Published by Cambridge University Press on behalf of the American Political Science Association



Supplementary material: Link

Argyle and Barber Dataset

Supplementary material: PDF

Argyle and Barber supplementary material
