
What's in a Name? A Method for Extracting Information about Ethnicity from Names

  • J. Andrew Harris

Abstract

Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data, if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names which requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.
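
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of estimating group proportions without labeling individual names. It assumes the name distribution within each group, P(name | group), is already known, and it uses a standard EM update for mixture weights, which may differ from the article's actual method. The groups "A" and "B", the names "n1" to "n3", the probabilities, and both function names are hypothetical.

```python
from collections import Counter


def classify_and_count(names, name_given_group):
    """Baseline: assign each name to its most likely group, then tally."""
    groups = list(name_given_group)
    tally = Counter(
        max(groups, key=lambda g: name_given_group[g].get(name, 0.0))
        for name in names
    )
    return {g: tally[g] / len(names) for g in groups}


def estimate_shares(names, name_given_group, n_iter=200):
    """EM for mixture weights with known P(name | group).

    No individual name is ever assigned to a single group; each name
    contributes fractionally to every group it is compatible with.
    """
    groups = list(name_given_group)
    counts = Counter(names)
    shares = {g: 1.0 / len(groups) for g in groups}  # uniform start
    for _ in range(n_iter):
        expected = dict.fromkeys(groups, 0.0)
        for name, c in counts.items():
            joint = {g: shares[g] * name_given_group[g].get(name, 0.0)
                     for g in groups}
            norm = sum(joint.values())
            if norm == 0.0:
                continue  # name unseen in every group: no information
            for g in groups:
                expected[g] += c * joint[g] / norm
        total = sum(expected.values())
        if total == 0.0:
            break  # no usable names at all; keep the current shares
        shares = {g: expected[g] / total for g in groups}
    return shares


# Hypothetical toy data: 100 names drawn from a locality that is 30% group A.
name_given_group = {
    "A": {"n1": 0.5, "n2": 0.4, "n3": 0.1},
    "B": {"n1": 0.1, "n2": 0.3, "n3": 0.6},
}
names = ["n1"] * 22 + ["n2"] * 33 + ["n3"] * 45

naive = classify_and_count(names, name_given_group)
shares = estimate_shares(names, name_given_group)
print(naive, shares)
```

In this toy setup the name counts match a 30/70 mixture exactly, so the mixture estimate settles at roughly 0.30 for group A, while classify-and-count reports 0.55 because every "n2" is assigned to A even though most people with that name belong to B.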


Copyright

This is an Open-Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989
  • URL: /core/journals/political-analysis
Supplementary materials

Harris supplementary material
Appendix

 PDF (233 KB)
