
Google Books Ngrams and Political Science: Two Validity Tests for a Novel Data Source

Published online by Cambridge University Press:  24 October 2019

Sean Richey
Affiliation:
Georgia State University
J. Benjamin Taylor
Affiliation:
Kennesaw State University

Abstract

Google Books Ngrams data are freely available and contain billions of words used in tens of millions of digitized books, with coverage beginning in the 1500s for some languages. We explore the benefits and pitfalls of these data by showing examples from comparative and American politics. Specifically, we show how usage of the phrase “political corruption” in Italian, French, German, and Hebrew books strongly correlates with Transparency International’s well-cited Corruption Index for Italy, France, Germany, and Israel. We also use Ngrams to show that the explosive growth in usage of the phrases “Asian American,” “Latino,” and “Hispanic” correlates with real-world changes in these populations after the Immigration and Nationality Act of 1965. These applications show that Ngram data correlate strongly with similar data from well-respected sources. This suggests that Ngrams has content validity and can be used as a proxy measure for previously difficult-to-research phenomena and questions.
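The validity tests summarized above rest on correlating a yearly phrase-frequency series with an external benchmark series. The abstract does not state which correlation statistic the authors compute, so the following is only a minimal sketch that assumes a Pearson correlation. The function name `pearson_r` and both toy series are hypothetical placeholders for illustration; they are not data or code from the article.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT data from the article:
# one value per year for a phrase's share of the corpus and for
# an external benchmark measured over the same years.
ngram_share = [1.0, 2.0, 3.0, 4.0, 5.0]
external_index = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(ngram_share, external_index)
print(f"r = {r:.3f}")
```

In practice the two series would first have to be aligned on the same years, and a high coefficient alone would not establish validity, since two series that both trend over time can correlate for unrelated reasons.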

Type
Article
Copyright
Copyright © American Political Science Association 2019 


Supplementary material

Richey and Taylor supplementary material: Web Appendix (PDF, 158.3 KB)