
Using Word Order in Political Text Classification with Long Short-Term Memory Models

Published online by Cambridge University Press: 23 December 2019

Charles Chang
Affiliation:
Postdoctoral Associate, The Council on East Asian Studies, Yale University, New Haven, CT 06511, USA; Postdoctoral Associate, Center on Religion and Chinese Society, Purdue University, West Lafayette, IN 47907, USA. Email: charles.chang@yale.edu
Michael Masterson*
Affiliation:
PhD Candidate, Political Science, University of Wisconsin–Madison, Madison, WI 53706, USA. Email: masterson2@wisc.edu

Abstract

Political scientists often wish to classify documents based on their content to measure variables, such as the ideology of political speeches or whether documents describe a Militarized Interstate Dispute. Simple classifiers often serve well in these tasks. However, if words occurring early in a document alter the meaning of words occurring later in the document, using a more complicated model that can incorporate these time-dependent relationships can increase classification accuracy. Long short-term memory (LSTM) models are a type of neural network model designed to work with data that contains time dependencies. We investigate the conditions under which these models are useful for political science text classification tasks with applications to Chinese social media posts as well as US newspaper articles. We also provide guidance for the use of LSTM models.
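To make the contrast with simple classifiers concrete, the sketch below shows a minimal LSTM document classifier of the general kind the abstract describes, written in Python with the Keras API. It is an illustrative sketch only: the vocabulary size, sequence length, layer dimensions, and toy data are assumptions chosen for demonstration, not the authors' replication settings.

# A minimal, hypothetical LSTM text classifier in Keras (TensorFlow 2.x).
# Hyperparameters and data are illustrative assumptions, not the authors'
# replication settings.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 5000  # assumed number of distinct word indices
MAX_LEN = 100      # assumed padded/truncated document length in tokens

model = Sequential([
    # Map each word index to a dense vector; token order is preserved.
    Embedding(input_dim=VOCAB_SIZE, output_dim=64),
    # The LSTM reads the sequence token by token, so words occurring early
    # in a document can condition how later words are represented.
    LSTM(32),
    # Binary output, e.g. "describes a militarized dispute" vs. not.
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Toy stand-in data: random word-index sequences with random 0/1 labels.
x = np.random.randint(1, VOCAB_SIZE, size=(200, MAX_LEN))
y = np.random.randint(0, 2, size=(200,))
model.fit(x, y, epochs=2, batch_size=32, validation_split=0.2)

Because the LSTM layer consumes tokens in sequence, the learned representation of a document can depend on word order, which is exactly the information a bag-of-words classifier discards.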

Type
Articles
Copyright
Copyright © The Author(s) 2019. Published by Cambridge University Press on behalf of the Society for Political Methodology.


Footnotes

Contributing Editor: Daniel Hopkins

Supplementary material

Chang and Masterson supplementary material (File, 1.2 MB)