
Topic Classification for Political Texts with Pretrained Language Models

Published online by Cambridge University Press: 08 March 2023

Yu Wang (corresponding author)
Affiliation: University of Rochester, Rochester, NY, USA. E-mail: w.y@alum.urmc.rochester.edu

Abstract

Supervised topic classification requires labeled data, which often becomes a bottleneck because high-quality labeled data are expensive to acquire. To overcome this scarcity, scholars have recently proposed cross-domain topic classification, which takes advantage of preexisting labeled datasets and requires only limited annotation in the target domain to verify cross-domain accuracy. In this letter, we propose supervised topic classification with pretrained language models as an alternative. We show that language models fine-tuned with 70% of the small annotated dataset in the target corpus can outperform models trained with large cross-domain datasets by 27%, and that models fine-tuned with just 10% of the annotated dataset already outperform the cross-domain classifiers. Our models are also competitive in terms of training time and inference time. Researchers interested in supervised learning with limited labeled data should find our results useful. Our code and data are publicly available.1
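To make the workflow concrete, the sketch below fine-tunes a pretrained language model for topic classification using only a fraction of a labeled target corpus, in the spirit of the approach the abstract describes. It is a minimal illustration, not the paper's exact configuration: the roberta-base checkpoint, the CSV files with "text" and "label" columns, the eight topic classes, and the hyperparameters are all illustrative assumptions (see the replication materials, Wang 2023, for the actual setup). It assumes the Hugging Face transformers and datasets libraries.

    # Minimal sketch: fine-tune a pretrained language model on a small
    # labeled sample for topic classification. Checkpoint, file names,
    # label count, and hyperparameters are illustrative assumptions.
    import numpy as np
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    CHECKPOINT = "roberta-base"   # any BERT-family model works similarly
    NUM_LABELS = 8                # illustrative; set to your topic count
    FRACTION = 0.10               # fine-tune on 10% of the labeled data

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    model = AutoModelForSequenceClassification.from_pretrained(
        CHECKPOINT, num_labels=NUM_LABELS)

    # Load labeled target-domain data and keep only a small fraction,
    # mimicking the limited-annotation setting studied in the letter.
    data = load_dataset("csv", data_files={"train": "train.csv",
                                           "test": "test.csv"})
    small_train = data["train"].shuffle(seed=42).select(
        range(int(FRACTION * len(data["train"]))))

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=256)

    small_train = small_train.map(tokenize, batched=True)
    test = data["test"].map(tokenize, batched=True)

    def accuracy(eval_pred):
        logits, labels = eval_pred
        return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="topic-clf",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=small_train,
        eval_dataset=test,
        tokenizer=tokenizer,
        compute_metrics=accuracy,
    )
    trainer.train()
    print(trainer.evaluate())

Varying FRACTION traces out the kind of comparison the letter reports, i.e., how classification accuracy changes as the labeled sample in the target corpus grows from 10% to 70%.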

Type
Letter
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology


Footnotes

Edited by Jeff Gill

1. The replication materials (Wang 2023) are available at the Political Analysis Dataverse site.

References

Bestvater, S. E., and Monroe, B. L. 2022. "Sentiment Is Not Stance: Target-Aware Opinion Classification for Political Text Analysis." Political Analysis.
Clark, K., Luong, M.-T., Le, Q. V., and Manning, C. D. 2020. "ELECTRA: Pre-Training Text Encoders as Discriminators Rather than Generators." In ICLR.
Cocco, J. D., and Monechi, B. 2022. "How Populist Are Parties? Measuring Degrees of Populism in Party Manifestos Using Supervised Machine Learning." Political Analysis 30 (3): 311–327. https://doi.org/10.1017/pan.2021.29
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. 2019. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." In Proceedings of NAACL-HLT, 4171–4186.
Herrmann, M., and Döring, H. 2021. "Party Positions from Wikipedia Classifications of Party Ideology." Political Analysis 31: 22–41. https://doi.org/10.1017/pan.2021.28
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. 2020. "ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations." In ICLR.
Liu, Y., et al. 2019. "RoBERTa: A Robustly Optimized BERT Pretraining Approach." Preprint, arXiv:1907.11692.
Longpre, S., Wang, Y., and DuBois, C. 2020. "How Effective Is Task-Agnostic Data Augmentation for Pretrained Transformers?" In Findings of the Association for Computational Linguistics: EMNLP 2020. https://doi.org/10.18653/v1/2020.findings-emnlp.394
Osnabrügge, M., Ash, E., and Morelli, M. 2021. "Cross-Domain Topic Classification for Political Texts." Political Analysis 31: 59–80. https://doi.org/10.1017/pan.2021.37
Stone, R., Wang, Y., and Yu, S. 2022. "Chinese Power and the State-Owned Enterprise." International Organization 76 (1): 229–250. https://doi.org/10.1017/S0020818321000308
Vaswani, A., et al. 2017. "Attention Is All You Need." In 31st Conference on Neural Information Processing Systems.
Wang, Y. 2023. "Replication Data for: Topic Classification for Political Texts with Pretrained Language Models." Harvard Dataverse, V1. https://doi.org/10.7910/DVN/FMT8KR
Wang, Y., Li, Y., and Luo, J. 2016. "Deciphering the 2016 U.S. Presidential Campaign in the Twitter Sphere: A Comparison of the Trumpists and Clintonists." In Proceedings of the Tenth International AAAI Conference on Web and Social Media.
Zhang, T., Wu, F., Katiyar, A., Weinberger, K. Q., and Artzi, Y. 2021. "Revisiting Few-Sample BERT Fine-Tuning." In ICLR.
Supplementary material

Wang supplementary material: Online Appendix (PDF, 142.9 KB)