
Corpus-based dictionaries for sentiment analysis of specialized vocabularies

  • Douglas R. Rice and Christopher Zorn


Contemporary dictionary-based approaches to sentiment analysis exhibit serious validity problems when applied to specialized vocabularies, but human-coded dictionaries for such applications are often labor-intensive and inefficient to develop. We demonstrate the validity of “minimally-supervised” approaches for creating a sentiment dictionary from a corpus of text drawn from a specialized vocabulary. We validate this approach by estimating sentiment in texts from a large-scale benchmarking dataset recently introduced in computational linguistics, and show that it improves accuracy over well-known standard (nonspecialized) sentiment dictionaries. Finally, we show the usefulness of our approach in an application to the specialized language used in US federal appellate court decisions.





All materials necessary to replicate the results reported herein are posted to the Political Science Research and Methods Dataverse.




Political Science Research and Methods
  • ISSN: 2049-8470
  • EISSN: 2049-8489
  • URL: /core/journals/political-science-research-and-methods


Supplementary materials

  • Rice and Zorn Dataset
  • Rice and Zorn supplementary material 1: PDF (165 KB)

