Keyword extraction from emails*

S. LAHIRI; R. MIHALCEA; P.-H. LAI

doi:10.1017/S1351324916000231


Published online by Cambridge University Press: 09 September 2016

S. LAHIRI and R. MIHALCEA: University of Michigan, Ann Arbor, MI 48109, USA. E-mail: lahiri@umich.edu, mihalcea@umich.edu
P.-H. LAI: Samsung Research America, Richardson, TX 75082, USA. E-mail: s.lai@sra.samsung.com

Abstract

Emails constitute an important genre of online communication. Many of us are often faced with the daunting task of sifting through increasingly large amounts of emails on a daily basis. Keywords extracted from emails can help us combat such information overload by allowing a systematic exploration of the topics contained in emails. Existing literature on keyword extraction has not covered the email genre, and no human-annotated gold standard datasets are currently available. In this paper, we introduce a new dataset for keyword extraction from emails, and evaluate supervised and unsupervised methods for keyword extraction from emails. The results obtained with our supervised keyword extraction system (38.99% F-score) improve over the results obtained with the best performing systems participating in the SemEval 2010 keyword extraction task.

Type: Articles
Information: Natural Language Engineering, Volume 23, Issue 2, March 2017, pp. 295–317

DOI: https://doi.org/10.1017/S1351324916000231
Copyright: © Cambridge University Press 2016

Footnotes

We are grateful to the annotators who made this work possible. This material is based in part upon work supported by Samsung Research America under agreement GN0005468 and by the National Science Foundation under IIS award #1018613. Any opinions, findings, conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of Samsung Research America or the National Science Foundation. We also thank the anonymous reviewers whose insightful comments helped improve the draft substantially.

