
A novel ILP framework for summarizing content with high lexical variety

  • WENCAN LUO, FEI LIU, ZITAO LIU and DIANE LITMAN

Abstract

Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include student responses to post-class reflective questions, product reviews, and news articles published by different agencies about the same events. The high lexical diversity of these documents hinders a summarization system’s ability to identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation of the sentence-word co-occurrence matrix to intrinsically group semantically similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. Finally, the paper sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety.
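
To make the two ingredients named in the abstract concrete, the sketch below pairs a low-rank approximation of the sentence-word co-occurrence matrix with a standard concept-coverage ILP in the style of Gillick and Favre (2009). It is a minimal illustration under stated assumptions, not the authors' implementation: the use of CountVectorizer, truncated SVD, a fixed threshold on the reconstructed matrix, and the PuLP solver are all choices made for this example, and the names sentences, budget, rank, and threshold are hypothetical parameters.

# Minimal sketch (not the paper's code): low-rank approximation of the
# sentence-word co-occurrence matrix followed by a concept-coverage ILP.
import numpy as np
import pulp
from sklearn.feature_extraction.text import CountVectorizer

def summarize(sentences, budget=100, rank=20, threshold=0.5):
    # Binary sentence-word co-occurrence matrix A (n sentences x m word types).
    vectorizer = CountVectorizer(binary=True, stop_words="english")
    A = vectorizer.fit_transform(sentences).toarray().astype(float)
    n, m = A.shape

    # Low-rank approximation via truncated SVD: the rank-k reconstruction
    # spreads weight across words that tend to co-occur, so semantically
    # similar lexical items receive correlated scores.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(s))
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Soft occurrence: a sentence "covers" a word type when its reconstructed
    # score exceeds a threshold (an assumption made for this sketch only).
    occ = (A_k >= threshold).astype(int)
    weights = occ.sum(axis=0)          # word weight = number of covering sentences
    lengths = [len(sent.split()) for sent in sentences]

    # Concept-coverage ILP: maximize the total weight of covered word types
    # subject to a summary length budget.
    prob = pulp.LpProblem("summary", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(n)]  # sentence i selected
    c = [pulp.LpVariable(f"c{j}", cat="Binary") for j in range(m)]  # word type j covered
    prob += pulp.lpSum(int(weights[j]) * c[j] for j in range(m))
    prob += pulp.lpSum(lengths[i] * x[i] for i in range(n)) <= budget
    for j in range(m):
        # A word type counts as covered only if some selected sentence covers it.
        prob += pulp.lpSum(int(occ[i, j]) * x[i] for i in range(n)) >= c[j]
    prob.solve()
    return [sentences[i] for i in range(n) if x[i].value() == 1]

In the paper itself the low-rank matrix enters the ILP directly rather than through a hard threshold; the sketch is only meant to show why grouping co-occurring words before sentence selection can help when the input is lexically diverse.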

Footnotes

*This research is supported by an internal grant from the Learning Research and Development Center at the University of Pittsburgh as well as by an Andrew Mellon Predoctoral Fellowship to the first author. We are grateful to Logan Lebanoff for helping with the experiments. We also thank Muhsin Menekse, the CourseMIRROR team, and Wenting Xiong for providing or helping to collect some of our datasets. We thank Jingtao Wang, Fan Zhang, Huy Nguyen, and Zahra Rahimi for valuable suggestions about the proposed summarization algorithm.


