
CCG supertagging with bidirectional long short-term memory networks*

Published online by Cambridge University Press:  04 September 2017

REKIA KADARI
Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. Email: rekia@ir.hit.edu.cn
YU ZHANG
Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. Email: zhangyu@ir.hit.edu.cn
WEINAN ZHANG
Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. Email: wnzhang@ir.hit.edu.cn
TING LIU
Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. Email: tliu@ir.hit.edu.cn

Abstract

Neural network-based approaches have recently achieved good performance on natural language processing tasks such as supertagging. In the supertagging task, a supertag (lexical category) is assigned to each word in an input sequence. Combinatory Categorial Grammar (CCG) supertagging is more challenging than other sequence-tagging problems, such as part-of-speech (POS) tagging and named entity recognition, because of the large number of lexical categories. The simple recurrent neural network (RNN) has been shown to significantly outperform previous state-of-the-art feed-forward neural networks on this task. On the other hand, it is well known that simple recurrent networks fail to learn long-range dependencies. In this paper, we introduce a new neural network architecture based on backward and bidirectional Long Short-Term Memory (BLSTM) networks, which can memorize information over long distances and benefit from both past and future context. Previous state-of-the-art methods use only preceding information, whereas a BLSTM has access to context in both directions. Our main findings are that bidirectional networks outperform unidirectional ones, and that Long Short-Term Memory (LSTM) networks are more precise and successful than both unidirectional and bidirectional standard RNNs. Experimental results demonstrate the effectiveness of the proposed method on both in-domain and out-of-domain datasets, showing an improvement of about 1.2 per cent over the standard RNN.
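To make the architecture described in the abstract concrete, the following is a minimal sketch of a bidirectional LSTM supertagger in PyTorch. It is not the authors' implementation: the layer sizes, the 425-category tag set, and all names are illustrative assumptions only.

# Minimal sketch of a BLSTM supertagger (illustrative; hyperparameters and
# names are assumptions, not the configuration reported in the paper).
import torch
import torch.nn as nn

class BLSTMSupertagger(nn.Module):
    def __init__(self, vocab_size, num_supertags, embed_dim=100, hidden_dim=200):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True lets each position see both past and future context.
        self.blstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        # The concatenated forward/backward states are projected onto the supertag set.
        self.output = nn.Linear(2 * hidden_dim, num_supertags)

    def forward(self, word_ids):
        embedded = self.embedding(word_ids)   # (batch, seq_len, embed_dim)
        states, _ = self.blstm(embedded)      # (batch, seq_len, 2 * hidden_dim)
        return self.output(states)            # one score per supertag per word

# Toy usage: tag one 5-word sentence against a hypothetical 425-category tag set.
model = BLSTMSupertagger(vocab_size=50000, num_supertags=425)
scores = model(torch.randint(0, 50000, (1, 5)))
predicted_supertags = scores.argmax(dim=-1)   # highest-scoring category per word

Because the LSTM runs in both directions, the score for each word depends on the whole sentence rather than only on the preceding words, which is the property the abstract contrasts with unidirectional RNN taggers.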

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 


Footnotes

*

We thank the anonymous reviewers for their valuable comments. This work was supported by the Natural Science Foundation of China (Grant No. 61472105, 61472107) and the High Technology Research and Development Program of China (Grant No. 2015AA015407).
