
CCG supertagging with bidirectional long short-term memory networks*

Published online by Cambridge University Press:  04 September 2017

REKIA KADARI
Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. Email: rekia@ir.hit.edu.cn
YU ZHANG
Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. Email: zhangyu@ir.hit.edu.cn
WEINAN ZHANG
Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. Email: wnzhang@ir.hit.edu.cn
TING LIU
Affiliation: Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. Email: tliu@ir.hit.edu.cn

Abstract

Neural network-based approaches have recently achieved good performance on natural language processing tasks such as supertagging. In the supertagging task, a supertag (lexical category) is assigned to each word in an input sequence. Combinatory Categorial Grammar (CCG) supertagging is more challenging than other sequence-tagging problems, such as part-of-speech (POS) tagging and named entity recognition, because of the large number of lexical categories. The simple recurrent neural network (RNN) has been shown to significantly outperform previous state-of-the-art feed-forward neural networks on this task. On the other hand, it is well known that simple recurrent networks fail to learn long-range dependencies. In this paper, we introduce a new neural network architecture based on backward and bidirectional Long Short-Term Memory (BLSTM) networks, which can memorize information over long distances and benefit from both past and future context. Previous state-of-the-art methods use only preceding information, whereas a BLSTM has access to context in both directions. Our main findings are that bidirectional networks outperform unidirectional ones, and that Long Short-Term Memory (LSTM) networks are more precise and successful than both unidirectional and bidirectional standard RNNs. Experimental results demonstrate the effectiveness of the proposed method on both in-domain and out-of-domain datasets, showing an improvement of about 1.2 per cent over the standard RNN.
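To make the architecture described in the abstract concrete, the following is a minimal sketch of a bidirectional LSTM supertagger in PyTorch. It is not the authors' implementation: the layer sizes, the 425-category tag set, and all names are illustrative assumptions only.

# Minimal sketch of a BLSTM supertagger (illustrative; hyperparameters and
# names are assumptions, not the configuration reported in the paper).
import torch
import torch.nn as nn

class BLSTMSupertagger(nn.Module):
    def __init__(self, vocab_size, num_supertags, embed_dim=100, hidden_dim=200):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # bidirectional=True lets each position see both past and future context.
        self.blstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        # The concatenated forward/backward states are projected onto the supertag set.
        self.output = nn.Linear(2 * hidden_dim, num_supertags)

    def forward(self, word_ids):
        embedded = self.embedding(word_ids)   # (batch, seq_len, embed_dim)
        states, _ = self.blstm(embedded)      # (batch, seq_len, 2 * hidden_dim)
        return self.output(states)            # one score per supertag per word

# Toy usage: tag one 5-word sentence against a hypothetical 425-category tag set.
model = BLSTMSupertagger(vocab_size=50000, num_supertags=425)
scores = model(torch.randint(0, 50000, (1, 5)))
predicted_supertags = scores.argmax(dim=-1)   # highest-scoring category per word

Because the LSTM runs in both directions, the score for each word depends on the whole sentence rather than only on the preceding words, which is the property the abstract contrasts with unidirectional RNN taggers.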

Type
Articles
Copyright
Copyright © Cambridge University Press 2017 


Footnotes

*

We thank the anonymous reviewers for their valuable comments. This work was supported by the Natural Science Foundation of China (Grant No. 61472105, 61472107) and the High Technology Research and Development Program of China (Grant No. 2015AA015407).
