Linguistic knowledge-based vocabularies for Neural Machine Translation

Noe Casas; Marta R. Costa-jussà; José A. R. Fonollosa; Juan A. Alonso; Ramón Fanlo

doi:10.1017/S1351324920000364


Published online by Cambridge University Press: 02 July 2020


Noe Casas*: TALP Research Center, Universitat Politècnica de Catalunya
Marta R. Costa-jussà: TALP Research Center, Universitat Politècnica de Catalunya
José A. R. Fonollosa: TALP Research Center, Universitat Politècnica de Catalunya
Juan A. Alonso: Lucy Software, United Language Group
Ramón Fanlo: Lucy Software, United Language Group
*Corresponding author. E-mail: contact@noecasas.com


Abstract

Neural networks applied to machine translation need a finite vocabulary to express textual information as a sequence of discrete tokens. The currently dominant subword vocabularies exploit statistically discovered common parts of words to achieve the flexibility of character-based vocabularies without delegating the whole learning of word formation to the neural network. However, they trade this for the inability to apply word-level token associations, which limits their use in semantically rich areas, prevents some transfer learning approaches (e.g., cross-lingual pretrained embeddings), and reduces their interpretability. In this work, we propose new hybrid, linguistically grounded vocabulary definition strategies that keep both the advantages of subword vocabularies and the word-level associations, enabling neural networks to profit from the derived benefits. We test the proposed approaches on both morphologically rich and morphologically poor languages, showing that, for the former, translation quality on out-of-domain texts improves with respect to a strong subword baseline.
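To make the trade-off described in the abstract concrete, here is a minimal Python sketch. It is not the authors' method and does not reproduce their linguistically grounded strategies. It learns byte-pair-encoding (BPE) merges, one common way of building statistically discovered subword vocabularies, from a toy word-frequency table. It then contrasts plain BPE with a hypothetical hybrid segmenter that keeps words from a word-level vocabulary as single tokens and falls back to BPE for everything else. The function names (`learn_bpe`, `apply_bpe`, `hybrid_segment`), the `</w>` end-of-word marker and the toy corpus are illustrative assumptions.

```python
from collections import Counter

END = "</w>"  # end-of-word marker (an illustrative convention)


def merge_pair(symbols, pair):
    """Return a new symbol tuple with every occurrence of `pair` fused into one symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` BPE merge operations from a {word: frequency} dict."""
    # Start from single characters plus the end-of-word marker.
    vocab = {tuple(word) + (END,): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair encountered first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word) + (END,)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def hybrid_segment(word, word_vocab, merges):
    """Hypothetical hybrid: words in `word_vocab` stay single tokens;
    all other words fall back to BPE subwords."""
    if word in word_vocab:
        return [word]
    return apply_bpe(word, merges)


if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy frequencies
    merges = learn_bpe(corpus, num_merges=10)
    print(apply_bpe("widest", merges))  # ['wi', 'd', 'est</w>']
    print(apply_bpe("lowest", merges))  # ['low', 'est</w>']
    word_vocab = {"low", "widest"}  # hypothetical word-level vocabulary
    print(hybrid_segment("widest", word_vocab, merges))  # ['widest']
    print(hybrid_segment("lowest", word_vocab, merges))  # ['low', 'est</w>']
```

With plain BPE, *widest* becomes `wi`, `d` and `est</w>`, so a word-level association such as a pretrained embedding for *widest* has no single token to attach to. The hybrid keeps it as one token, while a word outside the word-level vocabulary such as *lowest* still decomposes into subwords.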

Keywords

machine translation; neural network; morphology; vocabulary

Type: Article
Information: Natural Language Engineering, Volume 27, Issue 4, July 2021, pp. 485–506

DOI: https://doi.org/10.1017/S1351324920000364
Copyright: © The Author(s), 2020. Published by Cambridge University Press


