Statistical Translation After Source Reordering: Oracles, Context-Aware Models, and Empirical Analysis

MAXIM KHALILOV; KHALIL SIMA'AN

doi:10.1017/S1351324912000162

Published online by Cambridge University Press: 14 May 2012

MAXIM KHALILOV: Affiliation: Institute for Logic, Language and Computation, University of Amsterdam, P.O. Box 94242, 1090 GE Amsterdam, The Netherlands
KHALIL SIMA'AN: Affiliation: Institute for Logic, Language and Computation, University of Amsterdam, P.O. Box 94242, 1090 GE Amsterdam, The Netherlands
E-mails: maxim@tauslabs.com, k.simaan@uva.nl

Abstract

In source reordering, the source words are permuted to minimize word-order differences with the target sentence, and the reordered source is then fed to a translation model. Earlier work highlights the benefits of resolving long-distance reorderings as a pre-processing step for standard phrase-based models. However, the potential performance improvement of source reordering and its impact on the components of the subsequent translation model remain unexplored. In this paper we study both aspects of source reordering. We set up idealized source reordering (oracle) models with and without syntax, and present our own syntax-driven model of source reordering. The latter is a statistical model of inversion transduction grammar (ITG)-like tree transductions that manipulate a syntactic parse and work with novel conditional reordering parameters. Having set up the models, we report translation experiments showing significant improvement on three language pairs, and contribute an extensive analysis of the impact of source reordering (both oracle and model) on the translation model with regard to the quality of its input, its phrase table, and its output. Our experiments show that oracle source reordering has untapped potential for improving translation system output. Besides solving difficult reorderings, we find that source reordering creates more monotone parallel training data at the back-end, leading to significantly larger phrase tables with higher coverage of phrase types in unseen data. Unfortunately, this property does not carry over to tree-constrained source reordering. Our analysis shows that, from a string-level perspective, tree-constrained reordering may permute word order only selectively, leading to larger phrase tables but without an increase in phrase coverage in unseen data.
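The abstract contrasts string-level (oracle) source reordering with tree-constrained reordering. The Python sketch below illustrates both ideas on a toy sentence pair; it is not the authors' model. Everything in it is an assumption made for illustration: the oracle heuristic (order each source word by the mean target position of its aligned words, keeping unaligned words after their left neighbour), the crossing-link count used as a simple monotonicity measure, the nested-tuple tree encoding, the hand-written `invert` rule, and all function names (`oracle_order`, `apply_order`, `crossing_links`, `tree_reorder`), which are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple, Union

Alignment = Set[Tuple[int, int]]                 # (source index, target index) links
Tree = Union[str, Tuple[str, "Tree", "Tree"]]    # leaf word or (label, left, right)


def oracle_order(n_src: int, alignment: Alignment) -> List[int]:
    """String-level oracle (assumed heuristic): a permutation of source
    positions that follows the target order given by the word alignment."""
    links: Dict[int, List[int]] = {}
    for s, t in alignment:
        links.setdefault(s, []).append(t)
    keys: List[float] = []
    last = -1.0  # unaligned words before the first aligned word go to the front
    for s in range(n_src):
        if s in links:
            last = sum(links[s]) / len(links[s])  # mean aligned target position
        keys.append(last)  # unaligned words inherit the key of the preceding word
    # Sort by target key; ties are broken by the original source position.
    return sorted(range(n_src), key=lambda s: (keys[s], s))


def apply_order(words: List[str], alignment: Alignment,
                order: List[int]) -> Tuple[List[str], Alignment]:
    """Permute the source words and re-index the alignment links accordingly."""
    new_pos = {old: new for new, old in enumerate(order)}
    return [words[i] for i in order], {(new_pos[s], t) for s, t in alignment}


def crossing_links(alignment: Alignment) -> int:
    """Count pairs of links that cross; 0 means the pair is fully monotone."""
    links = sorted(alignment)
    return sum(
        1
        for i, (s1, t1) in enumerate(links)
        for (s2, t2) in links[i + 1:]
        if (s1 - s2) * (t1 - t2) < 0
    )


def tree_reorder(node: Tree, invert: Callable[[str], bool]) -> List[str]:
    """ITG-like tree-constrained reordering: at each binary node, either keep
    the children in order or swap them (inversion), depending on `invert`."""
    if isinstance(node, str):
        return [node]
    label, left, right = node
    l_yield, r_yield = tree_reorder(left, invert), tree_reorder(right, invert)
    return r_yield + l_yield if invert(label) else l_yield + r_yield


if __name__ == "__main__":
    # Toy Dutch-English pair (invented): "ik heb het boek gelezen" / "I have read the book".
    src = ["ik", "heb", "het", "boek", "gelezen"]
    links = {(0, 0), (1, 1), (2, 3), (3, 4), (4, 2)}
    order = oracle_order(len(src), links)
    reordered, new_links = apply_order(src, links, order)
    print(reordered, crossing_links(links), crossing_links(new_links))
    # -> ['ik', 'heb', 'gelezen', 'het', 'boek'] 2 0

    # Hand-written binary parse; the label "VP*" marks the node to invert.
    parse = ("S", "ik", ("VP", "heb", ("VP*", ("NP", "het", "boek"), "gelezen")))
    print(tree_reorder(parse, lambda label: label == "VP*"))
    # -> ['ik', 'heb', 'gelezen', 'het', 'boek']
```

On this toy pair the oracle moves *gelezen* ("read") ahead of *het boek*, and the number of crossing links drops from 2 to 0, i.e. the reordered training pair becomes monotone. Inverting one node of the parse happens to reach the same order here, but a tree-constrained reorderer can only produce permutations obtained by swapping sibling subtrees of its parse, a subset of all permutations, so in general it need not match the string-level oracle.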

Type: Articles
Information: Natural Language Engineering, Volume 18, Issue 4, October 2012, pp. 491–519
Copyright: Copyright © Cambridge University Press 2012
