End-to-end statistical machine translation with zero or small parallel texts†

ANN IRVINE; CHRIS CALLISON-BURCH

doi:10.1017/S1351324916000127

Published online by Cambridge University Press: 15 June 2016

ANN IRVINE: Johns Hopkins University; e-mail: annirvine@gmail.com
CHRIS CALLISON-BURCH: University of Pennsylvania; e-mail: ccb@cis.upenn.edu

Abstract

We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without using any sentence-aligned parallel corpora. We present a detailed analysis of the accuracy of bilingual lexicon induction, and show how a discriminative model can combine various signals of translation equivalence (such as contextual, temporal, orthographic and topic similarity). Our discriminative model produces higher-accuracy translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features in a phrase-based SMT system. These monolingually estimated features enhance low-resource SMT systems, in addition to allowing end-to-end machine translation without parallel corpora.
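The idea of combining several monolingual signals into one translation score can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic scorer, the hand-set WEIGHTS and BIAS, the toy vectors, and all function names are hypothetical, and the sketch assumes that the temporal and contextual vectors of the two languages are already comparable (for example, aligned by date or projected through a seed dictionary). In a real system the weights would be learned from a seed lexicon rather than set by hand.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def orthographic_similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical, hand-set parameters; a trained model would estimate these.
WEIGHTS = {"orthographic": 2.0, "temporal": 1.5, "contextual": 2.5}
BIAS = -3.0

def score(features):
    """Logistic combination of the similarity signals into a value in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank_candidates(src_word, src_temporal, src_context, candidates):
    """Return candidate translations sorted from best to worst score."""
    scored = []
    for word, (temporal, context) in candidates.items():
        features = {
            "orthographic": orthographic_similarity(src_word, word),
            "temporal": cosine(src_temporal, temporal),
            "contextual": cosine(src_context, context),
        }
        scored.append((score(features), word))
    return [word for _, word in sorted(scored, reverse=True)]

# Toy data: (temporal frequency profile, context vector) per candidate.
candidates = {
    "telephone": ([3, 0, 5, 1], [1.0, 0.2, 0.0]),
    "table": ([1, 4, 0, 2], [0.0, 0.1, 1.0]),
}
ranking = rank_candidates("telefono", [2, 0, 4, 1], [0.9, 0.3, 0.0], candidates)
print(ranking)
```

In this toy setting every signal favours the same candidate; the value of a combined model shows when signals disagree, for instance when a word has no orthographically similar translation but a strong contextual match.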

Type: Articles
Information: Natural Language Engineering, Volume 22, Issue 4: Machine Translation Using Comparable Corpora, July 2016, pp. 517–548

DOI: https://doi.org/10.1017/S1351324916000127
Copyright: Copyright © Cambridge University Press 2016

Footnotes

† This material is based on research sponsored by DARPA under contract HR0011-09-1-0044 and by the Johns Hopkins University Human Language Technology Center of Excellence. The views and conclusions contained in this publication are those of the authors and should not be interpreted as representing official policies or endorsements of DARPA or the U.S. Government. We would like to thank David Yarowsky for his tremendous support, and for his inspiring work on – and continued ideas about – learning translations from monolingual texts. We would like to thank Alex Klementiev for his substantial contributions to this research and his comments on a draft of this article. We would like to thank Manaal Faruqui and Sneha Jha for providing the reference translations for the two Hindi paragraphs. Thank you to the two anonymous reviewers who provided valuable feedback on the first draft of this manuscript.
