
Domain adaptation strategies in statistical machine translation: a brief overview

Published online by Cambridge University Press: 30 October 2015

Marta R. Costa-Jussà*
Affiliation:
Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, State of Mexico, Mexico e-mail: marta@nlp.cic.ipn.edu

Abstract

Statistical machine translation (SMT) is gaining interest because it can easily be adapted to any language pair. One of the main challenges in SMT is domain adaptation, because translation performance drops when testing conditions deviate from training conditions. A growing body of research addresses this challenge, much of it focused on exploiting every kind of material that is available. This paper provides an overview of research that addresses the domain adaptation challenge in SMT.

Type
Articles
Copyright
© Cambridge University Press, 2015 
