A learning to rank approach for cross-language information retrieval exploiting multiple translation resources

Hosein Azarbonyad; Azadeh Shakery; Heshaam Faili

doi:10.1017/S1351324919000032

Published online by Cambridge University Press: 05 March 2019

Hosein Azarbonyad: Science Faculty, Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
Azadeh Shakery*: Department of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
Heshaam Faili: Department of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
*Corresponding author. Email: shakery@ut.ac.ir

Abstract

Cross-language information retrieval (CLIR), the task of finding information in one language in response to queries expressed in another, has attracted much attention due to the explosive growth of multilingual information on the World Wide Web. One important issue in CLIR is how to apply monolingual information retrieval (IR) methods in cross-lingual environments. Recently, the learning to rank (LTR) approach has been successfully employed in different IR tasks. In this paper, we use LTR for CLIR. To adapt monolingual LTR techniques to CLIR and overcome the language barrier, we map monolingual IR features to CLIR ones using translation information extracted from different translation resources. The performance of CLIR is highly dependent on the size and quality of the available bilingual resources, and effective use of these resources is especially important for low-resource language pairs. We therefore further propose an LTR-based method for combining translation resources in CLIR, and we study the effectiveness of the proposed approach using different translation resources. Our results show that LTR can be used to successfully combine different translation resources to improve CLIR performance. In the best scenario, the LTR-based combination method improves on the performance of the single-resource-based CLIR method by 6% in terms of Mean Average Precision.
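
The abstract does not give the feature definitions, so the following is only an illustrative sketch of what mapping a monolingual feature to a cross-lingual one could look like. It assumes a simple translation-weighted term frequency: each query term's count in a target-language document is replaced by the probability-weighted counts of its translations. One feature is produced per translation resource, so that an LTR model could learn how to weight the resources. The function names, the table format, and the toy English-to-Persian probabilities are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical format: source term -> {target-language term: translation probability}
TranslationTable = Dict[str, Dict[str, float]]


def cross_lingual_tf(query_term: str, doc_tokens: List[str],
                     table: TranslationTable) -> float:
    """Translation-weighted term frequency of a source-language query term
    in a target-language document (an assumed mapping, not the paper's)."""
    counts = Counter(doc_tokens)
    translations = table.get(query_term, {})
    # Each translation candidate contributes its count, scaled by its probability.
    return sum(prob * counts[target] for target, prob in translations.items())


def feature_vector(query: List[str], doc_tokens: List[str],
                   resources: List[TranslationTable]) -> List[float]:
    """One feature per translation resource, summed over the query terms."""
    return [
        sum(cross_lingual_tf(term, doc_tokens, table) for term in query)
        for table in resources
    ]


# Toy resources with made-up probabilities (illustration only).
dictionary = {"book": {"ketab": 1.0}}
parallel = {
    "book": {"ketab": 0.7, "daftar": 0.3},
    "library": {"ketabkhaneh": 0.9},
}

doc = ["ketab", "ketab", "ketabkhaneh", "daftar"]
features = feature_vector(["book", "library"], doc, [dictionary, parallel])
print(features)
```

In a setup like this, the per-resource feature vectors for each query-document pair would be passed to a learning to rank model, which is one plausible way to let the ranker decide how much to trust each resource.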

Keywords

cross-language information retrieval; learning to rank; translation resources; resource combination for CLIR

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 3, May 2019, pp. 363–384

DOI: https://doi.org/10.1017/S1351324919000032
Copyright: © Cambridge University Press 2019
