Document ranking refinement using a Markov random field model*

ESAÚ VILLATORO; ANTONIO JUÁREZ; MANUEL MONTES; LUIS VILLASEÑOR; L. ENRIQUE SUCAR

doi:10.1017/S1351324912000010

Published online by Cambridge University Press: 14 March 2012

Affiliation (all authors): Department of Computer Science, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1, Tonantzintla, Puebla, CP 72840, México
e-mail: villatoroe@inaoep.mx, antjug@inaoep.mx, mmontesg@inaoep.mx, villasen@inaoep.mx, esucar@inaoep.mx

Abstract

This paper introduces a novel ranking refinement approach based on relevance feedback for the task of document retrieval. We focus on the problem of ranking refinement because recent evaluation results for Information Retrieval (IR) systems indicate that current methods are effective at retrieving most of the relevant documents for different sets of queries, but have severe difficulties generating a pertinent ranking of these documents. Motivated by these results, we propose a novel method to re-rank the list of documents returned by an IR system. The proposed method is based on a Markov Random Field (MRF) model that classifies the retrieved documents as relevant or irrelevant. The proposed MRF combines: (i) information provided by the base IR system, (ii) similarities among documents in the retrieved list, and (iii) relevance feedback information. Thus, the problem of ranking refinement is reduced to that of minimising an energy function that represents a trade-off between document relevance and inter-document similarity. Experiments were conducted using resources from four different tasks of the Cross Language Evaluation Forum (CLEF) as well as from one task of the Text Retrieval Conference (TREC). The results show the feasibility of the method for re-ranking documents in IR and an improvement in mean average precision over a state-of-the-art retrieval system.
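To make the energy-minimisation idea concrete, here is a minimal sketch of MRF-based re-ranking. Everything beyond the abstract is an assumption made for illustration: the blend of base-IR score and feedback similarity into one observation value (weight `alpha`), the Potts-style pairwise penalty weighted by `lam`, the symmetric similarity matrix `sim`, the use of Iterated Conditional Modes (ICM) as the minimiser, and the "relevant first, original order within each group" re-ranking rule. None of these are claimed to be the potentials or inference procedure defined in the paper, and all function names are hypothetical. The energy being minimised is E(f) = Σ_i V_o(f_i) + λ Σ_{i<j} sim_ij [f_i ≠ f_j], with f_i ∈ {0, 1} (1 = relevant).

```python
from typing import List, Sequence

# Hypothetical sketch of MRF-based re-ranking. The potentials, the ICM
# minimiser and the re-ranking rule are illustrative assumptions, not the
# energy function or inference procedure defined in the paper.


def observation(base: Sequence[float], feedback: Sequence[float],
                alpha: float) -> List[float]:
    """Blend base-IR evidence and relevance-feedback evidence into o_i in [0, 1]."""
    return [alpha * b + (1.0 - alpha) * f for b, f in zip(base, feedback)]


def unary(label: int, o: float) -> float:
    """Observation potential: cheap when the label agrees with the evidence o."""
    return 1.0 - o if label == 1 else o


def local_energy(i: int, label: int, labels: List[int], obs: Sequence[float],
                 sim: Sequence[Sequence[float]], lam: float) -> float:
    """All energy terms that involve document i when it takes `label`."""
    # Potts-style interaction (sim assumed symmetric): similar documents
    # that receive different labels cost sim[i][j].
    pairwise = sum(sim[i][j] for j in range(len(labels))
                   if j != i and labels[j] != label)
    return unary(label, obs[i]) + lam * pairwise


def total_energy(labels: Sequence[int], obs: Sequence[float],
                 sim: Sequence[Sequence[float]], lam: float) -> float:
    """E(f) = sum_i V_o(f_i) + lam * sum_{i<j} sim[i][j] * [f_i != f_j]."""
    n = len(labels)
    u = sum(unary(labels[i], obs[i]) for i in range(n))
    p = sum(sim[i][j] for i in range(n) for j in range(i + 1, n)
            if labels[i] != labels[j])
    return u + lam * p


def icm(obs: Sequence[float], sim: Sequence[Sequence[float]], lam: float,
        max_iters: int = 20) -> List[int]:
    """Iterated Conditional Modes: greedy, label-by-label energy minimisation."""
    labels = [1 if o > 0.5 else 0 for o in obs]  # start from the evidence alone
    for _ in range(max_iters):
        changed = False
        for i in range(len(labels)):
            best = min((0, 1), key=lambda lab: local_energy(i, lab, labels, obs, sim, lam))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # a full sweep changed nothing: local minimum reached
            break
    return labels


def rerank(labels: Sequence[int]) -> List[int]:
    """Documents labelled relevant first; original base-IR order kept within each group."""
    return sorted(range(len(labels)), key=lambda i: (-labels[i], i))


def average_precision(ranked_rel: Sequence[int]) -> float:
    """AP of one ranked list (simplification: divides by relevant docs in the list)."""
    hits, total = 0, 0.0
    for k, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


if __name__ == "__main__":
    base = [0.9, 0.8, 0.7, 0.6]      # toy normalised base-IR scores, in original rank order
    feedback = [0.0, 1.0, 0.0, 1.0]  # toy similarity to user-marked relevant documents
    sim = [[0.0, 0.1, 0.8, 0.1],
           [0.1, 0.0, 0.1, 0.9],
           [0.8, 0.1, 0.0, 0.1],
           [0.1, 0.9, 0.1, 0.0]]
    truth = [0, 1, 0, 1]             # toy relevance judgements
    labels = icm(observation(base, feedback, alpha=0.5), sim, lam=0.5)
    order = rerank(labels)
    print(labels, order)             # [0, 1, 0, 1] [1, 3, 0, 2]
    print(average_precision(truth), average_precision([truth[i] for i in order]))  # 0.5 1.0
```

In this toy run, documents 1 and 3 are labelled relevant and moved to the top, and AP on the toy judgements rises from 0.5 to 1.0. Raising `lam` lets inter-document similarity pull labels together more strongly, which is the relevance versus similarity trade-off the abstract describes. ICM was chosen here only because it is short and never increases the energy; it can stop in a local minimum. Note that standard AP divides by all relevant documents for the query, not only those in the retrieved list.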

Type: Articles
Information: Natural Language Engineering, Volume 18, Special Issue 2: Statistical Learning of Natural Language Structured Input and Output, April 2012, pp. 155–185

DOI: https://doi.org/10.1017/S1351324912000010
Copyright: Copyright © Cambridge University Press 2012
