Document ranking refinement using a Markov random field model*

ESAÚ VILLATORO, ANTONIO JUÁREZ, MANUEL MONTES, LUIS VILLASEÑOR and L. ENRIQUE SUCAR

Abstract

This paper introduces a novel ranking refinement approach based on relevance feedback for the task of document retrieval. We focus on the problem of ranking refinement because recent evaluation results from Information Retrieval (IR) systems indicate that current methods are effective at retrieving most of the relevant documents for different sets of queries, but have severe difficulty generating a pertinent ranking of them. Motivated by these results, we propose a novel method to re-rank the list of documents returned by an IR system. The proposed method is based on a Markov Random Field (MRF) model that classifies the retrieved documents as relevant or irrelevant. The proposed MRF combines: (i) information provided by the base IR system, (ii) similarities among documents in the retrieved list, and (iii) relevance feedback information. Thus, the problem of ranking refinement is reduced to that of minimising an energy function that represents a trade-off between document relevance and inter-document similarity. Experiments were conducted using resources from four different tasks of the Cross Language Evaluation Forum (CLEF) as well as from one task of the Text Retrieval Conference (TREC). The obtained results show the feasibility of the method for re-ranking documents in IR and also show an improvement in mean average precision compared to a state-of-the-art retrieval machine.
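To make the idea of re-ranking by energy minimisation concrete, the following is a minimal sketch rather than the model evaluated in the paper. The abstract does not specify the potentials or the optimiser, so everything here is an assumption made for illustration: the observation potential (disagreement with a base score normalised to [0, 1]), the interaction potential (summed similarity to documents carrying the opposite label), the weight `lam`, the use of iterated conditional modes (ICM) as the minimiser, and the names `local_energy` and `rerank`.

```python
from typing import Dict, List, Sequence, Tuple


def local_energy(i: int, label: int, labels: Sequence[int],
                 base_scores: Sequence[float],
                 sim: Sequence[Sequence[float]], lam: float) -> float:
    """Energy of giving document i the label (1 = relevant, 0 = irrelevant).

    Both potentials are illustrative assumptions, not the paper's definitions.
    """
    # Observation potential: cost of disagreeing with the base IR score.
    obs = (1.0 - base_scores[i]) if label == 1 else base_scores[i]
    # Interaction potential: cost of differing from similar documents.
    inter = sum(sim[i][j] for j in range(len(labels))
                if j != i and labels[j] != label)
    return lam * obs + (1.0 - lam) * inter


def rerank(base_scores: Sequence[float], sim: Sequence[Sequence[float]],
           feedback: Dict[int, int], lam: float = 0.5,
           max_iters: int = 20) -> Tuple[List[int], List[int]]:
    """Label documents with ICM (an assumed optimiser), then re-rank them."""
    n = len(base_scores)
    # Initial labelling from the base system; feedback labels are clamped.
    labels = [1 if s >= 0.5 else 0 for s in base_scores]
    for i, lab in feedback.items():
        labels[i] = lab
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            if i in feedback:
                continue  # never override a user judgement
            best = min((0, 1), key=lambda lab: local_energy(
                i, lab, labels, base_scores, sim, lam))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # reached a local minimum of the energy
    # Relevant documents first; the base score breaks ties within each class.
    order = sorted(range(n), key=lambda i: (-labels[i], -base_scores[i]))
    return order, labels


# Toy data (hypothetical): doc 3 resembles doc 0, doc 2 resembles doc 1.
scores = [0.9, 0.6, 0.55, 0.2]
similarity = [[0.0, 0.1, 0.1, 0.9],
              [0.1, 0.0, 0.8, 0.1],
              [0.1, 0.8, 0.0, 0.1],
              [0.9, 0.1, 0.1, 0.0]]
order, labels = rerank(scores, similarity, feedback={0: 1, 1: 0})
print(order, labels)
```

In this toy run the user marks document 0 as relevant and document 1 as irrelevant. Document 3, which the base system scored lowest, is pulled up by its similarity to document 0, while document 2 is pushed down by its similarity to document 1. ICM only finds a local minimum, so a different optimiser or different potentials could give a different ordering.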
