
8 - Large-Scale Learning to Rank Using Boosted Decision Trees

from Part Two - Supervised and Unsupervised Learning Algorithms

Published online by Cambridge University Press: 05 February 2012

Authors:
Krysta M. Svore, Microsoft Research, Redmond, Washington, USA
Christopher J. C. Burges, Microsoft Research, Redmond, Washington, USA

Book editors:
Ron Bekkerman, LinkedIn Corporation, Mountain View, California
Mikhail Bilenko, Microsoft Research, Redmond, Washington
John Langford, Yahoo! Research, New York

Summary

The web search ranking task has become increasingly important because of the rapid growth of the Internet. With the growth of the web and the number of web search users, the amount of available training data for learning web ranking models has also increased. We investigate the problem of learning to rank on a cluster using web search data composed of 140,000 queries and approximately 14 million URLs. For datasets much larger than this, distributed computing becomes essential, because of both speed and memory constraints. To evaluate the loss or gain incurred by the distributed algorithms we consider, we compare against a baseline algorithm that has been carefully engineered to allow training on the full dataset on a single machine. The underlying algorithm is a boosted-tree ranking algorithm called LambdaMART, in which a split at a given vertex of each decision tree is determined by the split criterion for a particular feature. Our contributions are twofold. First, we implement a method that speeds up training when the training data fits in main memory on a single machine by distributing the vertex split computations of the decision trees; the resulting model is identical to the one produced by centralized training, but it is trained faster. Second, we develop a training method for the case where the training data exceeds the main memory of a single machine. This second approach is based on data distribution and scales easily to far larger datasets, on the order of billions of examples.
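To make the feature-distributed idea concrete, the following Python sketch (not from the chapter) simulates workers that each hold the full training set but scan only their own slice of the feature columns for the best split at a tree vertex; a final reduce step keeps the highest-gain candidate, so the chosen split matches what a single-machine scan over all features would find. It uses a simple variance-reduction criterion as a stand-in for LambdaMART's gradient-based split criterion; the function names and toy data are illustrative assumptions.

```python
# Minimal sketch of feature-distributed split selection (illustrative only):
# variance reduction stands in for LambdaMART's gradient-based split criterion.
import numpy as np

def best_split_for_features(X, y, feature_ids):
    """Best (gain, feature, threshold) over one worker's feature subset."""
    best = (-np.inf, None, None)
    parent_sse = y.var() * len(y)  # sum of squared errors at the parent vertex
    for f in feature_ids:
        order = np.argsort(X[:, f])
        xs, ys = X[order, f], y[order]
        for i in range(1, len(ys)):
            if xs[i] == xs[i - 1]:
                continue  # no threshold separates equal feature values
            left, right = ys[:i], ys[i:]
            gain = parent_sse - (left.var() * len(left) + right.var() * len(right))
            if gain > best[0]:
                best = (gain, f, (xs[i - 1] + xs[i]) / 2.0)
    return best

def distributed_best_split(X, y, n_workers=4):
    """Each simulated worker scans only its own feature slice; the reduce step
    keeps the overall winner, matching a centralized scan over all features."""
    partitions = np.array_split(np.arange(X.shape[1]), n_workers)
    candidates = [best_split_for_features(X, y, part) for part in partitions]
    return max(candidates, key=lambda c: c[0])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 16))          # 200 documents, 16 features (toy data)
    y = (X[:, 3] > 0.5).astype(float)  # relevance label driven by feature 3
    gain, feature, threshold = distributed_best_split(X, y)
    print(f"best split: feature {feature} at {threshold:.3f} (gain {gain:.2f})")
```

Because the reduce step selects the same split that an exhaustive single-machine scan would, the trees, and hence the final model, are unchanged; only the per-vertex work is divided across workers.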

Type: Chapter
In: Scaling Up Machine Learning: Parallel and Distributed Approaches, pp. 148–169
Publisher: Cambridge University Press
Print publication year: 2011
