
2 - MapReduce and Its Application to Massively Parallel Learning of Decision Tree Ensembles

from Part One - Frameworks for Scaling Up Machine Learning

Published online by Cambridge University Press:  05 February 2012

Biswanath Panda, Google Inc., Mountain View, CA, USA
Joshua S. Herbach, Google Inc., Mountain View, CA, USA
Sugato Basu, Google Research, Mountain View, CA, USA
Roberto J. Bayardo, Google Research, Mountain View, CA, USA
Ron Bekkerman, LinkedIn Corporation, Mountain View, California
Mikhail Bilenko, Microsoft Research, Redmond, Washington
John Langford, Yahoo! Research, New York

Summary

In this chapter we look at leveraging the MapReduce distributed computing framework (Dean and Ghemawat, 2004) for parallelizing machine learning methods of wide interest, with a specific focus on learning ensembles of classification or regression trees. Building a production-ready implementation of a distributed learning algorithm can be a complex task. With the wide and growing availability of MapReduce-capable computing infrastructures, it is natural to ask whether such infrastructures may be of use in parallelizing common data mining tasks such as tree learning. For many data mining applications, MapReduce may offer scalability as well as ease of deployment in a production setting (for reasons explained later).

We initially give an overview of MapReduce and outline its application in a classic clustering algorithm, k-means. Subsequently, we focus on PLANET: a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations and implements each one using the MapReduce model. We show how this framework supports scalable construction of classification and regression trees, as well as ensembles of such models. We discuss the benefits and challenges of using a MapReduce compute cluster for tree learning and demonstrate the scalability of this approach by applying it to a real-world learning task from the domain of computational advertising.
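To make the MapReduce abstraction concrete before the full discussion, the sketch below simulates a single k-means iteration in map/reduce style on one machine: the map step assigns each point to its nearest current center and emits partial sums keyed by that center, and the reduce step averages each group into an updated center. The function names, the in-memory shuffle, and the toy data are illustrative assumptions only, not code from the chapter.

```python
# Minimal single-machine simulation of one k-means iteration in MapReduce style.
# Hypothetical sketch: names and the in-memory shuffle are illustrative only.
from collections import defaultdict

def kmeans_map(point, centers):
    """Map phase: emit (nearest_center_index, (point, 1)) for one input point."""
    best = min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))
    yield best, (point, 1)

def kmeans_reduce(center_index, values):
    """Reduce phase: average all points assigned to one center."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        count += n
        for d in range(dim):
            total[d] += point[d]
    return center_index, [t / count for t in total]

def kmeans_iteration(points, centers):
    """Driver: group map outputs by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for point in points:
        for key, value in kmeans_map(point, centers):
            groups[key].append(value)
    new_centers = list(centers)
    for key, values in groups.items():
        _, new_centers[key] = kmeans_reduce(key, values)
    return new_centers

if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (4.9, 5.2)]
    print(kmeans_iteration(data, [(0.0, 0.0), (5.0, 5.0)]))
```

Running the map step in parallel over data shards and the reduce step independently per key is exactly the division of labor the framework automates.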

MapReduce is a simple model for distributed computing that abstracts away many of the difficulties in parallelizing data management operations across a cluster of commodity machines.
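The same pattern extends to tree learning in the spirit of PLANET: mappers stream over training records and emit, for each candidate split of the node being expanded, sufficient statistics for the records that would fall to the left branch, while reducers aggregate those statistics and score each split (for regression, by variance reduction) against the node totals. The single-machine sketch below is a rough, hypothetical illustration of that idea only; the names, toy data, and in-memory shuffle are invented, and the chapter presents the actual framework.

```python
# Rough single-machine illustration of distributing regression-tree split search,
# in the spirit of the approach described in this chapter. All names and the
# in-memory "shuffle" are hypothetical; the real framework runs as MapReduce jobs.
from collections import defaultdict

def split_map(record, thresholds):
    """Map phase: for each candidate (feature, threshold), emit left-branch
    sufficient statistics (count, sum_y, sum_y^2) for one training record."""
    x, y = record
    for feature, cuts in thresholds.items():
        for t in cuts:
            if x[feature] < t:
                yield (feature, t), (1, y, y * y)

def split_reduce(key, stats, node_total):
    """Reduce phase: aggregate left-branch statistics and score the split by
    variance reduction (parent impurity minus summed child impurities)."""
    n, s, s2 = node_total
    nl = sl = sl2 = 0.0
    for c, sy, sy2 in stats:
        nl += c
        sl += sy
        sl2 += sy2
    nr, sr, sr2 = n - nl, s - sl, s2 - sl2
    if nl == 0 or nr == 0:
        return key, float("-inf")
    def sse(cnt, sm, sq):  # sum of squared errors around the group mean
        return sq - sm * sm / cnt
    return key, sse(n, s, s2) - (sse(nl, sl, sl2) + sse(nr, sr, sr2))

def best_split(records, thresholds):
    """Driver: group map outputs by key, reduce each group, keep the best score."""
    n = len(records)
    s = sum(y for _, y in records)
    s2 = sum(y * y for _, y in records)
    groups = defaultdict(list)
    for rec in records:
        for key, value in split_map(rec, thresholds):
            groups[key].append(value)
    return max((split_reduce(k, v, (n, s, s2)) for k, v in groups.items()),
               key=lambda kv: kv[1])

if __name__ == "__main__":
    data = [({"age": a}, float(a > 30)) for a in (18, 22, 35, 41, 29, 52)]
    print(best_split(data, {"age": [25, 30, 40]}))
```

Because only left-branch statistics are emitted, the right branch can be recovered from the node totals, which keeps map output small.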

Type: Chapter
Information: Scaling Up Machine Learning: Parallel and Distributed Approaches, pp. 23–48
Publisher: Cambridge University Press
Print publication year: 2011


References

Alsabti, K., Ranka, S., and Singh, V. 1998. CLOUDS: A Decision Tree Classifier for Large Datasets. Technical Report, University of Florida.
Ben-Haim, Y., and Yom-Tov, E. 2008. A Streaming Parallel Decision Tree Algorithm. In: Large Scale Learning Challenge Workshop at the International Conference on Machine Learning (ICML).
Bradford, J. P., Fortes, J. A. B., and Bradford, J. 1999. Characterization and Parallelization of Decision Tree Induction. Technical Report, Purdue University.
Breiman, L. 1996. Bagging Predictors. Machine Learning Journal, 24(2), 123–140.
Breiman, L. 2001. Random Forests. Machine Learning Journal, 45(1), 5–32.
Breiman, L., Friedman, J. H., Olshen, R., and Stone, C. 1984. Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks.
Caragea, D., Silvescu, A., and Honavar, V. 2004. A Framework for Learning from Distributed Data Using Sufficient Statistics and Its Application to Learning Decision Trees. International Journal of Hybrid Intelligent Systems, 1(1–2), 80–89.
Caruana, R., and Niculescu-Mizil, A. 2006. An Empirical Comparison of Supervised Learning Algorithms. Pages 161–168 of: International Conference on Machine Learning (ICML).
Caruana, R., Karampatziakis, N., and Yessenalina, A. 2008. An Empirical Evaluation of Supervised Learning in High Dimensions. Pages 96–103 of: International Conference on Machine Learning (ICML).
Chan, P. K., and Stolfo, S. J. 1993. Toward Parallel and Distributed Learning by Meta-learning. Pages 227–240 of: Workshop on Knowledge Discovery in Databases at the Conference of the Association for the Advancement of Artificial Intelligence (AAAI).
Chu, C.-T., Kim, S. K., Lin, Y.-A., Yu, Y., Bradski, G., Ng, A. Y., and Olukotun, K. 2007. Map-Reduce for Machine Learning on Multicore. Pages 281–288 of: Advances in Neural Information Processing Systems (NIPS) 19.
Dean, J., and Ghemawat, S. 2004. MapReduce: Simplified Data Processing on Large Clusters. In: Symposium on Operating System Design and Implementation (OSDI).
Duda, R. O., Hart, P. E., and Stork, D. G. 2001. Pattern Classification, 2nd ed. New York: Wiley.
Freund, Y., and Schapire, R. E. 1996. Experiments with a New Boosting Algorithm. Pages 148–156 of: International Conference on Machine Learning (ICML).
Friedman, J. H. 2001. Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 29(5), 1189–1232.
Gao, J., Wu, Q., Burges, C., Svore, K., Su, Y., Khan, N., Shah, S., and Zhou, H. 2009 (August). Model Adaptation via Model Interpolation and Boosting for Web Search Ranking. Pages 505–513 of: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing.
Gehrke, J., Ramakrishnan, R., and Ganti, V. 1998. RainForest – A Framework for Fast Decision Tree Construction of Large Datasets. Pages 416–427 of: International Conference on Very Large Data Bases (VLDB).
Gehrke, J., Ganti, V., Ramakrishnan, R., and Loh, W.-Y. 1999. BOAT – Optimistic Decision Tree Construction. Pages 169–180 of: International Conference on ACM Special Interest Group on Management of Data (SIGMOD).
Giannella, C., Liu, K., Olsen, T., and Kargupta, H. 2004. Communication Efficient Construction of Decision Trees over Heterogeneously Distributed Data. Pages 67–74 of: International Conference on Data Mining (ICDM).
Jin, R., and Agrawal, G. 2003a. Efficient Decision Tree Construction on Streaming Data. Pages 571–576 of: SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Jin, R., and Agrawal, G. 2003b. Communication and Memory Efficient Parallel Decision Tree Construction. Pages 119–129 of: SIAM Conference on Data Mining (SDM).
Joshi, M. V., Karypis, G., and Kumar, V. 1998. ScalParC: A New Scalable and Efficient Parallel Classification Algorithm for Mining Large Datasets. Pages 573–579 of: International Parallel Processing Symposium (IPPS).
Kaushik, A. 2007a (August). Bounce Rate as Sexiest Web Metric Ever. MarketingProfs. http://www.marketingprofs.com/7/bounce-rate-sexiest-web-metric-ever-kaushik.asp?sp=1.
Kaushik, A. 2007b (May). Excellent Analytics Tip 11: Measure Effectiveness of Your Web Pages. Occam's Razor (blog). www.kaushik.net/avinash/2007/05/excellent-analytics-tip-11-measure-effectiveness-of-your-web-pages.html.
Lazarevic, A. 2001. The Distributed Boosting Algorithm. Pages 311–316 of: SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
MacQueen, J. B. 1967. Some Methods for Classification and Analysis of Multivariate Observations. Pages 281–297 of: Le Cam, L. M., and Neyman, J. (eds), Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. Berkeley: University of California Press.
Manku, G. S., Rajagopalan, S., and Lindsay, B. G. 1999. Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets. Pages 251–262 of: International Conference on ACM Special Interest Group on Management of Data (SIGMOD).
Mehta, M., Agrawal, R., and Rissanen, J. 1996. SLIQ: A Fast Scalable Classifier for Data Mining. Pages 18–32 of: International Conference on Extending Data Base Technology (EDBT).
Provost, F., and Fayyad, U. 1999. A Survey of Methods for Scaling Up Inductive Algorithms. Data Mining and Knowledge Discovery, 3, 131–169.
Ridgeway, G. 2006. Generalized Boosted Models: A Guide to the GBM Package. http://cran.r-project.org/web/packages/gbm.
Rokach, L., and Maimon, O. 2008. Data Mining with Decision Trees: Theory and Applications. World Scientific.
Sculley, D., Malkin, R., Basu, S., and Bayardo, R. J. 2009. Predicting Bounce Rates in Sponsored Search Advertisements. Pages 1325–1334 of: SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Shafer, J. C., Agrawal, R., and Mehta, M. 1996. SPRINT: A Scalable Parallel Classifier for Data Mining. Pages 544–555 of: International Conference on Very Large Data Bases (VLDB).
Vapnik, V. N. 1995. The Nature of Statistical Learning Theory. Berlin: Springer.
Zaki, M. J., Ho, C.-T., and Agrawal, R. 1999. Parallel Classification for Data Mining on Shared-Memory Multiprocessors. Pages 198–205 of: International Conference on Data Engineering (ICDE).
