Book contents
- Frontmatter
- Contents
- Contributors
- Preface
- 1 Scaling Up Machine Learning: Introduction
- Part One Frameworks for Scaling Up Machine Learning
- 2 MapReduce and Its Application to Massively Parallel Learning of Decision Tree Ensembles
- 3 Large-Scale Machine Learning Using DryadLINQ
- 4 IBM Parallel Machine Learning Toolbox
- 5 Uniformly Fine-Grained Data-Parallel Computing for Machine Learning Algorithms
- Part Two Supervised and Unsupervised Learning Algorithms
- Part Three Alternative Learning Settings
- Part Four Applications
- Subject Index
- References
5 - Uniformly Fine-Grained Data-Parallel Computing for Machine Learning Algorithms
from Part One - Frameworks for Scaling Up Machine Learning
Published online by Cambridge University Press: 05 February 2012
Summary
The graphics processing unit (GPU) of modern computers has evolved into a powerful, general-purpose, massively parallel numerical (co-)processor. The numerical computation in a number of machine learning algorithms fits well on the GPU. To help identify such algorithms, we present uniformly fine-grained data-parallel computing and illustrate it with two machine learning algorithms, clustering and regression clustering, on a mixed GPU and central processing unit (CPU) computing architecture. We discuss the key issues involved in a successful design of the algorithms, the data structures, and the partitioning of computation between a CPU and a GPU. Performance on the mixed CPU and GPU architecture is compared with that of the regression clustering algorithm implemented entirely on a CPU, and significant speedups are reported. The mixed GPU and CPU architecture also achieves better cost-performance and energy-performance ratios.
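As a rough illustration of the uniformly fine-grained data-parallel pattern the summary describes (a sketch, not the chapter's actual implementation), the assignment step of k-means clustering maps one logical thread to each data point: every point independently computes its distance to all cluster centers and picks the nearest. The Python/NumPy code below stands in for those per-point GPU threads via vectorized broadcasting; all function names are illustrative.

```python
import numpy as np

def assign_clusters(points, centers):
    """Fine-grained data-parallel k-means assignment step.

    Conceptually, one GPU thread per data point computes that
    point's squared distances to all centers and selects the
    nearest one; NumPy broadcasting stands in for the threads.
    """
    # (n, 1, d) - (1, k, d) -> (n, k, d): pairwise differences
    diffs = points[:, None, :] - centers[None, :, :]
    # Squared Euclidean distance from each point to each center
    sq_dists = np.einsum('nkd,nkd->nk', diffs, diffs)
    return sq_dists.argmin(axis=1)  # index of nearest center per point

def update_centers(points, labels, k):
    """Reduction step: each center becomes the mean of its assigned
    points (on a GPU this would be a parallel reduction)."""
    return np.stack([points[labels == j].mean(axis=0) for j in range(k)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated synthetic clusters around (0,0) and (5,5)
    pts = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                     rng.normal(5.0, 0.1, (50, 2))])
    centers = np.array([[0.0, 0.0], [5.0, 5.0]])
    labels = assign_clusters(pts, centers)
    centers = update_centers(pts, labels, 2)
```

Because each point's assignment is independent of every other point's, the work partitions uniformly across threads with no inter-thread communication, which is what makes this step a good fit for the GPU; the reduction in the update step is the part that needs more care in a CPU-GPU partitioning.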
The computing power of the CPU has increased dramatically in the past few decades, driven by both miniaturization and rising clock frequencies. As miniaturization continued, more and more electronic gates were packed onto the same area of a silicon die. Hardware-supported parallelism, pipelining for example, further increased the computing power of CPUs, and frequency increases sped up CPUs even more directly. However, the long-predicted physical limit of this process was finally reached a few years ago: although miniaturization still continues, increasing the clock frequency is no longer feasible because of the accompanying nonlinear growth in power consumption.
- Type: Chapter
- Information: Scaling Up Machine Learning: Parallel and Distributed Approaches, pp. 89-106. Publisher: Cambridge University Press. Print publication year: 2011