Numerical Study of Geometric Multigrid Methods on CPU-GPU Heterogeneous Computers

Published online by Cambridge University Press:  03 June 2015

Chunsheng Feng*
Affiliation:
Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, China
Shi Shu*
Affiliation:
Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
Jinchao Xu*
Affiliation:
Department of Mathematics, Pennsylvania State University, PA, USA
Chen-Song Zhang*
Affiliation:
NCMIS and LSEC, Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing 100190, China
*Corresponding author. Email: shushi@xtu.edu.cn

Abstract

The geometric multigrid method (GMG) is one of the most efficient solving techniques for discrete algebraic systems arising from elliptic partial differential equations. GMG utilizes a hierarchy of grids or discretizations and reduces the error at a number of frequencies simultaneously. Graphics processing units (GPUs) have recently burst onto the scientific computing scene as a technology that has yielded substantial performance and energy-efficiency improvements. A central challenge in implementing GMG on GPUs, though, is that computational work on coarse levels cannot fully utilize the capacity of a GPU. In this work, we perform numerical studies of GMG on CPU-GPU heterogeneous computers. Furthermore, we compare our implementation with an efficient CPU implementation of GMG and with the most popular fast Poisson solver, Fast Fourier Transform, in the cuFFT library developed by NVIDIA.
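
As an illustration of the grid hierarchy described above, the following is a minimal sketch of a V-cycle for the 1D Poisson problem -u'' = f with zero Dirichlet boundary values. It is not the authors' implementation: it runs on the CPU in plain Python, and the smoother (weighted Jacobi), the transfer operators (full weighting and linear interpolation), and all function names (`residual`, `jacobi`, `restrict`, `prolong`, `v_cycle`, `solve`) are assumptions chosen for brevity. It records the number of unknowns on each level to show how quickly the work per level shrinks.

```python
import math

def residual(u, f, h):
    """r = f - A u for the stencil (-1, 2, -1)/h^2 with zero Dirichlet ends."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi: damps the high-frequency error on the current level."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u

def restrict(r):
    """Full weighting from n fine points to (n - 1) // 2 coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]

def prolong(e, n_fine):
    """Linear interpolation of a coarse correction to n_fine points."""
    out = [0.0] * n_fine
    for j, ej in enumerate(e):
        out[2 * j] += 0.5 * ej
        out[2 * j + 1] += ej
        out[2 * j + 2] += 0.5 * ej
    return out

def v_cycle(u, f, h, sizes):
    """One V(2,2) cycle; appends the size of every visited level to sizes."""
    sizes.append(len(u))
    if len(u) == 1:
        return [f[0] * h * h / 2.0]      # exact solve on the coarsest grid
    u = jacobi(u, f, h, 2)               # pre-smoothing
    rc = restrict(residual(u, f, h))     # move the residual to the coarse grid
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, sizes)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, len(u)))]
    return jacobi(u, f, h, 2)            # post-smoothing

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def solve(n=127, cycles=10):
    """Solve -u'' = pi^2 sin(pi x); n must have the form 2^k - 1."""
    h = 1.0 / (n + 1)
    f = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
    u = [0.0] * n
    r0 = norm(residual(u, f, h))
    sizes = []
    for _ in range(cycles):
        sizes = []                       # keep the level sizes of the last cycle
        u = v_cycle(u, f, h, sizes)
    return u, sizes, norm(residual(u, f, h)) / r0

if __name__ == "__main__":
    u, sizes, reduction = solve()
    print("unknowns per level:", sizes)
    print("relative residual after 10 V-cycles:", reduction)
```

Each level has roughly half the unknowns of the one above it, so the coarsest levels update only a handful of values per sweep. On a GPU such levels expose too little parallel work to keep the device busy, which is the utilization problem the abstract identifies.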

Type
Research Article
Copyright
Copyright © Global-Science Press 2014
