
Numerical Study of Geometric Multigrid Methods on CPU-GPU Heterogeneous Computers

Published online by Cambridge University Press:  03 June 2015

Chunsheng Feng*
Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, China
Shi Shu*
Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
Jinchao Xu*
Department of Mathematics, Pennsylvania State University, PA, USA
Chen-Song Zhang*
NCMIS and LSEC, Academy of Mathematics and System Sciences, Chinese Academy of Sciences, Beijing 100190, China
*Corresponding author.


The geometric multigrid method (GMG) is one of the most efficient solvers for the discrete algebraic systems arising from elliptic partial differential equations. GMG uses a hierarchy of grids or discretizations and reduces error components at many frequencies simultaneously. Graphics processing units (GPUs) have recently emerged in scientific computing as a technology that has yielded substantial performance and energy-efficiency improvements. A central challenge in implementing GMG on GPUs, however, is that the computational work on coarse levels cannot fully utilize the capacity of a GPU. In this work, we perform numerical studies of GMG on CPU-GPU heterogeneous computers. Furthermore, we compare our implementation with an efficient CPU implementation of GMG and with the most popular fast Poisson solver, the fast Fourier transform (FFT), as provided by the cuFFT library developed by NVIDIA.
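To make the grid hierarchy concrete, the following is a minimal sketch of a geometric multigrid V-cycle for the 1D Poisson problem -u'' = f with zero Dirichlet boundary conditions. It is not the implementation studied in the article: it runs on the CPU only, and the choice of a 1D model problem, weighted Jacobi smoothing, full-weighting restriction, linear interpolation, and all function names (`v_cycle`, `level_sizes`, `solve_poisson_1d`, and so on) are assumptions made for illustration.

```python
import math


def residual(u, f, h):
    """r = f - A u, where (A u)[i] = (-u[i-1] + 2 u[i] - u[i+1]) / h^2."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0       # zero Dirichlet boundary
        right = u[i + 1] if i < n - 1 else 0.0  # zero Dirichlet boundary
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; these mainly damp high-frequency error."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u


def restrict(r):
    """Full weighting from n fine points to (n - 1) // 2 coarse points."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(nc)]


def prolong(e, n_fine):
    """Linear interpolation of a coarse correction to the fine grid."""
    fine = [0.0] * n_fine
    for j, ej in enumerate(e):
        fine[2 * j] += 0.5 * ej
        fine[2 * j + 1] += ej
        fine[2 * j + 2] += 0.5 * ej
    return fine


def v_cycle(u, f, h):
    """One V-cycle: smooth, correct from the coarser grid, smooth again."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]
    return smooth(u, f, h)


def level_sizes(n):
    """Number of unknowns on each level, finest first."""
    sizes = [n]
    while n > 1:
        n = (n - 1) // 2
        sizes.append(n)
    return sizes


def solve_poisson_1d(n=63, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1); the exact solution is sin(pi x)."""
    h = 1.0 / (n + 1)
    f = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
    u = [0.0] * n
    for _ in range(cycles):
        u = v_cycle(u, f, h)
    return u, f, h
```

Calling `level_sizes(63)` shows how quickly the work per level shrinks: each coarser level has roughly half the unknowns in 1D (and a quarter or an eighth in 2D or 3D), so the coarsest levels expose very little parallelism, which is the utilization problem the abstract describes for GPUs.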

Research Article
Copyright © Global-Science Press 2014


