
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM

A. Dziekonski, M. Rewienski, P. Sypek, A. Lamecki and M. Mrozowski...


This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with inexact nullspace filtering for computing eigenvalues in electromagnetics analysis with the higher-order finite element method (FEM). The performance of the proposed approach is verified on an NVIDIA Kepler (Tesla K40c) graphics accelerator and compared with a reference implementation based on Intel MKL functions, run in parallel mode on an Intel Xeon E5-2680 v3 central processing unit (CPU) with 12 threads. Relative to this CPU reference, the proposed GPU-based LOBPCG method with inexact nullspace filtering achieves up to a 2.9-fold speedup.


Corresponding author

*Corresponding author. Email addresses: (A. Dziekonski), (M. Rewienski), (P. Sypek), (A. Lamecki), (M. Mrozowski)



