PVFMM: A Parallel Kernel Independent FMM for Particle and Volume Potentials

  • Dhairya Malhotra and George Biros

Abstract

We describe our implementation of a parallel fast multipole method (FMM) for evaluating potentials for discrete and continuous source distributions. The first problem requires summation over the source points; the second requires integration over a continuous source density. Both problems have O(N²) complexity when computed directly, but can be accelerated to O(N) time using the FMM. Our PVFMM software library uses the kernel-independent FMM, which allows us to compute potentials for a wide range of elliptic kernels. Our method is high-order, adaptive, and scalable. In this paper, we discuss several algorithmic improvements and performance optimizations, including cache locality, vectorization, shared-memory parallelism, and the use of coprocessors. Our distributed-memory implementation uses a space-filling curve for partitioning data and a hypercube communication scheme. We present convergence results for the Laplace, Stokes, and Helmholtz (low-wavenumber) kernels for both particle and volume FMM. We measure the efficiency of our method in terms of CPU cycles per unknown for different accuracies and different kernels. We also demonstrate the scalability of our implementation up to several thousand processor cores on the Stampede platform at the Texas Advanced Computing Center.
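
To make the complexity claim concrete: the direct evaluation that the FMM accelerates is the pairwise sum u(xᵢ) = Σⱼ K(xᵢ, yⱼ) qⱼ over N sources. Below is a minimal C++ sketch of this brute-force O(N²) sum for the 3D Laplace kernel K(x, y) = 1/(4π|x − y|); it is illustrative only and not part of the PVFMM API (the function name and flat coordinate layout are assumptions for this example).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Brute-force O(N^2) evaluation of the 3D Laplace potential
//   u(x_i) = (1 / 4*pi) * sum_j q_j / |x_i - y_j|
// at M target points. This is the direct sum that the FMM reduces
// to O(N). Illustrative sketch only; not part of the PVFMM API.
std::vector<double> direct_laplace(const std::vector<double>& trg,  // 3*M coordinates
                                   const std::vector<double>& src,  // 3*N coordinates
                                   const std::vector<double>& q) {  // N source densities
  constexpr double kFourPiInv = 1.0 / (4.0 * 3.14159265358979323846);
  const std::size_t M = trg.size() / 3, N = src.size() / 3;
  std::vector<double> u(M, 0.0);
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      const double dx = trg[3*i+0] - src[3*j+0];
      const double dy = trg[3*i+1] - src[3*j+1];
      const double dz = trg[3*i+2] - src[3*j+2];
      const double r2 = dx*dx + dy*dy + dz*dz;
      if (r2 > 0.0) u[i] += kFourPiInv * q[j] / std::sqrt(r2);  // skip self-interaction
    }
  }
  return u;
}
```

The FMM replaces the inner loop over all sources with multipole and local expansions for well-separated source clusters, which is what reduces the cost from O(N²) to O(N).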
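
The space-filling-curve partitioning mentioned above is commonly realized with Morton (Z-order) keys: points are sorted by key and the sorted sequence is split evenly across MPI ranks. The following sketch shows 3D Morton encoding with the standard bit-interleaving constants; it is a generic illustration of the technique, not PVFMM's actual octree code.

```cpp
#include <cstdint>

// Spread the low 21 bits of x so they occupy every third bit
// (standard "magic bits" for 64-bit 3D Morton encoding).
static std::uint64_t spread_bits(std::uint64_t x) {
  x &= 0x1FFFFF;
  x = (x | x << 32) & 0x001F00000000FFFF;
  x = (x | x << 16) & 0x001F0000FF0000FF;
  x = (x | x << 8)  & 0x100F00F00F00F00F;
  x = (x | x << 4)  & 0x10C30C30C30C30C3;
  x = (x | x << 2)  & 0x1249249249249249;
  return x;
}

// 63-bit Morton (Z-order) key from integer grid coordinates.
// Sorting points by this key orders them along a space-filling
// curve, so an even split of the sorted array across MPI ranks
// gives each rank a spatially compact, load-balanced partition.
std::uint64_t morton_key(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
  return spread_bits(x) | (spread_bits(y) << 1) | (spread_bits(z) << 2);
}
```

Because the curve preserves spatial locality, nearby points usually land on the same rank, which keeps the communication volume of the subsequent tree exchanges small.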

Corresponding author

*Corresponding author. Email addresses: dhairya.malhotra@gmail.com (D. Malhotra), gbiros@acm.org (G. Biros)
