Optimization of the Multishift QR Algorithm with Coprocessors for Non-Hermitian Eigenvalue Problems

  • Takafumi Miyata (a1), Yusaku Yamamoto (a2), Takashi Uneyama (a3), Yoshimasa Nakamura (a4) and Shao-Liang Zhang (a1)...

Abstract

The multishift QR algorithm is efficient for computing all the eigenvalues of a dense, large-scale, non-Hermitian matrix. Most of its computation can be performed as matrix-matrix multiplications, which makes it well suited to modern processors with hierarchical memory. A recently proposed variant performs an even larger share of the computation as matrix-matrix multiplications. This variant is especially appropriate for recent coprocessors that contain many processing elements, such as the CSX600. However, its performance depends strongly on parameter settings such as the number of shifts and the number of divisions, and the optimal settings vary with the matrix size and the computational environment. In this paper, we construct a performance model that predicts the parameter setting minimizing the execution time of the algorithm. Experimental results with the CSX600 coprocessor show that our model can be used to find the optimal setting.

Corresponding author

Corresponding author. Email: miyata@na.cse.nagoya-u.ac.jp
Corresponding author. Email: yamamoto@cs.kobe-u.ac.jp
Corresponding author. Email: uneyama@scl.kyoto-u.ac.jp
Corresponding author. Email: ynaka@amp.i.kyoto-u.ac.jp
Corresponding author. Email: zhang@na.cse.nagoya-u.ac.jp

