Skip to main content Accessibility help
×
Home

Parallel numerical linear algebra

  • James W. Demmel (a1), Michael T. Heath (a2) and Henk A. van der Vorst (a3)

Abstract

We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of paralled processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illustrate these principles using current architectures and software systems, and by showing how one would implement matrix multiplication. Then, we present direct and iterative algorithms for solving linear systems of equations, linear least squares problems, the symmetric eigenvalue problem, the nonsymmetric eigenvalue problem, and the singular value decomposition. We consider dense, band and sparse matrices.

Copyright

References

Hide All
Aho, A., Hopcroft, J. and Ullman, J. (1974), The Design and Analysis of Computer Algorithms, Addison-Wesley (New York).
Anderson, E. and Dongarra, J. (1990), ‘Evaluating block algorithm variants in LA-PACK’, Computer Science Dept. Technical Report CS–90–103, University of Tennessee, Knoxville, TN (LAPACK Working Note 19).
Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Croz, J. Du, Green-baum, A., Hammarling, S., McKenney, A., Ostrouchov, S. and Sorensen, D. (1992), LAPACK Users’ Guide, Release 1.0, SIAM (Philadelphia).
ANSI/IEEE, New York, (1985), IEEE Standard for Binary Floating Point Arithmetic, Std 754–1985 edition (New York).
Arnoldi, W. E. (1951), ‘The principle of minimized iteration in the solution of the matrix eigenproblem’, Quart. Appl. Math. 9, 1729.
Ashcraft, C., Eisenstat, S.C. and Liu, J.W.-H. (1990), ‘A fan-in algorithm for distributed sparse numerical factorization’, SIAM J. Sci. Statist. Comput. 11, 593599.
Ashcraft, C., Eisenstat, S.C., Liu, J.W.-H. and Sherman, A.H. (1990), ‘A comparison of three column-based distributed sparse factorization schemes’, Technical Report YALEU/DCS/RR-810, Dept of Computer Science, Yale University, New Haven, CT.
Ashcraft, C. and Grimes, R. (1988), ‘On vectorizing incomplete factorizations and SSOR preconditioners’, SIAM J. Sci. Statist. Comput. 9, 122151.
Ashcraft, C., Grimes, R., Lewis, J., Peyton, B. and Simon, H. (1987), ‘Progress in sparse matrix methods for large linear systems on vector supercomputers’, Int. J. Supercomput. Appl. 1(4), 1030.
Auslander, L. and Tsao, A. (1992), ‘On parallelizable eigensolvers’, Adv. Appl. Math. 13, 253261.
Babus˘ka, I. (1972), ‘Numerical stability in problems of linear algebra’, SIAM J. Numer. Anal. 9, 5377.
Bai, Z. and Demmel, J. (1989), ‘On a block implementation of Hessenberg multi-shift QR iteration’, Int. J. High Speed Comput. 1(1), 97112 (also LAPACK Working Note 8).
Bai, Z. and Demmel, J. (1992), ‘Design of a parallel nonsymmetric eigenroutine toolbox’, Computer Science Dept preprint, University of California, Berkeley, CA.
Bai, Z., Hu, D. and Reichel, L. (1991), ‘A Newton basis GMRES implementation’, Technical Report 91–03, University of Kentucky.
Bailey, D. H., Lee, K. and Simon, H. D. (1991), ‘Using Strassen's algorithm to accelerate the solution of linear systems’, J. Supercomput. 4, 97371.
Barlow, J. (1991), ‘Error analysis of update methods for the symmetric eigenvalue problem’, to appear in SIAM J. Math. Anal. Appl. (Tech Report CS-91–21, Computer Science Department, Penn State University.)
Beattie, C. and Fox, D. (1989), ‘Localization criteria and containment for Rayleigh quotient iteration’, SIAM J. Math. Anal. Appl. 10 (1), 8093.
Bell, K., Hatlestad, B., Hansteen, O. and Araldsen, P. (1973), NORSAM—A programming system for the finite element method, User's Manual, Part I, General description. Institute for Structural Analysis, NTH, N-7034 Trondheim, Norway.
Benner, R., Montry, G. and Weigand, G. (1987), ‘Concurrent multifrontal methods: shared memory, cache, and frontwidth issues’, Int. J. Supercomput. Appl. 1(3), 2644.
Berryman, H., Saltz, J., Gropp, W. and Mirchandaney, R. (1989), ‘Krylov methods preconditioned with incompletely factored matrices on the CM-2’, Technical Report 89–54, NASA Langley Research Center, ICASE, Hampton, VA.
Bertsekas, D. and Tsitsiklis, J. (1989), Parallel and Distributed Comptutation: Numerical Methods, Prentice Hall (New York).
Bischof, C. (1989), ‘Computing the singular value decomposition on a distributed system of vector processors’, Parallel Comput. 11 171186.
Bischof, C. and Sun, X. (1992), ‘A divide and conquer method for tridiagonalizing symmetric matrices with repeated eigenvalues’, MCS Report P286–0192, Argonne National Lab.
Bischof, C. and Tang, P. (1991a), ‘Generalized incremental condition estimation’, Computer Science Dept, Technical Report CS-91–132, University of Tennessee, Knoxville, TN (LAPACK Working Note 32).
Bischof, C. and Tang, P. (1991b), ‘Robust incremental condition estimation’, Computer Science Dept. Technical Report CS-91–133, University of Tennessee, Knoxville, TN (LAPACK Working Note 33).
Brent, R. P. (1973), Algorithms for Minimization Without Derivatives, Prentice-Hall (New York).
Brent, R. and Luk, F. (1985), ‘The solution of singular value and symmetric eigenvalue problems on multiprocessor arrays’, SIAM J. Sci. Statist. Comput. 6, 6984.
Brunet, J., Edelman, A. and Mesirov, J. (1990), ‘An optimal hypercube direct n-body solver on the connection machine’, in: Proceedings of Supercomputing ‘90, IEEE Computer Society Press (New York), 748752.
Calvetti, D., Petersen, J. and Reichel, L. (1991), ‘A parallel implementation of the GMRES method’, Technical Report ICM-9110–6, Institute for Computational Mathematics, Kent, OH.
Cannon, L. (1969), ‘A cellular computer to implement the Kalman filter algorithm’, PhD thesis, Montana State University, Bozeman, MN.
Chen, S. C., Kuck, D. J. and Sameh, A. H. (1978), ‘Practical parallel band triangular system solvers’, ACM Trans. Math. Software 4, 270277.
Chronopoulos, A. T. (1991), ‘Towards efficient parallel implementation of the CG method applied to a class of block tridiagonal linear systems’, in: Supercomputing '91, IEEE Computer Society Press (Los Alamitos, CA), 578587.
Chronopoulos, A. T. and Gear, C. W. (1989), ‘s-step iterative methods for symmetric linear systems’, J. Comput. Appl. Math. 25, 153168.
Chronopoulos, A. T. and Kim, S. K. (1990), ‘s-Step Orthomin and GMRES implemented on parallel computers’, Technical Report 90/43R, UMSI, Minneapolis.
Chu, E. (1988a), ‘Orthogonal decomposition of dense and sparse matrices on multiprocessors’, PhD thesis, University of Waterloo.
Chu, M. (1988b), ‘A note on the homotopy method for linear algebraic eigenvalue problems’, Lin. Alg. Appl. 105, 225236.
Chu, M., Li, T.-Y. and Sauer, T. (1988), ‘Homotopy method for general λ-matrix problems’, SIAM J. Math. Anal. Appl. 9 (4), 528536.
Csanky, L. (1977), ‘Fast parallel matrix inversion algorithms’, SIAM J. Comput. 5 618623.
Cuppen, J.J.M. (1981), ‘A divide and conquer method for the symmetric tridiagonal eigenproblem’, Numer. Math. 36, 177195.
Davis, G., Funderlic, R. and Geist, G. (1987), ‘A hypercube implementation of the implicit double shift QR algorithm’, in: Hypercube Multiprocessors 1987, SIAM (Philadelphia), 619626.
D'Azevedo, E. F. and Romine, C. H. (1992), ‘Reducing communication costs in the conjugate gradient algorithm on distributed memory multiprocessors’, Technical Report ORNL/TM-12192, Oak Ridge National Laboratory.
de Groen, P. P. N. (1991), ‘Base p-cyclic reduction for tridiagonal systems of equations’, Appl. Numer. Math. 8 117126.
de Sturler, E. (1991), ‘A parallel restructured version of GMRES(m)’, Technical Report 91–85, Delft University of Technology, Delft.
Deichmoller, A. (1991), ‘Über die Berechnung verallgemeinerter singulärer Werte mittles Jacobi-ähnlicher Verfahren’, PhD thesis, Fernuniversität—Hagen, Hagen, Germany.
Dekel, E., Nassimi, D. and Sahni, S. (1981), ‘Parallel matrix and graph algorithms’, SIAM J. Comput. 10 (4), 657675.
Dekker, T. (1971), ‘A floating point technique for extending the available precision’, Numer. Math. 18 224242.
Demmel, J. (1987), ‘Three methods for refining estimates of invariant subspaces’, Computing 38, 4357.
Demmel, J. (1992a), ‘Specifications for robust parallel prefix operations’, Technical report, Thinking Machines Corp., Cambridge, MA.
Demmel, J. (1992b), ‘Trading off parallelism and numerical stability’, Computer Science Division Technical Report UCB//CSD-92–702, University of California, Berkeley, CA.
Demmel, J. and Higham, N. J. (1992), ‘Stability of block algorithms with fast Level 3 BLAS’, ACM Trans. Math. Soft. 18 (3), 274291.
Demmel, J. and Kahan, W. (1990), ‘Accurate singular values of bidiagonal matrices’, SIAM J. Sci. Statist. Comput. 11 (5), 873912.
Demmel, J. and Veselić, K. (1992), ‘Jacobi's method is more accurate than QR’, SIAM J. Mat. Anal Appl. 13 (4), 12041246.
Doi, S. (1991), ‘On parallelism and convergence of incomplete LU factorizations’, Appl. Numer. Math. 7, 417436.
Dongarra, J. and Grosse, E. (1987), ‘Distribution of mathematical software via electronic mail’, Commun. ACM 30 (5), 403407.
Dongarra, J. and Ostrouchov, S. (1990), ‘LAPACK block factorization algorithms on the Intel iPSC/860’, Computer Science Dept Technical Report CS-90–115, University of Tennessee, Knoxville, TN (LAPACK Working Note 24).
Dongarra, J. and Sidani, M. (1991), ‘A parallel algorithm for the non-symmetric eigenvalue problem’, Computer Science Dept Technical Report CS-91–137, University of Tennessee, Knoxville, TN.
Dongarra, J. and Sorensen, D. (1987), ‘A fully parallel algorithm for the symmetric eigenproblem’, SIAM J. Sci. Statist. Comput. 8 (2), 139154.
Dongarra, J. and van de Geijn, R. (1991), ‘Reduction to condensed form for the eigenvalue problem on distributed memory computers’, Computer Science Dept Technical Report CS-91–130, University of Tennessee, Knoxville, TN (LAPACK Working Note 30), to appear in Parallel Comput.
Dongarra, J. and van de Geijn, R. (1991), ‘Two dimensional basic linear algebra communication subprograms’, Computer Science Dept Technical Report CS-91–138, University of Tennessee, Knoxville, TN (LAPACK Working Note 37).
Dongarra, J., Bunch, J., Moler, C. and Stewart, G. W. (1979), LINPACK User's Guide, SIAM (Philadelphia).
Dongarra, J., Du Croz, J., Duff, I. and Hammarling, S. (1990), ‘A set of Level 3 Basic Linear Algebra Subprograms’, ACM Trans. Math. Soft. 16 (1), 117.
Dongarra, J., Du Croz, J., Hammarling, S. and Hanson, Richard J. (1988), ‘An extended set of fortran basic linear algebra subroutines’, ACM Trans. Math. Soft. 14 (1), 117.
Dongarra, J., Duff, I., Sorensen, D. and van der Vorst, H. (1991), Solving Linear Systems on Vector and Shared Memory Computers, SIAM (Philadelphia).
Dongarra, J., Geist, G. A., and Romine, C. (1990b), ‘Computing the eigenvalues and eigenvectors of a general matrix by reduction to tridiagonal form’, Technical Report ORNL/TM-11669, Oak Ridge National Laboratory (to appear in ACM TOMS).
Dongarra, J., Hammarling, S. and Sorensen, D. (1989), ‘Block reduction of matrices to condensed forms for eigenvalue computations’, J. Comput. Appl. Math. 27, 215227 (LAPACK Working Note 2).
Du Croz, J., Mayes, P. J. D. and Brozolo, G. Radicati di (1990), ‘Factorizations of band matrices using Level 3 BLAS’, Computer Science Dept Technical Report CS-90–109, University of Tennessee, Knoxville, TN (LAPACK Working Note 21).
Dubois, P. and Rodrigue, G. (1977), ‘An analysis of the recursive doubling algorithm’, High Speed Computer and Algorithm Organization, (Kuck, D. J. and Sameh, A. H., eds), Academic Press (New York).
Dubrulle, A. (1991), ‘The multishift QR algorithm: is it worth the trouble?’, Palo Alto Scientific Center Report G320–3558x, IBM Corp., 1530 Page Mill Road, Palo Alto, CA 94304.
Duff, I.S. (1986), ‘Parallel implementation of multifrontal schemes’, Parallel Comput. 2, 193204.
Duff, I.S. and Meurant, G.A. (1989), ‘The effect of ordering on preconditioned conjugate gradient’, BIT 29, 635657.
Duff, I.S., Erisman, A.M. and Reid, J.K. (1986), Direct Methods for Sparse Matrices, Oxford University Press (Oxford, UK).
Eberlein, P. (1962), ‘A Jacobi method for the automatic computation of eigenvalues and eigenvectors of an arbitrary matrix’, J. SIAM 10, 7488.
Eberlein, P. (1987), ‘On the Schur decomposition of a matrix for parallel computation’, IEEE Trans. Comput. 36, 167174.
Eijkhout, V. (1992), ‘Qualitative properties of the conjugate gradient and Lanczos methods in a matrix framework’, Computer Science Dept. Technical Report CS-92–170, University of Tennessee, Knoxville, TN (LAPACK Working Note 51).
Eisenstat, S., Heath, M., Henkel, C. and Romine, C. (1988), ‘Modified cyclic algorithms for solving triangular systems on distributed memory multiprocessors’, SIAM J. Sci. Statist. Comput. 9, 589600.
Fox, G., Hiranandani, S., Kennedy, K., Koelbel, C., Kremer, U., Tseng, C.-W. and Wu, M.-Y. (1990), ‘Fortran D language specification’, Computer Science Department Report CRPC-TR90079, Rice University, Houston, TX.
Fox, G., Johnson, M., Lyzenga, G., Otto, S., Salmon, J. and Walker, D. (1988), Solving Problems on Concurrent Processors, vol. I, Prentice Hall (New York).
Freund, R. W., Golub, G. H. and Nachtigal, N. M. (1992), ‘Iterative solution of linear systems’, Acda Numerica 1, 57100.
Gallivan, K., Heath, M., Ng, E., Ortega, J., Peyton, B., Plemmons, R., Romine, C., Sameh, A. and Voigt, R. (1990), Parallel Algorithms for Matrix Computations, SIAM (Philadelphia).
Gallivan, K., Jalby, W., Meier, U. and Sameh, A. (1988), ‘Impact of hierarchical memory systems on linear algebra algorithm design’, Int. J. Supercomput. Appl. 2, 1248.
Gallivan, K. A., Heath, M. T., Ng, E., et al. (1990), Parallel Algorithms for Matrix Computations, SIAM (Philadelphia).
Gallivan, K. A., Plemmons, R. J. and Sameh, A. H. (1990), ‘Parallel algorithms for dense linear algebra computations’, SIAM Rev. 32, 54135.
Garbow, B. S., Boyle, J. M., Dongarra, J. J. and Moler, C. B. (1977), Matrix eigensystem routines – EISPACK guide extension, Lecture Notes in Computer Science, vol. 51, Springer-Verlag (Berlin).
Geist, G. A. (1990), ‘Parallel tridiagonalization of a general matrix using distributed memory multiprocessors’, in Proc. Fourth SIAM Conf. on Parallel Processing for Scientific Computing, SIAM (Philadelphia), 2935.
Geist, G. A. (1991), ‘Reduction of a general matrix to tridiagonal form’, SIAM J. Math. Anal. Appl. 12(2), 362373.
Geist, G. A. and Davis, G. J. (1990), ‘Finding eigenvalues and eigenvectors of unsymmetric matrices using a distributed memory multiprocessor’, Parallel Comput. 13(2), 199209.
Geist, G. A., Lu, A. and Wachspress, E. (1989), ‘Stabilized reduction of an arbitrary matrix to tridiagonal form’, Technical Report ORNL/TM-11089, Oak Ridge National Laboratory.
Gentleman, M. and Kung, H. T. (1981), ‘Matrix triangularization by systolic arrays’, in Proc. SPIE 298, Real Time Signal Processing, San Diego, CA, 1926.
George, J.A. (1973), ‘Nested dissection of a regular finite element mesh’, SIAM J. Numer. Anal. 10, 345363.
George, J.A. and Liu, J.W.-H. (1981), Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall (Englewood Cliffs, NJ).
George, J.A. and Liu, J.W.-H. (1989), ‘The evolution of the minimum degree ordering algorithm’, SIAM Rev. 31, 119.
George, J.A., Heath, M.T., Liu, J.W.-H. and Ng, E. (1986), ‘Solution of sparse positive definite systems on a shared memory multiprocessor’, Int. J. Parallel Progr. 15, 309325.
George, J.A., Heath, M.T., Liu, J.W.-H. and Ng, E. (1988), ‘Sparse Cholesky factorization on a local-memory multiprocessor’, SIAM J. Sci. Statist. Comput. 9, 327340.
George, J.A., Heath, M.T., Liu, J.W.-H. and Ng, E. (1989), ‘Solution of sparse positive definite systems on a hypercube’, J. Comput. Appl. Math. 27, 129156.
George, J.A., Liu, J.W.-H. and Ng, E. (1989), ‘Communication results for parallel sparse Cholesky factorization on a hypercube’, Parallel Comput. 10, 287298.
Gilbert, J. and Schreiber, R. (1992), ‘Highly parallel sparse cholesky factorization’, SIAM J. Sci. Statist. Comput. 13, 11511172.
Golub, G. and Van Loan, C. (1989), Matrix Computations, 2nd ed, Johns Hopkins University Press (Baltimore, MD).
Gottlieb, A. and Almasi, G. (1989), Highly Parallel Computing, Benjamin Cummings (Redwood City, CA).
Gu, M. and Eisenstat, S. (1992), ‘A stable and efficient algorithm for the rank-1 modification of the symmetric eigenproblem’, Computer Science Dept Report YALEU/DCS/RR-916, Yale University.
Hari, V. and Veselić, K. (1987), ‘On Jacobi methods for singular value decompositions’, SIAM J. Sci. Statist. Comput. 8, 741754.
Heath, M.T. and Romine, C. (1988), ‘Parallel solution of triangular systems on distributed memory multiprocessors’, SIAM J. Sci. Statist. Comput. 9, 558588.
Heath, M.T., Ng, E. and Peyton, B. (1991), ‘Parallel algorithms for sparse linear systems’, SIAM Rev. 33, 420460.
Heller, D. (1978), ‘Some aspects of the cyclic reduction algorithm for block tridiagonal linear systems’, SIAM J. Numer. Anal. 13, 484496.
Hestenes, M. R. and Stiefel, E. (1954), ‘Methods of conjugate gradients for solving linear systems’, J. Res. Natl Bur. Stand. 49, 409436.
High Peformance Fortran (1991), documentation available via anonymous ftp from titan.cs.rice.edu in directory public/HPFF.
Higham, N. J. (1990), ‘Exploiting fast matrix multiplication within the Level 3 BLAS’, ACM Trans. Math. Soft. 16, 352368.
Ho, C. T. (1990), Optimal communication primitives and graph embeddings on hypercubes, PhD thesis, Yale University.
Ho, C. T., Johnsson, S. L. and Edelman, A. (1991), ‘Matrix multiplication on hypercubes using full bandwidth and constant storage’, in The Sixth Distributed Memory Computing Conf. Proc., IEEE Computer Society Press (New York),447451.
Hong, X. and Kung, H. T. (1981), ‘I/O complexity: the red blue pebble game’, in Proc. 13th Symposium on the Theory of Computing, ACM (New York), 326334.
Howland, J. (1983), ‘The sign matrix and the separation of matrix eigenvalues’, Lin. Alg. Appl. 49, 221232.
Ipsen, I. and Jessup, E. (1990), ‘Solving the symmetric tridiagonal eigenvalue problem on the hypercube’, SIAM J. Sci. Statist. Comput. 11(2), 203230.
Irons, B.M. (1970), ‘A frontal solution program for finite element analysis’, Int. J. Numer. Math. Engrg 2, 532.
Jess, J.A.G. and Kees, H.G.M. (1982), ‘A data structure for parallel L/U decomposition’, IEEE Trans. Comput. C-31, 231239.
Jessup, E. (1991), ‘A case against a divide and conquer approach to the nonsymmetric eigenproblem’, Technical Report ORNL/TM-11903, Oak Ridge National Laboratory.
Jessup, E. and Ipsen, I. (1992), ‘Improving the accuracy of inverse iteration’, SIAM J. Sci. Statist. Comput. 13(2), 550572.
Jessup, E. and Sorensen, D. (1989), ‘A divide and conquer algorithm for computing the singular value decomposition of a matrix’, in Proc. Third SIAM Conf. on Parallel Processing for Scientific Computing SIAM (Philadelphia), 6166. SIAM.
Johnsson, S. L. (1987), ‘Communication efficient basic linear algebra computations on hypercube architecture’, J. Parallel Distr. Comput. 4, 133172.
Johnsson, S. L. (1990), private communication.
Johnsson, S. L. and Ho, C. T. (1989), ‘Matrix multiplication on Boolean cubes using generic communication primitives’, in Parallel Processing and Medium Scale Multiprocessors, (Wouk, A., ed.), SIAM (Philadelphia), 108156.
Kahan, W. (1968), ‘Accurate eigenvalues of a symmetric tridiagonal matrix’, Computer Science Dept Technical Report CS41, Stanford University, Stanford, CA, July 1966 (revised June 1968).
Karp, R. and Ramachandran, V. (1990), ‘Parallel algorithms for shared memory machines’, in Handbook of Theoretical Computer Science, vol. A: Algorithms and Complexity, (van Leeuwen, J., ed.), Elsevier and MIT Press (New York), 869941.
Kenney, C. and Laub, A. (1991), ‘Rational iteration methods for the matrix sign function’, SIAM J. Math. Anal. Appl. 21, 487494.
Krishnakumar, A. S. and Morf, (1986), ‘Eigenvalues of a symmetric tridiagonal matrix: a divide and conquer approach’, Numer. Math. 48, 349368.
Kuck, D. and Sameh, A. (1977), ‘A parallel QR algorithm for symmetric tridiagonal matrices’, IEEE Trans. Comput. C-26(2).
Kung, H. T. (1974), ‘New algorithms and lower bounds for the parallel evaluation of certain rational expressions’, Technical report, Carnegie Mellon University.
Lambiotte, J. J. and Voigt, R. G. (1974), ‘The solution of tridiagonal linear systems on the CDC-STAR-100 computer’, Technical report, ICASE-NASA Langley Research Center, Hampton, VA.
Lang, B. (1992), ‘Reducing symmetric band matrices to tridiagonal form – a comparison of a new parallel algorithm with two serial algorithms on the iPSC/860’, Institut für angewandte mathematik report, Universität Karlsruhe.
Lawson, C., Hanson, R., Kincaid, D. and Krogh, F. (1979), ‘Basic linear algebra subprograms for fortran usage’, ACM Trans. Math. Soft. 5, 308323.
Lederman, S., Tsao, A. and Turnbull, T. (1992), ‘A parallelizable eigensolver for real diagonalizable matrices with real eigenvalues’, Report TR-01–042, Supercomputing Research Center, Bowie, MD.
Lewis, J.G., Peyton, B.W. and Pothen, A. (1989), ‘A fast algorithm for reordering sparse matrices for parallel factorization’, SIAM J. Sci. Statist. Comput. 10, 11561173.
Li, G. and Coleman, T. (1988), ‘A parallel triangular solver on a distributed memory multiprocessor’, SIAM J. Sci. Statist. Comput. 9, 485502.
Li, R. (1992), ‘Solving the secular equation stably’, UC Berkeley Math. Dept Report, in preparation.
Li, T.-Y. and Zeng, Z. (1992), ‘Homotopy-determinant algorithm for solving nonsymmetric eigenvalue problems’, Math. Comput. to appear.
Li, T.-Y., Zeng, Z. and Cong, L. (1992), ‘Solving eigenvalue problems of nonsymmetric matrices with real homotopies’, SIAM J. Numer. Anal. 29(1), 229248.
Li, T.-Y., Zhang, H. and Sun, X.-H. (1991), ‘Parallel homotopy algorithm for symmetric tridiagonal eigenvalue problem’, SIAM J. Sci. Statist. Comput. 12(3), 469487.
Lin, C-C. and Zmijewski, E. (1991), ‘A parallel algorithm for computing the eigenvalues of an unsymmetric matrix on an SIMD mesh of processors’, Department of Computer Science TRCS 91–15, University of California, Santa Barbara, CA.
Liu, J.W.-H. (1986), ‘Computational models and task scheduling for parallel sparse Cholesky factorization’, Parallel Comput. 2, 327342.
Liu, J.W.-H. (1989), ‘Reordering sparse matrices for parallel elimination’, Parallel Comput. 11, 7391.
Liu, J.W.-H. (1990), ‘The role of elimination trees in sparse factorization’, SIAM J. Matrix Anal. Appl. 11, 134172.
Lo, S.-S., Phillipe, B. and Sameh, A. (1987), ‘A multiprocessor algorithm for the symmetric eigenproblem’, SIAM J. Sci. Statist. Comput. 8(2), 155165.
Lucas, R., Blank, W. and Tieman, J. (1987), ‘A parallel solution method for large sparse systems of equations’, IEEE Trans. Comput. Aided Des. CAD-6, 981991.
Luk, F. and Park, H. (1989), ‘On parallel Jacobi orderings’, SIAM J. Sci. Statist. Comput. 10(1), 1826.
Ma, S. C., Patrick, M. and Szyld, D. (1989), ‘A parallel, hybrid algorithm for the generalized eigenproblem’, in Parallel Processing for Scientific Computing, (Rodrigue, Garry, ed.), SIAM (Philadelphia), ch 16, 8286.
Madsen, N. K., Rodrigue, G. H. and Karush, J. I. (1976), ‘Matrix multiplication by diagonals on a vector/parallel processor’, Inform. Process. Lett. 5, 4145.
Malyshev, A. N. (1991), ‘Parallel aspects of some spectral problems in linear algebra’, Dept of Numerical Mathematics Report NM-R9113, Centre for Mathematics and Computer Science, Amsterdam.
Mehrmann, V. (1991), ‘Divide and conquer methods for block tridiagonal systems’, Technical Report Bericht Nr. 68, Inst. fuer Geometrie und Prakt. Math., Aachen.
Meier, U. (1985), ‘A parallel partition method for solving banded systems of linear equations’, Parallel Comput. 2, 3343.
Meijerink, J. A. and van der Vorst, H. A. (1977), ‘An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix’, Math. Comput. 31, 148162.
Meijerink, J. A. and van der Vorst, H. A. (1981), ‘Guidelines for the usage of incomplete decompositions in solving sets of linear equations as they occur in practical problems’, J. Comput. Phys. 44, 134155.
Meurant, G. (1984a), ‘The block preconditioned conjugate gradient method on vector computers’, BIT 24, 623633.
Meurant, G. (1984b), ‘Numerical experiments for the preconditioned conjugate gradient method on the CRAY X-MP/2’, Technical Report LBL-18023, University of California, Berkeley, CA.
Michielse, P. H. and van der Vorst, H. A. (1988), ‘Data transport in Wang's partition method’, Parallel Comput. 7, 8795.
Mu, M. and Rice, J.R. (1992), ‘A grid based subtree-subcube assignment strategy for solving partial differential equations on hypercubes’, SIAM J. Sci. Statist. Comput. 13, 826839.
Ortega, J.M. (1988), Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press (New York and London).
Paardekooper, M.H.C. (1989), ‘A quadratically convergent parallel Jacobi process for diagonally dominant matrices with distinct eigenvalues’, J. Comput. Appl. Math. 27, 316.
Paige, C. C. (1976), ‘Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix’, J. Inst. Math. Appl. 18, 341349.
Pandey, P., Kenney, C. and Laub, A. (1990), ‘A parallel algorithm for the matrix sign function’, Int. J. High Speed Comput. 2(2), 181191.
Parlett, B. N. (1980), The Symmetric Eigenvalue Problem, Prentice-Hall (Englewood Cliffs, NJ).
Parlett, B. (1992a), private communication.
Parlett, B. (1992b), ‘Reduction to tridiagonal form and minimal realizations’, SIAM J. Math. Anal. Appl. 13(2), 567593.
Parlett, B. and Fernando, V. (1992), ‘Accurate singular values and differential QD algorithms’, Math. Department PAM-554, University of California, Berkeley, CA.
Parlett, B. N. and Reid, J. K. (1981), ‘Tracking the progress of the Lanczos algorithm for large symmetric eigenproblems’, IMA J. Numer. Anal. 1, 135155.
Petiton, S. G. (1992), ‘Parallel subspace method for non-Hermitian eigenproblems on the Connection Machine (CM2)’, Appl. Numer. Math. 10.
Pommerell, C. (1992), Solution of large unsymmetric systems of linear equations, PhD thesis, Swiss Federal Institute of Technology, Zürich.
Pothen, A., Jha, S. and Vemulapati, U. (1987), ‘Orthogonal factorization on a distributed memory multiprocessor’, in Hypercube Multiprocessors 1987, Knoxville, TN, SIAM (Philadelphia), 587598.
Priest, D. (1991), ‘Algorithms for arbitrary precision floating point arithmetic’, in Proc. 10th Symp. Computer Arithmetic, Grenoble, France, (Kornerup, P. and Matula, D., eds), IEEE Computer Society Press (New York), 132145.
di Brozolo, G. Radicati and Robert, Y. (1989), ‘Parallel conjugate gradient-like algorithms for solving sparse nonsymmetric linear systems on a vector multiprocessor’, Parallel Comput. 11, 223239.
Robert, Y. (1990), The Impact of Vector and Parallel Architectures on the Gaussian Elimination Algorithm, Wiley (New York).
Roberts, J. (1980), ‘Linear model reduction and solution of the algebraic Riccati equation’, Int. J. Control 32, 677687.
Romine, C. and Ortega, J. (1988), ‘Parallel solution of triangular systems of equations’, Parallel Comput. 6, 109114.
Rothberg, E. and Gupta, A. (1989), ‘Fast sparse matrix factorization on modern workstations’, Technical Report STAN-CS-89–1286, Stanford University, Stanford, California.
Saad, Y. (1980), ‘Variations on Arnoldi's method for computing eigenelements of large unsymmetric matrices’, Lin. Alg. Appl. 34, 269295.
Saad, Y. (1985a), ‘Partial eigensolutions of large nonsymmetric matrices’, Technical Report, Dept. of Comp. Science, New Haven, CN.
Saad, Y. (1985b), ‘Practical use of polynomial preconditionings for the conjugate gradient method’, SIAM J. Sci. Statist. Comput. 6, 865881.
Saad, Y. (1988), ‘Krylov subspace methods on supercomputers’, Technical report, RIACS, Moffett Field, CA.
Saad, Y. and Schultz, M. H. (1985), ‘Conjugate Gradient-like algorithms for solving nonsymmetric linear systems’, Math. Comput. 44, 417424.
Sameh, A. (1971), ‘1On Jacobi and Jacobi-like algorithms for a parallel computer’, Math. Comput. 25, 579590.
Sameh, A. (1985), ‘On some parallel algorithms on a ring of processors’, Comput. Phys. Commun. 37, 159166.
Sameh, A. and Brent, R. (1977), ‘Solving triangular systems on a parallel computer’, SIAM J. Numer. Anal. 14, 11011113.
Schlichting, J. J. F. M. and van der Vorst, H. A. (1987), ‘Solving bidiagonal systems of linear equations on the CDC CYBER 205’, Technical Report NM-R8725, CWI, Amsterdam, the Netherlands.
Schlichting, J. J. F. M. and van der Vorst, H. A. (1989), ‘Solving 3D block bidiagonal linear systems on vector computers’, J. Comput. Appl. Math. 27, 323330.
Schreiber, R. and Van Loan, C. (1989), ‘A storage efficient WY representation for products of Householder transformations’, SIAM J. Sci. Statist. Comput. 10, 5357.
Seager, M. K. (1986), ‘Parallelizing conjugate gradient for the CRAY X-MP’, Parallel Comput. 2, 35–7.
Shroff, G. (1991), ‘A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix’, Numer. Math. 58, 779805.
Shroff, G. and Schreiber, R. (1989), ‘On the convergence of the cyclic Jacobi method for parallel block orderings’, SIAM J. Math. Anal. Appl. 10, 326346.
Simon, H. (1989), ‘Bisection is not optimal on vector processors’, SIAM J. Sci. Statist. Comput. 10(1), 205209.
Slapničar, I. (1992), Accurate symmetric eigenreduction by a Jacobi method, PhD thesis, Fernuniversität – Hagen, Hagen, Germany.
Smith, B. T., Boyle, J. M., Dongarra, J. J., Garbow, B. S., Ikebe, Y., Klema, V. C. and Moler, C. B. (1976), Matrix Eigensystem Routines – EISPACK Guide: Springer Lecture Notes in Computer Science 6, Springer-Verlag (Berlin).
Sorensen, D. (1992), ‘Implicit application of polynomial filters in a k-step Arnoldi method’, SIAM J. Math. Anal. Appl. 13(1), 357385.
Sorensen, D. and Tang, P. (1991), ‘On the orthogonality of eigenvectors computed by divide-and-conquer techniques’, SIAM J. Numer. Anal. 28(6), 17521775.
Stewart, G. W. (1985), ‘A Jacobi-like algorithm for computing the Schur decomposition of a non-Hermitian matrix’, SIAM J. Sci. Statist. Comput. 6, 853864.
Stewart, G. W. (1987), ‘A parallel implementation of the QR algorithm’, Parallel Comput. 5, 187196.
Stickel, E. (1991), ‘Separating eigenvalues using the matrix sign function’, Lin. Alg. Appl. 148, 7588.
Stone, H. S. (1973), ‘An efficient parallel algorithm for the solution of a tridiagonal linear system of equations’, J. Assoc. Comput. Mach. 20, 2738.
Swarztrauber, P. (1992), ‘A parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix’, Math. Comput. to appear.
Thinking Machines Corporation (1987), Connection Machine Model CM-2 Technical Summary, IBM.
van de Geijn, R. (1987), Implementing the QR algorithm on an array of processors, PhD thesis, University of Maryland, College Park, Computer Science Department Report TR-1897.
van de Geijn, R. and Hudson, D. (1989), ‘Efficient parallel implementation of the nonsymmetric QR algorithm’, in Hypercube Concurrent Computers and Applications, (Gustafeon, J., ed.), ACM (New York).
Van de Velde, E. (1992), Introduction to Concurrent Scientific Computing, Caltech (Pasadena).
van der Vorst, H. A. (1982), ‘A vectorizable variant of some ICCG methods’, SIAM J. Sci. Statist. Comput. 2, 8692.
van der Vorst, H. A. (1986), ‘The performance of Fortran implementations for preconditioned conjugate gradients on vector computers’, Parallel Comput. 2, 4958.
van der Vorst, H. A. (1987a), ‘Analysis of a parallel solution method for tridiagonal linear systems’, Parallel Comput. 5, 303311.
van der Vorst, H. A. (1987b), ‘Large tridiagonal and block tridiagonal linear systems on vector and parallel computers’, Parallel Comput. 5, 4554.
van der Vorst, H. A. (1989a), ‘High performance preconditioning’, SIAM J. Sci. Statist. Comput. 10, 11741185.
van der Vorst, H. A. (1989b), ‘ICCG and related methods for 3D problems on vector computers’, Comput. Phys. Commun. 53, 223235.
van der Vorst, H. A. (1989c), ‘Practical aspects of parallel scientific computing’, Future Gener. Comput. Sys. 4, 285291.
van der Vorst, H. A. and Dekker, K. (1989), ‘Vectorization of linear recurrence relations’, SIAM J. Sci. Statist. Comput. 10, 2735.
van der Vorst, H. A. and Vuik, C. (1991), ‘GMRESR: A family of nested GMRES methods’, Technical Report 91–80, Delft University of Technology, Faculty of Technical Mathematics.
Veselić, K. (1979), ‘A quadratically convergent Jacobi-like method for real matrices with complex conjugate eigenvalues’, Numer. Math. 33, 425435.
Walker, H. F. (1988), ‘Implementation of the GMRES method using Householder transformations’, SIAM J. Sci. Statist. Comput. 9, 152163.
Wang, H. H. (1981), ‘A parallel method for tridiagonal equations’, ACM Trans. Math. Soft. 7, 170183.
Watkins, D. (1992), ‘Shifting strategies for the parallel QR algorithm’, Dept of Pure and Applied Mathematics report, Washington State Univ., Pullman, WA.
Watkins, D. and Elsner, L. (1991), ‘Convergence of algorithms of decomposition type for the eigenvalue problem’, Lin. Alg. Appl. 143, 1947.
Wen, C.-P. and Yelick, K. (1992), ‘A survey of message passing systems’, Computer science division, University of California, Berkeley, CA.
Wilkinson, J. H. (1965), The Algebraic Eigenvalue Problem, Oxford University Press (Oxford).
Zeng, Z. (1991), Homotopy-determinant algorithm for solving matrix eigenvalue problems and its parallelizations, PhD thesis, Michigan State University.
Zima, H. and Chapman, B. (1991), Supercompilers for Parallel and Vectors Computers, ACM (New York).
Zmijewski, E. (1989), ‘Limiting communication in parallel sparse Cholesky factorization’, Technical Report TRCS89–18, Department of Computer Science, University of California, Santa Barbara, CA.

Related content

Powered by UNSILO

Parallel numerical linear algebra

  • James W. Demmel (a1), Michael T. Heath (a2) and Henk A. van der Vorst (a3)

Metrics

Altmetric attention score

Full text views

Total number of HTML views: 0
Total number of PDF views: 0 *
Loading metrics...

Abstract views

Total abstract views: 0 *
Loading metrics...

* Views captured on Cambridge Core between <date>. This data will be updated every 24 hours.

Usage data cannot currently be displayed.