Numerical Methods for Large-Scale Electronic State Calculation on Supercomputer
Published in Klaus D. Sattler, 21st Century Nanoscience – A Handbook, 2019
Takeo Hoshi, Yusaku Yamamoto, Tomohiro Sogabe, Kohei Shimamura, Fuyuki Shimojo, Aiichiro Nakano, Rajiv Kalia, Priya Vashishta
In high-performance implementations of the LU decomposition, a blocked algorithm is used [17]. In the blocked algorithm, the matrix A is partitioned into square blocks of appropriate size, say b × b (Figure 15.3a), and the LU decomposition is performed by regarding each block as a matrix element. Thus, the multiplication of two elements in the original algorithm translates into a matrix-matrix multiplication. If b is chosen so that three blocks fit in the cache memory, this matrix multiplication can be performed entirely within the cache, and access to the slower main memory is therefore reduced. Blocked algorithms are used extensively in LAPACK [2], the de facto standard matrix library for sequential and shared-memory parallel machines. Note that LAPACK has various routines for solving linear equations, depending on whether the matrix A is nonsymmetric or SPD, and whether it is fully dense or banded (i.e., aij = 0 unless −NLD ≤ j − i ≤ NUD for some nonnegative integers NLD and NUD). LAPACK routines can also handle equations with multiple right-hand sides, written as AX = B.
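As a concrete illustration of solving AX = B with LAPACK, here is a minimal C sketch (not from the chapter itself) that calls the dgesv driver through the LAPACKE C interface; the 3 × 3 matrix and the two right-hand sides are made-up data, and internally dgesv performs the blocked LU factorization with partial pivoting (dgetrf) described above.

/* Solve A X = B for two right-hand sides with LAPACK's dgesv driver.
 * Link against the LAPACKE/LAPACK/BLAS libraries of your installation. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    lapack_int n = 3, nrhs = 2;
    /* Column-major storage, as in Fortran LAPACK. */
    double A[9] = { 4.0, 1.0, 2.0,    /* column 1 */
                    1.0, 5.0, 1.0,    /* column 2 */
                    2.0, 1.0, 6.0 };  /* column 3 */
    double B[6] = { 1.0, 2.0, 3.0,    /* right-hand side 1 */
                    4.0, 5.0, 6.0 };  /* right-hand side 2 */
    lapack_int ipiv[3];

    /* dgesv factors A = P L U (blocked, via dgetrf) and then solves for
       every column of B; B is overwritten with the solutions. */
    lapack_int info = LAPACKE_dgesv(LAPACK_COL_MAJOR, n, nrhs,
                                    A, n, ipiv, B, n);
    if (info != 0) {
        fprintf(stderr, "dgesv failed, info = %d\n", (int)info);
        return 1;
    }
    for (lapack_int j = 0; j < nrhs; ++j)
        printf("x%d = (%g, %g, %g)\n", (int)j + 1,
               B[j * n], B[j * n + 1], B[j * n + 2]);
    return 0;
}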
LAPACK
Published in Leslie Hogben, Richard Brualdi, Anne Greenbaum, Roy Mathias, Handbook of Linear Algebra, 2006
Zhaojun Bai, James Demmel, Jack Dongarra, Julien Langou, Jenny Wang
LAPACK (Linear Algebra PACKage) is an open-source library of programs for solving the most commonly occurring numerical linear algebra problems [LUG99]. The original LAPACK codes are written in Fortran 77. Complete documentation as well as source code is available online at the Netlib repository [LAP]. LAPACK provides driver routines for solving complete problems such as linear equations, linear least squares problems, eigenvalue problems, and singular value problems. Each driver routine calls a sequence of computational routines, each of which performs a distinct computational task. In addition, LAPACK provides comprehensive error bounds for most computed quantities. LAPACK is designed to be portable across sequential and shared-memory machines with deep memory hierarchies, on which most performance issues reduce to providing optimized versions of the Basic Linear Algebra Subprograms (BLAS). (See Chapter 74.)
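As a small, hedged example of this driver-routine design (the data and the choice of routine are illustrative, not taken from the chapter), the following C snippet calls LAPACKE_dsyev, the LAPACKE interface to the symmetric eigenvalue driver DSYEV, which internally chains the computational routines for tridiagonal reduction and the tridiagonal eigensolver.

/* Compute all eigenvalues and eigenvectors of a small symmetric matrix. */
#include <stdio.h>
#include <lapacke.h>

int main(void) {
    lapack_int n = 2;
    double A[4] = { 2.0, 1.0,
                    1.0, 2.0 };  /* symmetric, column-major */
    double w[2];                 /* eigenvalues in ascending order */

    /* 'V': compute eigenvectors too (returned in A); 'L': lower triangle. */
    lapack_int info = LAPACKE_dsyev(LAPACK_COL_MAJOR, 'V', 'L', n, A, n, w);
    if (info != 0) {
        fprintf(stderr, "dsyev failed, info = %d\n", (int)info);
        return 1;
    }
    printf("eigenvalues: %g %g\n", w[0], w[1]);  /* expected: 1 and 3 */
    return 0;
}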
Linear Systems
Published in Jeffery J. Leader, Numerical Analysis and Scientific Computation, 2022
The MATLAB package was designed for problems of this sort. It was originally intended as an interactive interface to certain Fortran computational matrix algebra subroutines from the LINPACK and EISPACK subroutine libraries, and was later rewritten in C. Starting with version 6, the newer LAPACK routines have replaced their older LINPACK and EISPACK counterparts. (See www.netlib.org/lapack for details on LAPACK.) The LAPACK routines use newer numerical algorithms in some cases (though they are generally based on the same ideas as before). Just as importantly, they utilize the Basic Linear Algebra Subprograms (BLAS). These machine-specific subroutines take advantage of the particulars of the computer's architecture, especially its memory structure, to greatly increase the speed of various matrix manipulations. The BLAS are divided into three classes: Level 1 (vector-vector operations), Level 2 (matrix-vector operations), and Level 3 (matrix-matrix operations). Higher levels generally give better speed and are superior to traditional element-by-element manipulations. (Of course, the BLAS themselves must use element-by-element operations on a standard computer; the speedup comes from arranging the computations for the most efficient processing, including careful retrieval of data from memory and maximizing the number of computations performed per retrieval. On a vector machine the effects are even more noticeable.) The LAPACK routines, originally designed for use on supercomputers, operate on (sub-)blocks of matrices to run much faster. The key change in computer architecture that makes this approach worthwhile has been the widespread use of caches, that is, fast memory near the CPU. Cache management is an important practical issue that we will not be able to address in any detail because of its hardware-specific nature; by carefully using the BLAS and relying on the computer vendor to supply implementations optimized for their particular machine, the methods can be (and have been) made portable.
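To make the three BLAS levels concrete, here is a minimal C sketch (assuming a CBLAS installation with the conventional cblas.h header; the data are illustrative) that calls one representative routine from each level.

/* One representative call from each BLAS level via the CBLAS interface. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    double x[3] = { 1.0, 2.0, 3.0 };
    double y[3] = { 4.0, 5.0, 6.0 };
    double A[9] = { 1.0, 0.0, 0.0,   /* 3x3 diagonal matrix, column-major */
                    0.0, 2.0, 0.0,
                    0.0, 0.0, 3.0 };
    double C[9] = { 0.0 };

    /* Level 1 (vector-vector): y := 2*x + y. */
    cblas_daxpy(3, 2.0, x, 1, y, 1);

    /* Level 2 (matrix-vector): y := A*x + y. */
    cblas_dgemv(CblasColMajor, CblasNoTrans, 3, 3,
                1.0, A, 3, x, 1, 1.0, y, 1);

    /* Level 3 (matrix-matrix): C := A*A.  This is the level that blocked
       algorithms such as LAPACK's LU push most of their work into. */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, 3, 3, 3,
                1.0, A, 3, A, 3, 0.0, C, 3);

    printf("y = (%g, %g, %g), C(1,1) = %g\n", y[0], y[1], y[2], C[0]);
    return 0;
}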
Enhancing parallelism of distributed algorithms with the actor model and a smart data movement technique
Published in International Journal of Parallel, Emergent and Distributed Systems, 2021
Anatoliy Doroshenko, Eugene Tulika, Olena Yatsenko
The BLAS library is used to work efficiently with individual blocks of the matrix within a single shared-memory node. BLAS is a set of matrix operations implemented in Fortran and C and optimised for specific hardware; it is the fastest library for working with matrices and provides interfaces for various programming languages. Most LAPACK and ScaLAPACK algorithms are composed of a few fundamental operations, which are implemented using Level-2 or Level-3 BLAS routines. BLAS performance depends on the matrix size, and the best performance is reached with large block sizes. Using BLAS/LAPACK, the steps of the calculation are implemented with the following operations (see the sketch after this list):
DPOTF2 computes the Cholesky factorisation of the diagonal block.
DTRSM computes the column panel according to formula (1).
DSYRK updates the rest of the matrix by formula (2).
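The three-step structure above can be sketched as a right-looking blocked Cholesky loop in C. This is a minimal illustration assuming the LAPACKE and CBLAS interfaces with column-major storage; the function name blocked_cholesky, the test matrix, and the block size are illustrative choices, not identifiers from the paper.

/* Factor the n-by-n SPD matrix A in place as A = L * L^T (lower triangular,
 * column-major, leading dimension lda), using blocks of size b. */
#include <stdio.h>
#include <cblas.h>
#include <lapacke.h>

static int blocked_cholesky(double *A, int n, int lda, int b) {
    for (int k = 0; k < n; k += b) {
        int nb = (n - k < b) ? n - k : b;   /* current block size */

        /* Step 1: DPOTF2 factors the diagonal block A(k:k+nb, k:k+nb). */
        lapack_int info = LAPACKE_dpotf2(LAPACK_COL_MAJOR, 'L', nb,
                                         &A[k + (size_t)k * lda], lda);
        if (info != 0) return (int)info;    /* matrix not positive definite */

        int m = n - k - nb;                 /* rows below the diagonal block */
        if (m > 0) {
            /* Step 2: DTRSM computes the column panel L21 = A21 * L11^{-T}
               (formula (1) in the text). */
            cblas_dtrsm(CblasColMajor, CblasRight, CblasLower,
                        CblasTrans, CblasNonUnit, m, nb, 1.0,
                        &A[k + (size_t)k * lda], lda,
                        &A[(k + nb) + (size_t)k * lda], lda);

            /* Step 3: DSYRK updates the trailing matrix
               A22 := A22 - L21 * L21^T (formula (2) in the text). */
            cblas_dsyrk(CblasColMajor, CblasLower, CblasNoTrans,
                        m, nb, -1.0,
                        &A[(k + nb) + (size_t)k * lda], lda,
                        1.0, &A[(k + nb) + (size_t)(k + nb) * lda], lda);
        }
    }
    return 0;
}

int main(void) {
    double A[9] = { 4.0, 2.0, 2.0,   /* small SPD test matrix, column-major */
                    2.0, 5.0, 3.0,
                    2.0, 3.0, 6.0 };
    if (blocked_cholesky(A, 3, 3, 2) == 0)
        printf("L(1,1) = %g (expected 2)\n", A[0]);
    return 0;
}

This is one common (right-looking) organisation; LAPACK's own blocked DPOTRF uses an equivalent scheme that likewise factors diagonal blocks with the unblocked DPOTF2. The Level-3 DSYRK update accounts for most of the floating-point work, which is why overall performance tracks the BLAS so closely.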