What is libFLAME?
libFLAME is a high-performance dense linear algebra library that is the result of the FLAME methodology for systematically developing dense linear algebra libraries. The FLAME methodology is radically different from the LINPACK/LAPACK approach that dates back to the 1970s. For more information about the methodology, visit the Methodology page.

The source code for libflame is now hosted on GitHub. We recommend using:

    git clone https://github.com/flame/libflame.git

to download the source code. This makes it easy to keep your local clone up to date as the GitHub source code is updated. After you have created your local clone, you can simply run:

    git pull

to fetch and merge the latest changes into your local copy of libflame.
Users may post questions and comments about libflame to the libflame-discuss mailing list. (For now, developers are also encouraged to use this list to communicate with one another.)
What's provided by libFLAME?
The following libflame features benefit both basic and advanced users, as well as library developers:
- A solution based on fundamental computer science. The FLAME project advocates a new approach to developing linear algebra libraries. Algorithms are obtained systematically according to rigorous principles of formal derivation. These methods are based on fundamental theorems of computer science to guarantee that the resulting algorithm is correct. In addition, the FLAME methodology uses a new, more stylized notation for expressing loop-based linear algebra algorithms. This notation closely resembles how algorithms are naturally illustrated with pictures.
- Object-based abstractions and API. The BLAS, LAPACK, and ScaLAPACK projects treat backward compatibility as a high priority, which hinders progress towards adopting modern software engineering principles such as object abstraction. libflame is built around opaque structures that hide implementation details of matrices, such as leading dimensions, and exports object-based programming interfaces to operate upon these structures. Likewise, FLAME algorithms are expressed (and coded) in terms of smaller operations on sub-partitions of the matrix operands. This abstraction facilitates programming without array or loop indices, which allows the user to avoid painful index-related programming errors altogether. As a result, FLAME code closely resembles the algorithm it implements. This similarity is quite intentional, as it preserves the clarity of the original algorithm as it would be illustrated on a white-board or in a publication.
- Educational value. Aside from the potential to introduce students to formal algorithm derivation, FLAME serves as an excellent vehicle for teaching linear algebra algorithms in a classroom setting. The clean abstractions afforded by the API also make FLAME ideally suited for instruction of high-performance linear algebra courses at the undergraduate and graduate level. Robert van de Geijn routinely uses FLAME in his linear algebra and numerical analysis courses. Some colleagues of the FLAME project, including Timothy Mattson of Intel Corporation, are even beginning to use the notation to teach classes elsewhere around the country. Historically, the BLAS/LAPACK style of coding has been used in these settings. However, coding in this manner tends to obscure the algorithms; students often get bogged down debugging the frustrating errors that result from indexing directly into the arrays that represent the matrices.
- A complete dense linear algebra framework. Like LAPACK, libflame provides ready-made implementations of common linear algebra operations. The implementations found in libflame mirror many of those found in the BLAS and LAPACK packages. However, unlike LAPACK, libflame provides a framework for building complete custom linear algebra codes. We believe such an environment is more useful because it allows users to quickly prototype a linear algebra solution that fits the needs of their application. We are currently writing a complete user's guide for libflame. In the meantime, users may browse the full list of routines available in libflame through our online doxygen documentation.
- High performance. In our publications and performance graphs, we do our best to dispel the myth that user- and programmer-friendly linear algebra codes cannot yield high performance. Our FLAME implementations of operations such as Cholesky factorization and Triangular Inversion often outperform the corresponding implementations available in the LAPACK library. Many instances of the libflame performance advantage result from the fact that LAPACK provides only one variant (algorithm) of every operation, while libflame provides all known variants. This allows the user and/or library developer to choose which algorithmic variant is most appropriate for a given situation. libflame relies only on the presence of a core set of highly optimized unblocked routines to perform the small sub-problems found in FLAME algorithm codes.
- Dependency-aware multithreaded parallelism. Until recently, the authors of the BLAS and LAPACK advocated getting shared-memory parallelism from LAPACK routines by simply linking to multithreaded BLAS. This low-level solution requires no changes to LAPACK code but also suffers from sharp limitations in terms of efficiency and scalability for small- and medium-sized matrix problems. The fundamental bottleneck to introducing parallelism directly within many algorithms is the web of data dependencies that inevitably exists between sub-problems. The libflame project has developed a runtime system, SuperMatrix, to detect and analyze dependencies found within FLAME algorithms-by-blocks (algorithms whose sub-problems operate only on block operands). Once dependencies are known, the system schedules sub-operations to independent threads of execution. This system is completely abstracted from the algorithm that is being parallelized and requires virtually no change to the algorithm code, but at the same time exposes abundant high-level parallelism. We have observed that this method provides increased performance for a range of small- and medium-sized problems. The most recent version of LAPACK does not offer any similar mechanism.
- Support for hierarchical storage-by-blocks. Storing matrices by blocks, a concept advocated years ago by Fred Gustavson of IBM, often yields performance gains through improved spatial locality. Instead of representing matrices as a single linear array of data with a prescribed leading dimension as legacy libraries require (for column- or row-major order), the storage scheme is encoded into the matrix object. Here, internal elements refer recursively to child objects that represent sub-matrices. Currently, libflame provides a subset of the conventional API that supports hierarchical matrices, allowing users to create and manage such matrix objects as well as convert between storage-by-blocks and conventional "flat" storage schemes.
- Advanced build system. From its early revisions, libflame distributions have been bundled with a robust build system, featuring automatic makefile creation and a configuration script conforming to GNU standards (allowing the user to run the ./configure; make; make install sequence common to many open source software projects). Without any user input, the configure script searches for and chooses compilers based on a pre-defined preference order for each architecture. The user may request specific compilers via the configure interface, or enable other non-default features of libflame such as custom memory alignment, multithreading (via POSIX threads or OpenMP), compiler options (debugging symbols, warnings, optimizations), and memory leak detection. The reference BLAS and LAPACK libraries provide no configuration support and require the user to manually modify a makefile with appropriate references to compilers and compiler options depending on the host architecture.
- Windows support. While libflame was originally developed for GNU/Linux and UNIX environments, we have in the course of its development had the opportunity to port the library to Microsoft Windows. The Windows port features a separate build system implemented with Python and nmake, the Microsoft analogue to the make utility found in UNIX-like environments. As of this writing, the port is still very new and therefore should be considered experimental. However, we feel libflame for Windows is very close to usable for many in our audience, particularly those who consider themselves experts. We invite interested users to try the software and, of course, we welcome feedback to help improve our Windows support, and libflame in general.
- Independence from Fortran and LAPACK. The libflame development team is pleased to offer a high-performance linear algebra solution that is 100% Fortran-free. libflame is a C-only implementation and does not depend on any external Fortran libraries, such as LAPACK. That said, we happily provide an optional backward compatibility layer, lapack2flame, that maps legacy LAPACK routine invocations to their corresponding native C implementations in libflame. This allows legacy applications to start taking advantage of libflame with virtually no changes to their source code. Furthermore, we understand that some users wish to leverage highly-optimized implementations that conform to the LAPACK interface, such as Intel's Math Kernel Library (MKL). As such, we allow those users to configure libflame such that their external LAPACK implementation is called for the small, performance-sensitive unblocked subproblems that arise within libflame's blocked algorithms and algorithms-by-blocks.
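
The object-based abstraction described in the list above can be illustrated with a minimal sketch. It is not libflame's API: the `mat_view` type and the `view_of`, `part_2x1`, `elem`, and `scale` functions are hypothetical names invented for this example, and column-major storage is assumed. The point is only that once the leading dimension lives inside the object, partitioning a matrix and operating on a sub-partition requires no offset arithmetic from the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view type for illustration (not libflame's own object type).
   Column-major storage is assumed. */
typedef struct {
    double *buf;   /* address of element (0,0) of this view */
    size_t  m, n;  /* number of rows and columns in this view */
    size_t  ldim;  /* leading dimension of the underlying buffer */
} mat_view;

/* Wrap a column-major buffer in a view. */
mat_view view_of(double *buf, size_t m, size_t n, size_t ldim)
{
    mat_view v;
    assert(ldim >= m);
    v.buf = buf;
    v.m = m;
    v.n = n;
    v.ldim = ldim;
    return v;
}

/* Split A into a top view with k rows and a bottom view with the rest.
   No data is copied; both views alias A's buffer. */
void part_2x1(mat_view A, size_t k, mat_view *top, mat_view *bottom)
{
    assert(k <= A.m);
    *top    = view_of(A.buf,     k,       A.n, A.ldim);
    *bottom = view_of(A.buf + k, A.m - k, A.n, A.ldim);
}

/* The only place that turns (i, j) into an address. */
double *elem(mat_view A, size_t i, size_t j)
{
    assert(i < A.m && j < A.n);
    return A.buf + i + j * A.ldim;
}

/* Scale every element of a view by alpha. */
void scale(mat_view A, double alpha)
{
    size_t i, j;
    for (j = 0; j < A.n; ++j)
        for (i = 0; i < A.m; ++i)
            *elem(A, i, j) *= alpha;
}
```

A caller would wrap a buffer once with `view_of`, split it with `part_2x1`, and pass either part to `scale`; the leading dimension never appears in the calling code.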
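
The dependency analysis mentioned under multithreaded parallelism can also be sketched in a few lines. This is not SuperMatrix's implementation; it is a simplified illustration under stated assumptions: each task updates one block in place and reads at most two others, blocks are identified by small integers, and the `task` type, `depends_on`, and `assign_levels` are hypothetical names. The example task list is a right-looking Cholesky factorization by blocks on a 3x3 grid of blocks (lower triangle only).

```c
#include <assert.h>

#define MAX_READS 2

/* Hypothetical task record: one block updated in place, up to two read. */
typedef struct {
    const char *name;
    int reads[MAX_READS];  /* ids of blocks that are only read */
    int nreads;            /* number of valid entries in reads[] */
    int writes;            /* id of the block that is read and written */
} task;

/* Nonzero if task t reads or writes the given block. */
int touches(const task *t, int block)
{
    int k;
    if (t->writes == block) return 1;
    for (k = 0; k < t->nreads; ++k)
        if (t->reads[k] == block) return 1;
    return 0;
}

/* Nonzero if 'later' must wait for 'earlier' (program order is assumed). */
int depends_on(const task *later, const task *earlier)
{
    int k;
    /* Read-after-write or write-after-write on earlier's output block. */
    if (touches(later, earlier->writes)) return 1;
    /* Write-after-read: later overwrites a block that earlier reads. */
    for (k = 0; k < earlier->nreads; ++k)
        if (earlier->reads[k] == later->writes) return 1;
    return 0;
}

/* level[t] is the earliest step at which task t can run; tasks that
   share a level have no dependencies among themselves. */
void assign_levels(const task *tasks, int n, int *level)
{
    int s, t;
    for (t = 0; t < n; ++t) {
        level[t] = 0;
        for (s = 0; s < t; ++s)
            if (depends_on(&tasks[t], &tasks[s]) && level[s] + 1 > level[t])
                level[t] = level[s] + 1;
    }
}

/* Block ids: A00=0, A10=1, A20=2, A11=3, A21=4, A22=5. */
const task chol_tasks[10] = {
    { "CHOL A00", {0, 0}, 0, 0 },
    { "TRSM A10", {0, 0}, 1, 1 },
    { "TRSM A20", {0, 0}, 1, 2 },
    { "SYRK A11", {1, 0}, 1, 3 },
    { "GEMM A21", {2, 1}, 2, 4 },
    { "SYRK A22", {2, 0}, 1, 5 },
    { "CHOL A11", {0, 0}, 0, 3 },
    { "TRSM A21", {3, 0}, 1, 4 },
    { "SYRK A22", {4, 0}, 1, 5 },
    { "CHOL A22", {0, 0}, 0, 5 }
};
```

Calling `assign_levels(chol_tasks, 10, level)` places the two first-iteration TRSM tasks on the same level and the three trailing updates on the next, which is the kind of parallelism a runtime can hand to independent threads without any change to the algorithm's task order.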
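
The conversion between conventional "flat" storage and storage-by-blocks can be sketched as follows. This is a single-level simplification and an assumption of this example, not libflame's hierarchical object representation: each b-by-b block is stored contiguously in column-major order, blocks themselves are ordered column-major, and the matrix dimensions are assumed to be multiples of b. The function names `blocked_index`, `flat_to_blocked`, and `blocked_to_flat` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Position of element (i, j) of an m-row matrix inside the blocked buffer. */
size_t blocked_index(size_t i, size_t j, size_t m, size_t b)
{
    size_t mb = m / b;              /* number of block rows */
    size_t bi = i / b, bj = j / b;  /* which block holds (i, j) */
    size_t ii = i % b, jj = j % b;  /* position inside that block */
    size_t block = bi + bj * mb;    /* blocks are ordered column-major */
    return block * b * b + ii + jj * b;
}

/* Copy a column-major m x n matrix with leading dimension ldim into
   blocked storage. 'blocked' must hold m * n elements. */
void flat_to_blocked(const double *flat, size_t ldim, double *blocked,
                     size_t m, size_t n, size_t b)
{
    size_t i, j;
    assert(b > 0 && m % b == 0 && n % b == 0 && ldim >= m);
    for (j = 0; j < n; ++j)
        for (i = 0; i < m; ++i)
            blocked[blocked_index(i, j, m, b)] = flat[i + j * ldim];
}

/* Inverse of flat_to_blocked. */
void blocked_to_flat(const double *blocked, double *flat, size_t ldim,
                     size_t m, size_t n, size_t b)
{
    size_t i, j;
    assert(b > 0 && m % b == 0 && n % b == 0 && ldim >= m);
    for (j = 0; j < n; ++j)
        for (i = 0; i < m; ++i)
            flat[i + j * ldim] = blocked[blocked_index(i, j, m, b)];
}
```

After conversion, the elements of each block are adjacent in memory, so a kernel that works on one block at a time touches a single contiguous region rather than b strided column segments.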