🤖 AI Summary
This work addresses the lack of a unified, efficient framework for linear and multilinear (tensor) computations across diverse hardware, from single-node CPUs to large-scale parallel systems and GPU-accelerated platforms. To close this gap, the authors propose a vertically integrated software stack that combines the FLAME methodology for systematically deriving algorithms with established high-performance libraries and runtime systems, including BLIS, TBLIS, libflame, SuperMatrix, and Elemental. The goal is a cohesive infrastructure that delivers high performance for both linear algebra and tensor operations across heterogeneous platforms. The planned framework targets single-core, multi-core, GPU-accelerated, and distributed-memory environments, and is intended to serve as a flexible, scalable foundation for scientific computing and machine learning applications.
📝 Abstract
We leverage highly successful prior projects sponsored by multiple NSF grants and gifts from industry, the BLAS-like Library Instantiation Software (BLIS) and libflame, to lay the foundation for a new, flexible framework that vertically integrates the dense linear and multilinear (tensor) software stacks that are important to modern computing. This vertical integration will enable high-performance computation from the node level to massively parallel systems, across both CPU and GPU architectures. The effort builds on the research team's decades of experience turning fundamental research on the systematic derivation of algorithms (the NSF-sponsored FLAME project) into practical software for this domain, targeting single- and multi-core (BLIS, TBLIS, and libflame), GPU-accelerated (SuperMatrix), and massively parallel (PLAPACK, Elemental, and ROTE) compute environments. This project will implement key linear algebra and tensor operations that highlight the flexibility and effectiveness of the new framework, and will set the stage for further work on broadening its functionality and integrating it into diverse scientific and machine learning software.