Fast GPU Linear Algebra via Compile Time Expression Fusion

📅 2026-04-24
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses inefficiencies in conventional GPU linear algebra libraries, which often fall short of hardware limits because of redundant memory accesses and runtime overhead. The authors present Bandicoot, a C++ toolkit that uses template metaprogramming to fuse expressions at compile time, generating GPU kernels that are often able to saturate memory bandwidth without any just-in-time compilation infrastructure. Bandicoot's API is compatible with Armadillo, which eases migration of existing CPU codebases. Empirical results show that Bandicoot outperforms PyTorch, TensorFlow, and JAX, sometimes by considerable margins.

📝 Abstract
We describe the Bandicoot GPU linear algebra toolkit, a C++ based library that prioritises ease of use without compromising efficiency. Bandicoot's API is compatible with the popular Armadillo CPU linear algebra library, enabling easy transition for existing CPU-based codebases. Unlike other GPU-focused toolkits, Bandicoot uses template metaprogramming to generate fused GPU kernels directly at compile time, yielding efficient kernels that are often able to saturate memory bandwidth. This removes the need for runtime overhead or JIT infrastructure. Empirical results show that Bandicoot outperforms (sometimes by considerable margins) commonly-used linear algebra toolkits including PyTorch, TensorFlow, and JAX.
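
To make "fused kernels generated at compile time" concrete, here is a minimal CPU-only sketch of the general expression-template technique in C++. It is not Bandicoot's implementation and does not use its API: the names `Expr`, `Add`, `Mul`, `Vec`, and `g_loops` are hypothetical, and a plain loop stands in for what would be a generated GPU kernel. The point is only that `a + b * c` builds a type describing the whole expression, so the compiler can emit one pass over the data with no intermediate vector for `b * c`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter: number of full passes made over vector data.
static int g_loops = 0;

// CRTP base so the operators below only accept expression types.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Lazy element-wise sum; nothing is computed until operator[] is called.
template <typename L, typename R>
struct Add : Expr<Add<L, R>> {
    const L& lhs;
    const R& rhs;
    Add(const L& l, const R& r) : lhs(l), rhs(r) { assert(l.size() == r.size()); }
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Lazy element-wise product, built the same way as Add.
template <typename L, typename R>
struct Mul : Expr<Mul<L, R>> {
    const L& lhs;
    const R& rhs;
    Mul(const L& l, const R& r) : lhs(l), rhs(r) { assert(l.size() == r.size()); }
    float operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Concrete storage. Constructing it from an expression runs the fused loop.
struct Vec : Expr<Vec> {
    std::vector<float> data;

    Vec(std::size_t n, float value) : data(n, value) {}

    // The whole expression tree is encoded in the type E, so this single loop
    // is specialised at compile time for that exact expression.
    template <typename E>
    Vec(const Expr<E>& e) : data(e.self().size()) {
        const E& expr = e.self();
        for (std::size_t i = 0; i < data.size(); ++i) {
            data[i] = expr[i];
        }
        ++g_loops;  // one pass, regardless of how many operators were chained
    }

    float operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Operators return lightweight nodes instead of computing results eagerly.
template <typename L, typename R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Add<L, R>(l.self(), r.self());
}

template <typename L, typename R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Mul<L, R>(l.self(), r.self());
}
```

Given `Vec a(4, 1.0f), b(4, 2.0f), c(4, 3.0f);`, constructing `Vec r(a + b * c);` evaluates every element in one loop and allocates no temporary for `b * c`. The expression nodes hold references to their operands, so in this sketch an expression should be consumed within the statement that builds it rather than stored for later.
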
Problem

Research questions and friction points this paper is trying to address.

GPU linear algebra
compile-time fusion
memory bandwidth
ease of use
performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

compile-time fusion
template metaprogramming
GPU kernel optimization
memory bandwidth saturation
linear algebra library