AI Summary
Existing CPU-based linear algebra libraries (e.g., Armadillo) are difficult to port efficiently to GPUs. To address this, we propose Bandicoot, a CUDA-accelerated C++ library whose interface closely follows Armadillo's. Its core innovations include: (1) a compile-time expression template system that uses CUDA and C++ template metaprogramming to enable delayed evaluation and automatic optimization of mathematical expressions; and (2) a unified abstraction layer that transparently bridges CPU and GPU execution, minimizing required code modifications. Experimental evaluation on representative linear algebra workloads demonstrates speedups of several-fold to over an order of magnitude over CPU-only Armadillo. Bandicoot is open-sourced under the Apache 2.0 license, facilitating integration into existing Armadillo-based scientific computing pipelines.
Abstract
We introduce the Bandicoot C++ library for linear algebra and scientific computing on GPUs, overviewing its user interface and performance characteristics, as well as the technical details of its internal design. Bandicoot is the GPU-enabled counterpart to the well-known Armadillo C++ linear algebra library, and aims to let users take advantage of GPU-accelerated computation in their existing codebases without significant changes. By exploiting internal template meta-programming techniques similar to those used in Armadillo, Bandicoot is able to provide compile-time optimisation of mathematical expressions within user code, leading to more efficient execution. Empirical evaluations show that Bandicoot can provide significant speedups over Armadillo-based CPU-only computation. Bandicoot is available at https://coot.sourceforge.io and is distributed as open-source software under the permissive Apache 2.0 license.
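To make the idea of compile-time optimisation through template meta-programming more concrete, the following is a minimal CPU-only sketch of the general expression-template technique. It is not Bandicoot or Armadillo code: the namespace `sketch`, the types `Vec` and `AddExpr`, and the function `sum_three` are hypothetical names introduced purely for illustration, and the sketch assumes nothing about how either library is implemented internally. It only shows how `operator+` can return a lightweight node rather than a result, so that a chained expression is evaluated in a single loop without temporary vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical namespace; none of these names come from Bandicoot

// Lazy node representing lhs + rhs. It stores references only and computes
// nothing until an individual element is requested.
template <typename L, typename R>
class AddExpr {
public:
    AddExpr(const L& lhs, const R& rhs) : lhs_(lhs), rhs_(rhs) {
        assert(lhs.size() == rhs.size());
    }
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }

private:
    const L& lhs_;
    const R& rhs_;
};

class Vec {
public:
    explicit Vec(std::size_t n, double value = 0.0) : data_(n, value) {}

    // Constructing a Vec from an expression is where evaluation happens:
    // one pass over the elements, with no intermediate vectors.
    template <typename L, typename R>
    Vec(const AddExpr<L, R>& expr) : data_(expr.size()) {
        for (std::size_t i = 0; i < data_.size(); ++i) {
            data_[i] = expr[i];
        }
    }

    double operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

// operator+ builds an expression node instead of computing a result.
// It is left unconstrained for brevity; a real library would restrict it
// to its own types.
template <typename L, typename R>
AddExpr<L, R> operator+(const L& lhs, const R& rhs) {
    return AddExpr<L, R>(lhs, rhs);
}

// a + b + c has the type AddExpr<AddExpr<Vec, Vec>, Vec>; converting it to
// Vec in the return statement runs a single fused loop.
inline Vec sum_three(const Vec& a, const Vec& b, const Vec& c) {
    return a + b + c;
}

}  // namespace sketch
```

In this sketch the expression nodes hold references, so an expression should be converted to a `Vec` within the same statement that builds it; storing it with `auto` would leave dangling references to temporaries. A GPU library would presumably use the expression type to select or generate a device kernel rather than a host loop, but that step is outside what this sketch shows.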