Fast Entropy Decoding for Sparse MVM on GPUs

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the storage and computational bottlenecks of sparse matrix-vector multiplication (SpMVM) on GPUs by co-optimizing compression and computation around dtANS, a new lossless entropy coder derived from asymmetric numeral systems (ANS). Building on the standard CSR format, the approach tailors dtANS encoding for parallel decoding on GPUs, combining high compression ratios with fast decompression during the multiplication itself. On large sparse matrices, the method compresses the matrix by up to 11.77× and accelerates SpMVM by up to 3.48×, improving on both cuSPARSE and, in a limited comparison, the AI-driven AlphaSparse approach.
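The summary builds on the standard CSR layout, so it may help to recall what is actually being compressed. A minimal sequential reference for CSR SpMVM in plain C++ is sketched below; the struct and function names are ours, and the paper's actual contribution (a GPU kernel that decodes the dtANS-compressed CSR arrays on the fly) is not reproduced here.

```cpp
#include <cassert>
#include <vector>

// CSR stores a sparse matrix as three arrays: row_ptr (per-row offsets into
// the other two arrays), col_idx (the column of each nonzero), and vals (the
// nonzero values). It is these arrays that an entropy coder such as the
// paper's dtANS can compress further.
struct Csr {
    std::vector<int> row_ptr, col_idx;
    std::vector<double> vals;
    int rows;
};

// y = A * x for a CSR matrix (sequential reference implementation; the paper
// targets a GPU kernel fused with decompression, which this sketch omits).
std::vector<double> spmv(const Csr& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (int r = 0; r < a.rows; ++r)
        for (int k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k)
            y[r] += a.vals[k] * x[a.col_idx[k]];
    return y;
}
```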

📝 Abstract
We present a novel, practical approach to speed up sparse matrix-vector multiplication (SpMVM) on GPUs. The key idea is to apply lossless entropy coding to further compress the sparse matrix when it is stored in one of the commonly supported formats. Our method is based on dtANS, our new lossless compression method that improves the entropy coding technique of asymmetric numeral systems (ANS) specifically for fast parallel GPU decoding when used in tandem with SpMVM. We apply dtANS to the widely used CSR format and present extensive benchmarks on the SuiteSparse collection of matrices against the state-of-the-art cuSPARSE library. On matrices with at least 2^(15) entries and at least 10 entries per row on average, our compression reduces the matrix size relative to the smallest cuSPARSE format (CSR, COO, or SELL) in almost all cases, by up to 11.77 times. Further, we achieve an SpMVM speedup for the majority of matrices with at least 2^(25) nonzero entries; the best speedup is 3.48x. We also show that we can improve over the AI-based multi-format AlphaSparse in an experiment that is limited due to its extreme computation overhead. We provide our code as an open source C++/CUDA header library, which includes both compression and multiplication kernels.
Problem

Research questions and friction points this paper is trying to address.

sparse matrix-vector multiplication
GPU acceleration
entropy coding
matrix compression
SpMVM
Innovation

Methods, ideas, or system contributions that make the work stand out.

entropy coding
sparse matrix-vector multiplication
GPU acceleration
dtANS
lossless compression