AI Summary
Existing sparse matrix optimization methods often coarsely classify structured sparsity (e.g., clustered non-zeros) as either fully dense or fully sparse, leading to redundant zero computations in fixed-block formats (e.g., BCSR) or substantial overhead in variable-block approaches, whose loop bounds are unknown at compile time. This work proposes a region-aware, multi-stage compilation framework that automatically identifies high-benefit variable-size blocks, statically infers the otherwise dynamic loop bounds, and generates customized vectorized code, balancing efficiency and adaptability. Key techniques include sparse partition analysis, domain-specific code generation, loop vectorization, and compile-time scheduling specialization. Evaluated on the SuiteSparse collection, the approach achieves 1.07×, 2.73×, and 1.9× higher single-threaded SpMV performance than Intel MKL, CSR5, and Partially-Strided Codelets, respectively; parallel execution further increases throughput.
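To make the fixed-block trade-off concrete, here is a minimal sketch (not SABLE's actual generated code, which targets vectorized C) contrasting a baseline CSR SpMV, which touches only stored non-zeros, with a BCSR-style fixed-block SpMV, which multiplies every entry of each block, including the padded zeros. All function and variable names here are illustrative.

```python
import numpy as np

def spmv_csr(vals, col_idx, row_ptr, x):
    """Baseline CSR SpMV (y = A @ x): iterates only over stored non-zeros,
    but with irregular, data-dependent inner-loop bounds."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def spmv_bcsr(block_vals, block_col, block_row_ptr, x, r, c):
    """BCSR-style SpMV with fixed r x c blocks: loop bounds are known at
    compile time (good for vectorization), but any zeros padded into a
    block are still multiplied -- the wasted work the summary refers to."""
    n_block_rows = len(block_row_ptr) - 1
    y = np.zeros(n_block_rows * r)
    for bi in range(n_block_rows):
        for k in range(block_row_ptr[bi], block_row_ptr[bi + 1]):
            bj = block_col[k]
            # Dense r x c multiply: every stored value participates,
            # zero or not.
            y[bi * r:(bi + 1) * r] += block_vals[k] @ x[bj * c:(bj + 1) * c]
    return y
```

The tension the summary describes is visible here: CSR avoids redundant arithmetic but has dynamic bounds; BCSR has static bounds but pays for padded zeros.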
Abstract
Structured sparsity, such as regions of clustered non-zero elements in sparse matrices, offers optimization opportunities often overlooked by existing solutions that treat matrices as entirely dense or entirely sparse. Block-based approaches such as BCSR partially address this issue, but their fixed-size blocks waste computation on zero elements; variable-sized blocks, on the other hand, introduce overheads because loop bounds are unknown at compile time. We present SABLE, a novel staging framework that achieves the best of both approaches by generating region-specific code tailored for variable-sized blocks. SABLE partitions the matrix to identify profitable blocks and specializes the generated code for vectorization. We evaluate SABLE on the SpMV kernel using the SuiteSparse collection. SABLE achieves geomean speedups of $1.07\times$, $2.73\times$, and $1.9\times$ over the state-of-the-art systems Intel MKL, CSR5, and Partially-Strided Codelets, respectively, in single-threaded execution, and even more when parallelized.
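The abstract's "best of both" idea can be sketched as follows: given a partition of the matrix into profitable dense variable-sized blocks plus scattered leftover non-zeros, each block is computed with a dense loop whose bounds are fixed per block (standing in for SABLE's specialized, vectorizable generated code), while the remainder falls back to a CSR-style scalar loop. This is a simplified Python analogy under assumed data layouts, not SABLE's actual code generator; all names are illustrative.

```python
import numpy as np

def spmv_variable_blocks(blocks, remainder, x, n_rows):
    """SpMV over a variable-block partition.

    blocks:    list of (row_start, col_start, B) where B is a dense 2-D
               array covering a profitable region; its shape is known when
               the per-region kernel is generated, so the inner loops have
               static bounds and vectorize cleanly.
    remainder: (row, col, val) triples for non-zeros outside any block,
               handled by an irregular fallback loop.
    """
    y = np.zeros(n_rows)
    for r0, c0, B in blocks:
        h, w = B.shape
        # Dense mini-SpMV with per-block fixed bounds; no padded zeros,
        # since block sizes adapt to the actual non-zero region.
        y[r0:r0 + h] += B @ x[c0:c0 + w]
    for i, j, v in remainder:
        y[i] += v * x[j]
    return y
```

The key difference from BCSR is that block shapes follow the matrix's structure rather than a fixed grid, so dense arithmetic is spent only where non-zeros actually cluster.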