MERBIT: A GPU-Based SpMV Method for Iterative Workloads

📅 2026-05-08

🤖 AI Summary

This work addresses the inefficiency of repeated sparse matrix-vector multiplication (SpMV) on irregular, graph-like sparse matrices on GPUs. The authors propose MERBIT, which combines two ideas from existing GPU SpMV methods. Global merge-path partitioning balances work across nonzeros and row boundaries, and local compact bit-field descriptors encode each merge-path segment. This design balances the workload and promotes coalesced memory access for both matrix reads and result writes. Three additional optimization strategies further improve performance, with iterative workloads such as PageRank as the motivating use case. On 50 large irregular datasets, MERBIT outperforms cuSPARSE's COO SpMV by an average of 1.27× in single precision and 1.25× in double precision. It also outperforms other baselines, including Ginkgo and academic approaches.
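The merge-path partitioning mentioned above can be illustrated with a minimal, sequential Python sketch of merge-based CSR SpMV, in the style popularized by Merrill and Garland. This is not MERBIT's implementation. The merge sequence consists of the row-end offsets interleaved with the nonzero indices. Each hypothetical "worker" stands in for a GPU thread and receives an equal-length slice of that sequence. It finds its starting (row, nonzero) coordinate by binary search along a diagonal and passes any row it leaves unfinished to a final fix-up pass. The names `merge_path_search` and `merge_spmv` are illustrative assumptions.

```python
from typing import List, Tuple


def merge_path_search(diagonal: int, row_end: List[int], nnz: int) -> Tuple[int, int]:
    """Find the (row, nonzero) coordinate where the merge path crosses `diagonal`.

    List A is the CSR row-end offsets; list B is the counting sequence 0..nnz-1,
    so B[j] == j and never needs to be stored.
    """
    lo, hi = max(diagonal - nnz, 0), min(diagonal, len(row_end))
    while lo < hi:
        pivot = (lo + hi) // 2
        if row_end[pivot] <= diagonal - pivot - 1:
            lo = pivot + 1
        else:
            hi = pivot
    return lo, diagonal - lo


def merge_spmv(row_ptr: List[int], col_idx: List[int], vals: List[float],
               x: List[float], num_workers: int) -> List[float]:
    """Compute y = A @ x, giving every worker the same number of merge items."""
    num_rows, nnz = len(row_ptr) - 1, len(vals)
    row_end = row_ptr[1:]
    total = num_rows + nnz                       # rows + nonzeros to merge
    per_worker = -(-total // num_workers)        # ceiling division
    y = [0.0] * num_rows
    carries = []                                 # (row, partial sum) per worker
    for w in range(num_workers):
        d0 = min(w * per_worker, total)
        d1 = min(d0 + per_worker, total)
        r, k = merge_path_search(d0, row_end, nnz)
        r_end, k_end = merge_path_search(d1, row_end, nnz)
        acc = 0.0
        while r < r_end:                         # rows that finish in this slice
            while k < row_end[r]:
                acc += vals[k] * x[col_idx[k]]
                k += 1
            y[r] = acc
            acc = 0.0
            r += 1
        while k < k_end:                         # leading part of an unfinished row
            acc += vals[k] * x[col_idx[k]]
            k += 1
        carries.append((r_end, acc))
    for r, partial in carries:                   # fix-up for rows split across workers
        if r < num_rows:
            y[r] += partial
    return y


if __name__ == "__main__":
    # A = [[1, 2, 0], [0, 0, 0], [3, 0, 4]] in CSR form (row 1 is empty).
    row_ptr, col_idx, vals = [0, 2, 2, 4], [0, 1, 0, 2], [1.0, 2.0, 3.0, 4.0]
    print(merge_spmv(row_ptr, col_idx, vals, [1.0, 2.0, 3.0], num_workers=3))
```

The slices are equal in merge items, not in rows. A single very long row, which is common in power-law graphs, is therefore split across several workers instead of stalling one of them.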

📝 Abstract

Sparse Matrix-Vector Multiplication (SpMV) is a cornerstone of many iterative workloads, including large-scale graph analytics and sparse iterative solvers. Accelerating SpMV on real-world graphs remains challenging because of their highly irregular sparsity patterns. In this paper, we propose MERBIT, a GPU SpMV method designed for repeated SpMV on irregular, graph-like sparse matrices, with PageRank as a representative motivating workload. MERBIT combines two key ideas from existing GPU SpMV methods. At the global level, it uses merge-path partitioning to balance work over nonzeros and row boundaries. At the local level, it encodes each merge-path segment using a compact bit-field descriptor. MERBIT improves workload balance and promotes coalesced memory access for both matrix loading and output writes. It also incorporates three optimization strategies to further improve performance. Experiments on 50 large irregular datasets demonstrate that MERBIT outperforms competitive baselines, including cuSPARSE, Ginkgo, and academic approaches, achieving average speedups of 1.27× and 1.25× over cuSPARSE COO in single and double precision, respectively.
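The abstract does not specify the layout of MERBIT's bit-field descriptor, so the following Python sketch assumes one plausible encoding purely for illustration. Each merge-path segment stores its starting (row, nonzero) coordinate, its length, and one bit per merge item. A 1 bit means "close the current row" and a 0 bit means "consume a nonzero". Because the descriptors depend only on the sparsity pattern, they can be built once and reused for every SpMV of an iterative workload such as PageRank. The names `build_descriptors` and `descriptor_spmv` and the bit layout are hypothetical.

```python
from typing import List, Tuple

# Hypothetical descriptor: (start_row, start_nonzero, length, bits).
# Python ints are unbounded; a GPU version would pack bits into 32/64-bit words.
Segment = Tuple[int, int, int, int]


def merge_path_search(diagonal: int, row_end: List[int], nnz: int) -> Tuple[int, int]:
    """Binary-search the merge path for the (row, nonzero) coordinate on `diagonal`."""
    lo, hi = max(diagonal - nnz, 0), min(diagonal, len(row_end))
    while lo < hi:
        pivot = (lo + hi) // 2
        if row_end[pivot] <= diagonal - pivot - 1:
            lo = pivot + 1
        else:
            hi = pivot
    return lo, diagonal - lo


def build_descriptors(row_ptr: List[int], num_segments: int) -> List[Segment]:
    """One-time preprocessing: encode each merge-path segment as a bit-field."""
    row_end = row_ptr[1:]
    num_rows, nnz = len(row_end), row_ptr[-1]
    total = num_rows + nnz
    per_seg = -(-total // num_segments)          # ceiling division
    segments: List[Segment] = []
    for s in range(num_segments):
        d0 = min(s * per_seg, total)
        length = min(d0 + per_seg, total) - d0
        r, k = merge_path_search(d0, row_end, nnz)
        start_r, start_k, bits = r, k, 0
        for i in range(length):
            if r < num_rows and k < row_end[r]:
                k += 1                           # bit i stays 0: consume a nonzero
            else:
                bits |= 1 << i                   # bit i set to 1: row r ends here
                r += 1
        segments.append((start_r, start_k, length, bits))
    return segments


def descriptor_spmv(segments: List[Segment], num_rows: int, col_idx: List[int],
                    vals: List[float], x: List[float]) -> List[float]:
    """y = A @ x, driven by the bit-fields instead of per-item row_ptr lookups."""
    y = [0.0] * num_rows
    carries = []
    for r, k, length, bits in segments:
        acc = 0.0
        for i in range(length):
            if (bits >> i) & 1:
                y[r] = acc                       # row r finishes inside this segment
                acc = 0.0
                r += 1
            else:
                acc += vals[k] * x[col_idx[k]]
                k += 1
        carries.append((r, acc))                 # partial sum of the row left open
    for r, partial in carries:                   # fix-up for rows split across segments
        if r < num_rows:
            y[r] += partial
    return y


if __name__ == "__main__":
    # A = [[1, 2, 0], [0, 0, 0], [3, 0, 4]] in CSR form (row 1 is empty).
    row_ptr, col_idx, vals = [0, 2, 2, 4], [0, 1, 0, 2], [1.0, 2.0, 3.0, 4.0]
    segs = build_descriptors(row_ptr, num_segments=3)   # built once
    for _ in range(3):                                  # reused on every iteration
        print(descriptor_spmv(segs, 3, col_idx, vals, [1.0, 2.0, 3.0]))
```

In a PageRank-style power iteration, `build_descriptors` would run once and `descriptor_spmv` would run on every iteration. The preprocessing cost is then amortized, and the inner loop reads one compact bit-field per segment rather than the row-pointer array.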

Problem

Research questions and friction points this paper is trying to address.

SpMV

irregular sparsity

iterative workloads

graph analytics

GPU acceleration

Innovation

Methods, ideas, or system contributions that make the work stand out.

SpMV

GPU acceleration

merge-path partitioning