AI Summary
To address performance bottlenecks of sparse matrix-vector multiplication (SpMV) on GPUs under large-scale, highly sparse workloads, this paper proposes a co-optimization framework combining nonlinear hashing-based matrix reordering with 2D blocking. We pioneer the use of nonlinear hash mapping for structural reordering and introduce a lightweight Hash-based Partition (HBP) storage format that jointly exploits hash-induced clustering and 2D block locality. Furthermore, we design a contention-aware parallel load-balancing mechanism to significantly reduce preprocessing overhead. Experimental results show that our preprocessing phase achieves 3.53× and 3.67× speedups over conventional sorting and Regu2D dynamic programming, respectively. In SpMV computation, HBP delivers up to 3.32× and 3.01× acceleration over CSR on Jetson AGX Orin and RTX 4090, respectively. The method thus bridges the gap between efficient preprocessing and high-throughput GPU SpMV execution for extreme sparsity regimes.
Abstract
Sparse matrix-vector multiplication (SpMV) is a fundamental operation with a wide range of applications in scientific computing and artificial intelligence. However, the large scale and sparsity of sparse matrices often make it a performance bottleneck. In this paper, we highlight the effectiveness of hash-based techniques in optimizing sparse matrix reordering, introducing the Hash-based Partition (HBP) format, a lightweight SpMV approach. HBP retains the performance benefits of the 2D-partitioning method while leveraging the hash transformation's ability to group similar elements, thereby accelerating the pre-processing phase of sparse matrix reordering. Additionally, we achieve parallel load balancing across matrix blocks through a competitive method. Our experiments, conducted on both the Nvidia Jetson AGX Orin and the Nvidia RTX 4090, show that in the pre-processing step our method offers an average speedup of 3.53 times over the sorting approach and 3.67 times over the dynamic programming method employed in Regu2D. Furthermore, in SpMV, our method achieves a maximum speedup of 3.32 times on the Orin and 3.01 times on the RTX 4090 against the CSR format on sparse matrices from the University of Florida Sparse Matrix Collection.
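To make the central idea concrete, here is a minimal illustrative sketch (not the paper's actual HBP algorithm; the hash function and bucket count are assumptions for demonstration) of hash-based row reordering: each row's sparsity pattern is hashed so that rows touching similar column blocks land in the same bucket, which clusters structurally similar rows without a full sort.

```python
# Illustrative sketch only: hash-based reordering of CSR rows so that rows
# with similar column-block footprints become adjacent, improving locality
# for subsequent 2D-blocked SpMV. The specific hash is a stand-in.

def pattern_hash(cols, num_cols, num_buckets=16):
    """Map a row's column indices to a bucket via a simple nonlinear hash
    of the coarse column blocks the row touches."""
    block = max(1, num_cols // num_buckets)
    sig = 0
    for c in cols:
        sig |= 1 << min(c // block, 63)      # bitmask of touched column blocks
    return (sig * 2654435761) % num_buckets  # Knuth-style multiplicative hash

def hash_reorder(row_ptr, col_idx, num_cols, num_buckets=16):
    """Return a row permutation grouping rows with similar sparsity patterns."""
    buckets = [[] for _ in range(num_buckets)]
    for r in range(len(row_ptr) - 1):
        cols = col_idx[row_ptr[r]:row_ptr[r + 1]]
        buckets[pattern_hash(cols, num_cols, num_buckets)].append(r)
    return [r for b in buckets for r in b]

# Tiny example: a 4x8 matrix in CSR form where rows 0,2 and rows 1,3
# share the same column pattern.
row_ptr = [0, 2, 4, 6, 8]
col_idx = [0, 1, 6, 7, 0, 1, 6, 7]
perm = hash_reorder(row_ptr, col_idx, num_cols=8, num_buckets=4)
print(perm)  # -> [1, 3, 0, 2]: rows with matching patterns end up adjacent
```

Because bucketing is a single linear pass over the rows, it avoids the O(n log n) cost of pattern sorting, which is the kind of saving the reported pre-processing speedups reflect.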