AES-SpMM: Balancing Accuracy and Speed by Adaptive Edge Sampling Strategy to Accelerate SpMM in GNNs

📅 2025-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the accuracy–efficiency trade-off and the data-loading bottleneck that rigid edge-sampling strategies cause in Graph Neural Network (GNN) sparse–dense matrix multiplication (SpMM), this paper proposes AES-SpMM, an adaptive edge-sampling SpMM kernel. The method introduces a row-wise adaptive sampling mechanism that selects a sampling strategy per row based on its non-zero count relative to the GPU shared-memory width. It is the first to jointly integrate low-bit feature quantization with adaptive sampling, enabling fused dequantization and computation. The approach comprises adaptive sparse-graph compression, shared-memory-aware sampling, and a custom CUDA kernel. Experiments show up to 25.87× and 23.01× speedups over cuSPARSE and GE-SpMM, respectively, with less than 1% accuracy loss. Compared to ES-SpMM, it delivers a 1.31× average speedup and reduces accuracy loss by 3.4%. The quantization-based variant adds at most 0.3% further accuracy loss while cutting feature data loading time by 50.91%–70.51%.
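The row-wise mechanism above can be sketched in plain Python: rows whose non-zero count fits the shared-memory width are kept whole, and longer rows are down-sampled to fit. This is a minimal illustration of the idea, not the paper's CUDA kernel; the function name `adaptive_edge_sample` and the parameter `smem_width` are assumptions.

```python
import numpy as np

def adaptive_edge_sample(indptr, indices, smem_width, rng=None):
    """Row-wise adaptive edge sampling over a CSR graph (sketch).

    Rows with at most `smem_width` non-zeros are kept intact; longer
    rows are randomly down-sampled to `smem_width` edges so each row's
    neighbor list fits one shared-memory tile. Names are illustrative.
    """
    rng = rng or np.random.default_rng(0)
    new_indptr = [0]
    new_indices = []
    for row in range(len(indptr) - 1):
        cols = indices[indptr[row]:indptr[row + 1]]
        if len(cols) > smem_width:
            # Row exceeds the shared-memory tile: sample without replacement.
            cols = rng.choice(cols, size=smem_width, replace=False)
        new_indices.extend(cols.tolist())
        new_indptr.append(len(new_indices))
    return np.array(new_indptr), np.array(new_indices)
```

In the actual kernel this decision is made per thread block on the GPU; the sketch only shows the per-row branching logic that "adaptive" refers to.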

📝 Abstract
Coordinating the design of sampling and sparse–dense matrix multiplication (SpMM) is crucial for accelerating graph neural networks (GNNs). However, due to inflexible sampling strategies, existing methods face a trade-off between accuracy and speed. Moreover, as computational optimizations progress, data loading has gradually become the primary bottleneck in GNN inference. To address these issues, we propose AES-SpMM, an adaptive edge-sampling SpMM kernel. It considers the relationship between the number of non-zero elements in each matrix row and the shared-memory width, and adaptively selects an edge-sampling scheme according to each row's situation. AES-SpMM reduces the graph size through adaptive edge sampling so that it fits in the GPU's shared memory, lowering the computational cost and enhancing data locality, thus balancing the accuracy and speed of GNN inference. Additionally, we introduce a quantization-based AES-SpMM, which applies quantization and dequantization to feature data in GNNs. This approach significantly reduces data loading time while keeping accuracy loss negligible. We evaluated AES-SpMM with common GNN models and datasets. The results show that AES-SpMM outperforms the cuSPARSE SpMM kernel and GE-SpMM by up to 25.87× and 23.01×, respectively, with less than 1% accuracy loss. Compared to ES-SpMM, it reduces accuracy loss by 3.4% on average while achieving a 1.31× speedup. Compared to AES-SpMM, quantization-based AES-SpMM has a maximum additional accuracy loss of 0.3% and reduces feature data loading time by 50.91%–70.51%.
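The quantization-based variant in the abstract loads low-bit feature data and dequantizes it during the SpMM itself. A minimal Python model of that idea, assuming simple per-tensor symmetric int8 quantization (the paper's actual scheme and its fused CUDA kernel are not reproduced here, and all names are assumptions):

```python
import numpy as np

def quantize_features(x, bits=8):
    """Per-tensor symmetric quantization of a dense feature matrix.

    Returns low-bit integer features plus the scale needed to
    dequantize them on the fly. Illustrative sketch only.
    """
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    scale = float(np.abs(x).max()) / qmax or 1.0    # avoid zero scale
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def spmm_dequant(indptr, indices, vals, q_feat, scale):
    """CSR SpMM where the dense operand is dequantized inside the loop,
    mimicking fused dequantize-and-compute (no float feature matrix is
    ever materialized)."""
    n_rows, n_cols = len(indptr) - 1, q_feat.shape[1]
    out = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for k in range(indptr[r], indptr[r + 1]):
            # Dequantize one int8 feature row just before using it.
            out[r] += vals[k] * (q_feat[indices[k]].astype(np.float64) * scale)
    return out
```

The point of the fusion is that only the int8 features cross the memory hierarchy, which is where the reported 50.91%–70.51% reduction in loading time comes from; the arithmetic error introduced is bounded by the quantization step.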
Problem

Research questions and friction points this paper is trying to address.

Balancing accuracy and speed in GNNs via adaptive edge sampling
Reducing data loading bottleneck in GNN inference through quantization
Optimizing SpMM kernel for GPU shared memory efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive edge sampling for SpMM acceleration
Quantization reduces data loading time significantly
Balances accuracy and speed in GNNs
Yingchen Song
School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
Yaobin Wang
School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
Yi Luo
School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China
Huan Wu
Assistant Research Scientist of ESSIC, University of Maryland/NASA GSFC
Hydrometeorology · Hydrological Modelling · Flood · HydroEcology · Land Surface Modelling
Pingping Tang
School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, 621010, Sichuan, China