ParamSpMM: Adaptive and Efficient Sparse Matrix-Matrix Multiplication on GPUs for GNNs

📅 2026-05-15
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the limited efficiency of sparse matrix-matrix multiplication (SpMM) in graph neural networks (GNNs), where existing SpMM designs struggle to adapt to diverse input characteristics. The authors propose ParamSpMM, a parametric SpMM method for GNNs. ParamSpMM introduces a new data structure, the Parameterized Compressed Sparse Row (PCSR) format, which integrates existing optimization techniques so that they can be configured per input. It is paired with a machine-learning-based SpMM-decider that predicts the optimal configuration from input features. Evaluations show that ParamSpMM achieves an average speedup of 1.92× over NVIDIA cuSPARSE, which significantly improves GNN training efficiency.
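To make the idea of "input features" for a configuration decider concrete, here is a minimal sketch in Python. It is not the paper's feature set or model, neither of which is specified on this page. The function name `row_length_features` and the chosen statistics are illustrative assumptions, and the input is assumed to be the row-pointer array of a standard CSR matrix.

```python
from statistics import mean, pstdev


def row_length_features(row_ptr, feature_dim):
    """Summarize a CSR sparsity pattern with a few simple statistics.

    row_ptr     -- CSR row-pointer array of length num_rows + 1
    feature_dim -- number of columns in the dense matrix (hypothetical input)

    The statistics are assumptions for illustration, not the paper's features.
    """
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    avg = mean(lengths)
    std = pstdev(lengths)
    return {
        "num_rows": len(lengths),
        "nnz": row_ptr[-1],
        "avg_row_len": avg,
        # Coefficient of variation: a rough measure of row-length imbalance.
        "row_len_cv": std / avg if avg > 0 else 0.0,
        "max_row_len": max(lengths),
        "feature_dim": feature_dim,
    }


# Three rows holding 2, 1, and 1 nonzeros respectively.
feats = row_length_features([0, 2, 3, 4], feature_dim=64)
```

A learned decider could, under these assumptions, take such a dictionary as input and output a kernel configuration. How ParamSpMM actually does this is described in the paper itself.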
📝 Abstract
Fueled by the ability to mine real-world graph data, GNN applications have experienced phenomenal growth. Sparse Matrix-Matrix Multiplication (SpMM) is a critical operator in GNNs. However, existing SpMM designs for GNNs struggle to adapt to diverse input characteristics. In this paper, we first conduct a comprehensive analysis of existing SpMM optimizations, revealing their limitations through statistical and empirical evidence. Based on this analysis, we introduce ParamSpMM, a parametric approach for highly adaptive and efficient SpMM computation in GNNs. It incorporates a new data structure, the Parameterized Compressed Sparse Row (PCSR), to flexibly integrate existing optimization techniques. ParamSpMM enables the configuration of these optimization techniques according to various input characteristics. Furthermore, we complement ParamSpMM with an ML-based SpMM-decider that predicts optimal configurations based on carefully crafted input features. Our evaluations demonstrate that ParamSpMM outperforms Nvidia cuSPARSE with an average speedup of 1.92x, significantly enhancing GNN training efficiency.
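For readers unfamiliar with the operator, the following is a minimal reference sketch of SpMM over a standard CSR matrix, written in plain Python. It shows only what SpMM computes: a sparse adjacency matrix multiplied by a dense feature matrix, as in GNN neighbor aggregation. It is not the PCSR format or any ParamSpMM kernel, and the function name `spmm_csr` and the toy graph are assumptions made for illustration.

```python
def spmm_csr(row_ptr, col_idx, values, dense):
    """Compute C = A @ B, with A in CSR form and B a dense list of rows."""
    num_rows = len(row_ptr) - 1
    width = len(dense[0])
    out = [[0.0] * width for _ in range(num_rows)]
    for i in range(num_rows):
        row_c = out[i]
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            row_b = dense[col_idx[k]]
            for f in range(width):
                row_c[f] += a * row_b[f]
    return out


# Toy 3-node graph: node 0 aggregates from nodes 1 and 2,
# node 1 from node 2, and node 2 from node 0.
row_ptr = [0, 2, 3, 4]
col_idx = [1, 2, 2, 0]
values = [1.0, 1.0, 1.0, 1.0]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

aggregated = spmm_csr(row_ptr, col_idx, values, features)
```

Each output row is the weighted sum of the dense rows selected by that sparse row's column indices. The work per output row therefore depends on the row's nonzero count and on the dense width, which are the kinds of input characteristics the abstract refers to.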
Problem

Research questions and friction points this paper is trying to address.

Sparse Matrix-Matrix Multiplication
Graph Neural Networks
GPU acceleration
Input adaptivity
SpMM optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

ParamSpMM
PCSR
adaptive SpMM
GNN acceleration
ML-based configuration