🤖 AI Summary
In distributed Graph Neural Network (GNN) training, sparsity-oblivious parallel SpMM (sparse-matrix dense-matrix multiplication) communicates matrix elements regardless of the sparsity pattern, wasting substantial bandwidth and limiting scalability. To address this, the authors propose a sparsity-aware framework with three coordinated optimizations: (i) an on-demand communication mechanism that transmits only the matrix elements remote processes actually need; (ii) graph-partitioning-based matrix reordering to drastically reduce the number of communicated elements; and (iii) a tailored partitioning model that addresses load imbalance by minimizing both the total communication volume and the maximum per-process sending volume. Coupled with a communication-avoiding 1.5D parallel SpMM that replicates submatrices, the approach achieves up to a 14× end-to-end training speedup on 256 GPUs relative to a popular GNN framework based on sparsity-oblivious SpMM, and on some instances reduces SpMM communication to nearly zero, yielding effectively communication-free parallel training.
📝 Abstract
Graph Neural Networks (GNNs) are a computationally efficient method to learn embeddings and classifications on graph data. However, GNN training has low computational intensity, making communication costs the bottleneck for scalability. Sparse-matrix dense-matrix multiplication (SpMM) is the core computational operation in full-graph training of GNNs. Previous work parallelizing this operation focused on sparsity-oblivious algorithms, where matrix elements are communicated regardless of the sparsity pattern. This leads to a predictable communication pattern that can be overlapped with computation and enables the use of collective communication operations, at the expense of wasting significant bandwidth by communicating unnecessary data. We develop sparsity-aware algorithms that tackle the communication bottlenecks in GNN training with three novel approaches. First, we communicate only the necessary matrix elements. Second, we utilize a graph partitioning model to reorder the matrix and drastically reduce the number of communicated elements. Finally, we address the high load imbalance in communication with a tailored partitioning model, which minimizes both the total communication volume and the maximum sending volume. We further couple these sparsity-exploiting approaches with a communication-avoiding approach (1.5D parallel SpMM) in which submatrices are replicated to reduce communication. We explore the tradeoffs of these combined optimizations and show up to a 14× improvement on 256 GPUs relative to a popular GNN framework based on sparsity-oblivious SpMM, on some instances reducing communication to almost zero and resulting in communication-free parallel training.
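The first idea, communicating only the necessary matrix elements, can be illustrated with a small single-process sketch. Assume a 1D row partition of Y = A·H, where each rank owns a block of rows of the sparse adjacency matrix A and the matching rows of the dense feature matrix H. A rank then only needs the remote rows of H whose indices appear as nonzero columns in its block of A. The sizes, density, and function name below are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import scipy.sparse as sp

def remote_rows_needed(A_block, owned_lo, owned_hi):
    """Indices of dense-matrix rows this rank must fetch from other ranks.

    Only columns with at least one nonzero in the local block of A are
    referenced during SpMM, so only those remote rows of H are needed --
    the core of the sparsity-aware communication scheme.
    """
    nz_cols = np.unique(A_block.tocoo().col)
    return nz_cols[(nz_cols < owned_lo) | (nz_cols >= owned_hi)]

# Toy setup: 8 vertices, 2 ranks, rank 0 owns rows/columns 0..3.
n = 8
A = sp.random(n, n, density=0.15, format="csr", random_state=0)
A_block = A[:4, :]                  # rank 0's row block of the adjacency matrix
needed = remote_rows_needed(A_block, 0, 4)

# A sparsity-oblivious scheme ships all n - 4 remote rows of H to rank 0;
# the sparsity-aware scheme ships only the rows actually referenced.
oblivious_volume = n - 4
aware_volume = len(needed)
assert aware_volume <= oblivious_volume
```

On real power-law graphs reordered by a good partitioner, the referenced fraction of remote rows can be far smaller than the oblivious volume, which is where the bandwidth savings come from.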