🤖 AI Summary
To address the inefficiencies of irregular memory access and high communication overhead in distributed full-batch GCN training on CPU-based HPC systems, this work proposes: (1) a generic aggregation operator tailored to graph-structural irregularity; (2) a novel “pre-post aggregation” paradigm that jointly optimizes pre-aggregation and post-aggregation scheduling; and (3) a synergistic communication compression mechanism integrating gradient/feature quantization with label propagation. The resulting HPC-grade CPU-distributed training framework achieves up to 6× speedup over state-of-the-art methods across multiple large-scale graph datasets, scales effectively to thousand-core CPU clusters, preserves convergence behavior and model accuracy, and significantly reduces power consumption and hardware cost compared to GPU-based alternatives.
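The graph aggregation these operators target can be sketched as a plain CSR-based neighbor sum (a generic illustration of the computational pattern, not the paper's optimized operator; all names here are hypothetical):

```python
import numpy as np

def aggregate_csr(indptr, indices, h):
    """Sum each vertex's neighbor feature rows over a CSR graph.

    The gathers through `indices` are the irregular, data-dependent
    memory accesses that specialized aggregation operators optimize.
    """
    out = np.zeros_like(h)
    for v in range(len(indptr) - 1):
        nbrs = indices[indptr[v]:indptr[v + 1]]  # neighbor IDs of v
        out[v] = h[nbrs].sum(axis=0)             # irregular gather + reduce
    return out

# Tiny example: undirected path graph 0-1-2 with 2-dim features.
indptr = np.array([0, 1, 3, 4])
indices = np.array([1, 0, 2, 1])
h = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(aggregate_csr(indptr, indices, h))
# → [[0. 1.]
#    [3. 2.]
#    [0. 1.]]
```

In a naive form, each row of `h[nbrs]` lands on an unpredictable cache line, which is exactly why a generic, structure-aware operator pays off on CPUs.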
📝 Abstract
Graph Convolutional Networks (GCNs) are widely used across many domains. However, distributed full-batch training of GCNs on large-scale graphs is challenging due to inefficient memory access patterns and high communication overhead. This paper presents general, efficient aggregation operators designed for irregular memory access patterns. Additionally, we propose a pre-post-aggregation approach and a quantization-with-label-propagation method to reduce communication costs. Combining these techniques, we develop an efficient and scalable distributed GCN training framework, *SuperGCN*, for CPU-powered supercomputers. Experimental results on multiple large graph datasets show that our method achieves a speedup of up to 6× over state-of-the-art implementations and scales to thousands of HPC-grade CPUs without sacrificing model convergence or accuracy. Our framework delivers performance on CPU-powered supercomputers comparable to that of GPU-powered supercomputers, at a fraction of the cost and power budget.
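The quantization half of the communication-compression scheme can be illustrated with a minimal per-row uniform-quantization sketch (hypothetical names; the paper's actual method, which couples quantization with label propagation, is not reproduced here):

```python
import numpy as np

def quantize_features(x, bits=8):
    """Per-row uniform quantization of a float32 feature matrix.

    Illustrative only: real frameworks tune the bit width and combine
    quantization with further techniques such as label propagation.
    """
    qmax = (1 << bits) - 1
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / qmax, 1.0)  # avoid div-by-zero
    q = np.round((x - lo) / scale).astype(np.uint8)   # payload on the wire
    return q, lo.astype(np.float32), scale.astype(np.float32)

def dequantize_features(q, lo, scale):
    """Reconstruct approximate features on the receiving rank."""
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
x = rng.random((4, 16), dtype=np.float32)
q, lo, scale = quantize_features(x)
x_hat = dequantize_features(q, lo, scale)
# Rounding error is bounded by half a quantization step per element.
assert np.all(np.abs(x - x_hat) <= 0.5 * scale + 1e-5)
```

Sending `uint8` codes instead of `float32` values shrinks the feature payload roughly 4×, plus a small per-row `(lo, scale)` header, which is the kind of bandwidth saving that matters at thousand-core scale.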