🤖 AI Summary
This work addresses the high communication overhead in distributed graph neural network (GNN) training, primarily caused by neighbor dependencies that necessitate frequent exchange of boundary node features across partitions, creating a communication bottleneck. To mitigate this, the authors propose a communication-efficient training framework featuring a dynamic, on-the-fly graph condensation mechanism that aggregates boundary nodes into compact super-nodes to reduce data transmission. Additionally, a gradient-based error feedback strategy compensates for the information loss introduced by compression, thereby preserving model convergence and accuracy. Experimental results on four benchmark datasets demonstrate that the proposed method reduces communication volume by 40%–60%, significantly shortens training time, and maintains accuracy comparable to that of full-precision baselines.
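The core idea of the condensation step, as described above, is to replace many boundary-node features with a few super-node features before transmission. The paper's exact condensation rule is not given in this summary, so the sketch below assumes a simple mean-pooling aggregation over a precomputed cluster assignment; `condense_boundary` and its arguments are hypothetical names for illustration only.

```python
import numpy as np

def condense_boundary(features, assignment, num_super):
    """Aggregate boundary-node features into super-node features by mean
    pooling over an assumed cluster assignment (hypothetical helper; the
    actual condensation operator in the paper may differ)."""
    d = features.shape[1]
    super_feats = np.zeros((num_super, d))
    counts = np.zeros(num_super)
    for node, cluster in enumerate(assignment):
        super_feats[cluster] += features[node]
        counts[cluster] += 1
    counts[counts == 0] = 1  # avoid division by zero for empty clusters
    return super_feats / counts[:, None]

# Toy example: 6 boundary nodes with 4-dim features, condensed to 2 super-nodes.
feats = np.arange(24, dtype=float).reshape(6, 4)
assign = [0, 0, 0, 1, 1, 1]
sf = condense_boundary(feats, assign, 2)
```

Only the two condensed rows of `sf` would cross the network, instead of all six boundary rows, which is where the communication saving comes from.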
📝 Abstract
Distributed Graph Neural Network (GNN) training suffers from substantial communication overhead due to the inherent neighborhood dependency in graph-structured data. This neighbor explosion problem requires workers to frequently exchange boundary node features across partitions, creating a communication bottleneck that severely limits training scalability. Existing approaches rely on static graph partitioning strategies that cannot adapt to dynamic network conditions. In this paper, we propose CondenseGraph, a novel communication-efficient framework for distributed GNN training. Our key innovation is an on-the-fly graph condensation mechanism that dynamically compresses boundary node features into compact super-nodes before transmission. To compensate for the information loss introduced by compression, we develop a gradient-based error feedback mechanism that maintains convergence guarantees while reducing communication volume by 40–60%. Extensive experiments on four benchmark datasets demonstrate that CondenseGraph achieves comparable accuracy to full-precision baselines while significantly reducing communication costs and training time.
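The error-feedback mechanism mentioned in the abstract follows a well-known pattern from compressed distributed optimization: the residual dropped by the compressor in one round is added back to the gradient in the next, so lost information is eventually transmitted. The sketch below is a minimal illustration of that pattern, not CondenseGraph's actual algorithm; top-k sparsification stands in for the (unspecified) condensation operator, and all names are assumptions.

```python
import numpy as np

def compress(x, k):
    """Top-k sparsification as a stand-in compressor (illustrative
    assumption; the paper's condensation operator may differ)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]  # keep the k largest-magnitude entries
    out[idx] = x[idx]
    return out

class ErrorFeedback:
    """Classic error-feedback loop: carry the compression residual into
    the next round so no gradient information is permanently lost."""
    def __init__(self, shape):
        self.residual = np.zeros(shape)

    def step(self, grad, k):
        corrected = grad + self.residual  # add back what was dropped before
        sent = compress(corrected, k)     # what is actually communicated
        self.residual = corrected - sent  # remember what was dropped now
        return sent
```

With this loop, the sum of transmitted vectors over successive rounds converges to the sum of the true gradients, which is the intuition behind the convergence guarantees the abstract refers to.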