Demystifying Distributed Training of Graph Neural Networks for Link Prediction

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Distributed Graph Neural Networks (GNNs) for link prediction suffer from degraded accuracy due to information loss from graph partitioning and biased negative sampling, while global-graph-sharing approaches incur prohibitive communication overhead. To address these challenges, the paper proposes SpLPG, a framework that leverages graph sparsification to prune inter-partition edges while preserving critical topological structure, jointly addressing graph partitioning and negative sampling. SpLPG operates without global graph synchronization, enabling accurate link prediction with significantly reduced communication. Experiments on multiple real-world graph datasets demonstrate that SpLPG reduces communication overhead by up to about 80%, with less than a 1% drop in AUC-ROC performance, substantially outperforming existing distributed GNN-based link prediction methods.

📝 Abstract
Graph neural networks (GNNs) are powerful tools for solving graph-related problems. Distributed GNN frameworks and systems enhance the scalability of GNNs and accelerate model training, yet most are optimized for node classification. Their performance on link prediction remains underexplored. This paper demystifies distributed training of GNNs for link prediction by investigating the issue of performance degradation when each worker trains a GNN on its assigned partitioned subgraph without having access to the entire graph. We discover that the main sources of the issue come from not only the information loss caused by graph partitioning but also the ways of drawing negative samples during model training. While sharing the complete graph information with each worker resolves the issue and preserves link prediction accuracy, it incurs a high communication cost. We propose SpLPG, which effectively leverages graph sparsification to mitigate the issue of performance degradation at a reduced communication cost. Experiment results on several public real-world datasets demonstrate the effectiveness of SpLPG, which reduces the communication overhead by up to about 80% while mostly preserving link prediction accuracy.
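The abstract's point about negative sampling can be made concrete: when each worker draws negative samples only from its own partition, cross-partition node pairs are never sampled, biasing the training signal. The sketch below is a minimal illustration of this effect (not the paper's method); the toy graph, partition layout, and `negative_samples` helper are assumptions for demonstration.

```python
import random

def negative_samples(nodes, edges, k, rng):
    """Draw k node pairs that are not existing edges (negative samples)."""
    edge_set = set(edges)
    samples = []
    while len(samples) < k:
        u, v = rng.sample(nodes, 2)
        if (u, v) not in edge_set and (v, u) not in edge_set:
            samples.append((u, v))
    return samples

# Toy graph split across two workers; a worker sampling only locally
# can never produce a cross-partition negative pair.
partition_a = [0, 1, 2]
partition_b = [3, 4, 5]
rng = random.Random(42)

local = negative_samples(partition_a, [(0, 1)], 4, rng)
global_ = negative_samples(partition_a + partition_b, [(0, 1)], 4, rng)

cross = [p for p in global_ if (p[0] in partition_a) != (p[1] in partition_a)]
print("local-only negatives:", local)            # all pairs stay inside partition_a
print("cross-partition pairs in global draw:", len(cross))
```

Every pair in `local` lies entirely inside `partition_a`, whereas a global draw can (and typically does) include cross-partition pairs; this distributional gap is one of the degradation sources the paper identifies.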
Problem

Research questions and friction points this paper is trying to address.

Investigates performance degradation in distributed GNN training for link prediction
Identifies graph partitioning and negative sampling as key issues
Proposes SpLPG to reduce communication cost while maintaining accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages graph sparsification to mitigate accuracy loss from graph partitioning
Reduces communication overhead by up to about 80%
Mostly preserves link prediction accuracy
Xin Huang
Department of Computer Science, Texas State University
Chul-Ho Lee
Department of Computer Science, Texas State University
Graph Mining · Machine Learning · Networking · Systems