🤖 AI Summary
To address high communication overhead and GPU load imbalance in full-batch Graph Neural Network (GNN) training on single-server multi-GPU systems, this paper proposes a joint caching and resource-aware graph partitioning framework. The method integrates an adaptive feature caching mechanism with a dynamic graph partitioning strategy tailored to GPU heterogeneity, jointly optimizing CPU-GPU memory hierarchy utilization and computational resource allocation. It co-optimizes three dimensions: subgraph size, feature reuse, and communication granularity. Extensive experiments on multiple large-scale graph datasets demonstrate that the approach reduces total communication volume by up to 96% and accelerates end-to-end training by up to 12.7× over state-of-the-art methods, significantly improving the scalability and hardware utilization of full-batch GNN training.
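The resource-aware partitioning idea above can be illustrated with a minimal sketch: target subgraph sizes are made proportional to each GPU's measured throughput, so faster devices receive more vertices. The function name and the throughput-proportional policy are illustrative assumptions, not the paper's exact algorithm.

```python
# Hypothetical sketch of resource-aware partition sizing (not the paper's
# actual algorithm): faster GPUs are assigned proportionally larger subgraphs.

def partition_targets(num_vertices, gpu_throughputs):
    """Return a target vertex count per GPU, proportional to throughput."""
    total = sum(gpu_throughputs)
    targets = [int(num_vertices * t / total) for t in gpu_throughputs]
    # Hand the rounding remainder to the fastest GPUs first,
    # so the counts sum exactly to num_vertices.
    remainder = num_vertices - sum(targets)
    by_speed = sorted(range(len(targets)), key=lambda i: -gpu_throughputs[i])
    for i in by_speed[:remainder]:
        targets[i] += 1
    return targets
```

With equal throughputs this degenerates to an even split; with a 3:1 throughput ratio, the faster GPU receives three quarters of the vertices.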
📝 Abstract
Graph Neural Networks (GNNs) have shown remarkable capabilities in processing the graph-structured data prevalent in many real-world applications. However, the scalability of full-batch GNN training is severely limited by high communication overhead and load imbalance in distributed environments. In this paper, we present CaPGNN, a novel framework for efficient parallel full-batch GNN training on a single server with multiple GPUs, designed specifically to reduce redundant inter-GPU communication and balance computational workloads. We propose a joint adaptive caching algorithm that leverages both CPU and GPU memory to significantly reduce the repetitive transmission of vertex features across partitions. Additionally, we introduce a resource-aware graph partitioning algorithm that adjusts subgraph sizes dynamically according to the heterogeneous computational and communication capacities of GPUs. Extensive experiments on large-scale benchmark datasets demonstrate that CaPGNN reduces communication costs by up to 96% and accelerates GNN training by up to 12.7 times compared to state-of-the-art approaches. Our results highlight the potential of adaptive caching and resource-aware partitioning to enable scalable, efficient, and practical deployment of full-batch GNN training in distributed computing environments.
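The joint CPU/GPU caching idea can be sketched as a two-tier cache for cross-partition vertex features: the hottest vertices are pinned in GPU memory, the next tier in CPU memory, and only uncached vertices trigger an inter-partition transfer. The class name, capacities, and the frequency-ranked placement policy are illustrative assumptions, not CaPGNN's exact algorithm.

```python
# Minimal sketch of a two-tier (GPU + CPU) feature cache for boundary
# vertices. Placement by access frequency is an illustrative assumption;
# the point is that repeated requests avoid re-transmission.

class FeatureCache:
    def __init__(self, gpu_capacity, cpu_capacity, access_counts):
        # Rank vertices by how often their features are requested across
        # partitions; pin the hottest on the GPU, the next tier on the CPU.
        ranked = sorted(access_counts, key=access_counts.get, reverse=True)
        self.gpu = set(ranked[:gpu_capacity])
        self.cpu = set(ranked[gpu_capacity:gpu_capacity + cpu_capacity])
        self.transfers = 0  # features fetched from the owning partition

    def fetch(self, vertex):
        if vertex in self.gpu:
            return "gpu_hit"          # no transfer at all
        if vertex in self.cpu:
            return "cpu_hit"          # host-to-device copy, no inter-GPU traffic
        self.transfers += 1           # miss: pay the cross-partition transfer
        return "remote"
```

In a real system the cached payloads would be feature tensors in device and pinned host memory; here sets of vertex IDs stand in for them to keep the sketch self-contained.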