DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training

📅 2024-05-08
🏛️ Proceedings of the ACM on Management of Data
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address read amplification and model-accuracy degradation in out-of-core GNN training on ultra-large graphs, this paper proposes DiskGNN. It introduces offline graph sampling, which decouples graph sampling from model computation; designs a four-level feature store that exploits the GPU and CPU memory hierarchy to cache hot node features, augmented with disk-layout optimization that packs each mini-batch's features contiguously on disk and batched packing that accelerates this pre-processing; and pipelines training to overlap disk access with other operations. Compared to state-of-the-art out-of-core systems, DiskGNN achieves an 8.2× speedup in training throughput while matching their best model accuracy. The system is open-source and supports end-to-end out-of-core GNN training on billion-scale graphs.

📝 Abstract
Graph neural networks (GNNs) are models specialized for graph data and widely used in applications. To train GNNs on large graphs that exceed CPU memory, several systems have been designed to store data on disk and conduct out-of-core processing. However, these systems suffer from either read amplification when conducting random reads for node features that are smaller than a disk page, or degraded model accuracy from treating the graph as disconnected partitions. To close this gap, we build DiskGNN for high I/O efficiency and fast training without model accuracy degradation. The key technique is offline sampling, which decouples graph sampling from model computation. In particular, by conducting graph sampling beforehand for multiple mini-batches, DiskGNN acquires the node features that will be accessed during model computation and conducts pre-processing to pack the node features of each mini-batch contiguously on disk, avoiding read amplification during computation. Given the feature access information acquired by offline sampling, DiskGNN also adopts designs including a four-level feature store that fully utilizes the memory hierarchy of GPU and CPU to cache hot node features and reduce disk access, batched packing to accelerate feature packing during pre-processing, and pipelined training to overlap disk access with other operations. We compare DiskGNN with state-of-the-art out-of-core GNN training systems. The results show that DiskGNN has more than 8× speedup over existing systems while matching their best model accuracy. DiskGNN is open-source at https://github.com/Liu-rj/DiskGNN.
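As a hedged illustration of the tiered feature store described in the abstract, the sketch below orders lookups from fastest to slowest tier and uses the access lists produced by offline sampling to place the hottest features highest. The class name, the dict-based caches, and the exact tier layout are assumptions for illustration, not DiskGNN's actual API.

```python
from collections import Counter

class FourLevelStore:
    """Hypothetical sketch of a four-level feature store.

    Tiers (fastest to slowest): GPU cache, CPU cache, per-batch packed
    files, and the full on-disk feature table. All four are simulated
    here with in-memory dicts; the real store spans GPU memory, host
    RAM, and disk.
    """

    def __init__(self, full_table, access_lists, gpu_slots, cpu_slots):
        # Offline sampling tells us exactly which nodes each batch will
        # touch, so cache placement can use true access frequencies.
        freq = Counter(nid for nids in access_lists for nid in nids)
        hot = [nid for nid, _ in freq.most_common()]
        self.gpu = {n: full_table[n] for n in hot[:gpu_slots]}
        self.cpu = {n: full_table[n]
                    for n in hot[gpu_slots:gpu_slots + cpu_slots]}
        # Uncached accesses are served from per-batch packed files
        # (sequential reads), with the raw table as the last resort.
        self.packed = {i: {n: full_table[n] for n in nids
                           if n not in self.gpu and n not in self.cpu}
                       for i, nids in enumerate(access_lists)}
        self.full = full_table
        self.hits = Counter()

    def get(self, batch_id, nid):
        """Return nid's feature from the fastest tier that holds it."""
        for tier, table in (("gpu", self.gpu), ("cpu", self.cpu),
                            ("packed", self.packed[batch_id])):
            if nid in table:
                self.hits[tier] += 1
                return table[nid]
        self.hits["full"] += 1
        return self.full[nid]
```

The `hits` counter makes the tiering visible: nodes shared across many mini-batches resolve in the GPU or CPU caches, while batch-local nodes fall through to the packed files rather than triggering random reads on the full table.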
Problem

Research questions and friction points this paper is trying to address.

Achieving efficient I/O for large graph training without accuracy loss
Eliminating read amplification in out-of-core GNN feature access
Overcoming graph partition limitations while maintaining model accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline sampling decouples graph sampling from computation
Four-level feature store caches features to reduce disk access
Batched packing and pipelined training accelerate feature processing
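The offline-sampling and packing ideas above can be sketched end to end: sample every mini-batch's node IDs first, then rewrite each batch's features contiguously on disk so training does one sequential read per batch instead of many sub-page random reads. File names, sizes, and the floats-per-node encoding below are illustrative assumptions, not DiskGNN's on-disk format.

```python
import os
import random
import struct
import tempfile

FEAT_DIM = 4        # floats per node feature (hypothetical)
NUM_NODES = 1000
BATCHES = 3
BATCH_NODES = 8

random.seed(0)

# Stand-in for the full feature file on disk: one record per node,
# where node nid's feature is [nid, nid, nid, nid].
tmpdir = tempfile.mkdtemp()
feat_path = os.path.join(tmpdir, "features.bin")
with open(feat_path, "wb") as f:
    for nid in range(NUM_NODES):
        f.write(struct.pack(f"{FEAT_DIM}f", *([float(nid)] * FEAT_DIM)))

# Offline sampling: decide each mini-batch's node IDs before training.
batches = [random.sample(range(NUM_NODES), BATCH_NODES)
           for _ in range(BATCHES)]

# Pre-processing: pack each batch's features contiguously, so the random
# reads (the source of read amplification) happen once, offline.
rec = FEAT_DIM * 4  # bytes per feature record
packed_paths = []
with open(feat_path, "rb") as src:
    for i, nids in enumerate(batches):
        out = os.path.join(tmpdir, f"batch_{i}.bin")
        with open(out, "wb") as dst:
            for nid in nids:
                src.seek(nid * rec)
                dst.write(src.read(rec))
        packed_paths.append(out)

def load_batch(i):
    """Training-time access: one contiguous read per mini-batch."""
    with open(packed_paths[i], "rb") as f:
        buf = f.read()
    return [struct.unpack_from(f"{FEAT_DIM}f", buf, j * rec)
            for j in range(len(batches[i]))]

first = load_batch(0)  # features arrive in the batch's sampled order
```

In the real system the packing step is itself batched to amortize disk seeks, and training overlaps these reads with computation; this sketch only shows the layout transformation that removes read amplification.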
Renjie Liu
Southern University of Science and Technology
Machine Learning Systems · GNN Systems
Yichuan Wang
University of Sheffield
Digital marketing · Analytics & AI · Digital health
Xiao Yan
Centre for Perceptual and Interactive Intelligence
Zhenkun Cai
Amazon Web Services
Large-scale machine learning systems
Minjie Wang
AWS Shanghai AI Lab
Haitian Jiang
New York University
Bo Tang
Southern University of Science and Technology
Jinyang Li
New York University