DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training

📅 2024-05-08
🏛️ Proceedings of the ACM on Management of Data
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address read amplification and model-accuracy degradation in out-of-core GNN training on ultra-large graphs, this paper proposes DiskGNN. It introduces offline graph sampling, which decouples graph sampling from model computation; designs a four-level feature store that exploits the GPU and CPU memory hierarchy to cache hot node features, augmented with disk-layout optimization that packs each mini-batch's features contiguously on disk and batched packing that accelerates this pre-processing; and pipelines training to overlap disk access with other operations. Compared to state-of-the-art out-of-core systems, DiskGNN achieves an 8.2× speedup in training throughput while matching their best model accuracy. The system is open-source and supports end-to-end out-of-core GNN training on billion-scale graphs.

📝 Abstract
Graph neural networks (GNNs) are models specialized for graph data and widely used in applications. To train GNNs on large graphs that exceed CPU memory, several systems have been designed to store data on disk and conduct out-of-core processing. However, these systems suffer from either read amplification when conducting random reads for node features that are smaller than a disk page, or degraded model accuracy from treating the graph as disconnected partitions. To close this gap, we build DiskGNN for high I/O efficiency and fast training without model accuracy degradation. The key technique is offline sampling, which decouples graph sampling from model computation. In particular, by conducting graph sampling beforehand for multiple mini-batches, DiskGNN acquires the node features that will be accessed during model computation and conducts pre-processing to pack the node features of each mini-batch contiguously on disk, avoiding read amplification during computation. Given the feature access information acquired by offline sampling, DiskGNN also adopts designs including a four-level feature store that fully utilizes the memory hierarchy of GPU and CPU to cache hot node features and reduce disk access, batched packing to accelerate feature packing during pre-processing, and pipelined training to overlap disk access with other operations. We compare DiskGNN with state-of-the-art out-of-core GNN training systems. The results show that DiskGNN has more than 8× speedup over existing systems while matching their best model accuracy. DiskGNN is open-source at https://github.com/Liu-rj/DiskGNN.
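As a hedged illustration of the tiered feature store described in the abstract, the sketch below orders lookups from fastest to slowest tier and uses the access lists produced by offline sampling to place the hottest features highest. The class name, the dict-based caches, and the exact tier layout are assumptions for illustration, not DiskGNN's actual API.

```python
from collections import Counter

class FourLevelStore:
    """Hypothetical sketch of a four-level feature store.

    Tiers (fastest to slowest): GPU cache, CPU cache, per-batch packed
    files, and the full on-disk feature table. All four are simulated
    here with in-memory dicts; the real store spans GPU memory, host
    RAM, and disk.
    """

    def __init__(self, full_table, access_lists, gpu_slots, cpu_slots):
        # Offline sampling tells us exactly which nodes each batch will
        # touch, so cache placement can use true access frequencies.
        freq = Counter(nid for nids in access_lists for nid in nids)
        hot = [nid for nid, _ in freq.most_common()]
        self.gpu = {n: full_table[n] for n in hot[:gpu_slots]}
        self.cpu = {n: full_table[n]
                    for n in hot[gpu_slots:gpu_slots + cpu_slots]}
        # Uncached accesses are served from per-batch packed files
        # (sequential reads), with the raw table as the last resort.
        self.packed = {i: {n: full_table[n] for n in nids
                           if n not in self.gpu and n not in self.cpu}
                       for i, nids in enumerate(access_lists)}
        self.full = full_table
        self.hits = Counter()

    def get(self, batch_id, nid):
        """Return nid's feature from the fastest tier that holds it."""
        for tier, table in (("gpu", self.gpu), ("cpu", self.cpu),
                            ("packed", self.packed[batch_id])):
            if nid in table:
                self.hits[tier] += 1
                return table[nid]
        self.hits["full"] += 1
        return self.full[nid]
```

The `hits` counter makes the tiering visible: nodes shared across many mini-batches resolve in the GPU or CPU caches, while batch-local nodes fall through to the packed files rather than triggering random reads on the full table.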
Problem

Research questions and friction points this paper is trying to address.

Achieving efficient I/O for large graph training without accuracy loss
Eliminating read amplification in out-of-core GNN feature access
Overcoming graph partition limitations while maintaining model accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline sampling decouples graph sampling from computation
Four-level feature store caches features to reduce disk access
Batched packing and pipelined training accelerate feature processing
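The offline-sampling and packing ideas above can be sketched end to end: sample every mini-batch's node IDs first, then rewrite each batch's features contiguously on disk so training does one sequential read per batch instead of many sub-page random reads. File names, sizes, and the floats-per-node encoding below are illustrative assumptions, not DiskGNN's on-disk format.

```python
import os
import random
import struct
import tempfile

FEAT_DIM = 4        # floats per node feature (hypothetical)
NUM_NODES = 1000
BATCHES = 3
BATCH_NODES = 8

random.seed(0)

# Stand-in for the full feature file on disk: one record per node,
# where node nid's feature is [nid, nid, nid, nid].
tmpdir = tempfile.mkdtemp()
feat_path = os.path.join(tmpdir, "features.bin")
with open(feat_path, "wb") as f:
    for nid in range(NUM_NODES):
        f.write(struct.pack(f"{FEAT_DIM}f", *([float(nid)] * FEAT_DIM)))

# Offline sampling: decide each mini-batch's node IDs before training.
batches = [random.sample(range(NUM_NODES), BATCH_NODES)
           for _ in range(BATCHES)]

# Pre-processing: pack each batch's features contiguously, so the random
# reads (the source of read amplification) happen once, offline.
rec = FEAT_DIM * 4  # bytes per feature record
packed_paths = []
with open(feat_path, "rb") as src:
    for i, nids in enumerate(batches):
        out = os.path.join(tmpdir, f"batch_{i}.bin")
        with open(out, "wb") as dst:
            for nid in nids:
                src.seek(nid * rec)
                dst.write(src.read(rec))
        packed_paths.append(out)

def load_batch(i):
    """Training-time access: one contiguous read per mini-batch."""
    with open(packed_paths[i], "rb") as f:
        buf = f.read()
    return [struct.unpack_from(f"{FEAT_DIM}f", buf, j * rec)
            for j in range(len(batches[i]))]

first = load_batch(0)  # features arrive in the batch's sampled order
```

In the real system the packing step is itself batched to amortize disk seeks, and training overlaps these reads with computation; this sketch only shows the layout transformation that removes read amplification.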
Renjie Liu
Southern University of Science and Technology
Machine Learning Systems · GNN Systems
Yichuan Wang
University of Sheffield
Digital marketing · Analytics & AI · Digital health
Xiao Yan
Centre for Perceptual and Interactive Intelligence
Zhenkun Cai
Amazon Web Services
Large-scale machine learning systems
Minjie Wang
AWS Shanghai AI Lab
Haitian Jiang
New York University
Bo Tang
Southern University of Science and Technology
Jinyang Li
New York University