ATLAS: Efficient Out-of-Core Inference for Billion-Scale Graph Neural Networks

📅 2026-05-10
🤖 AI Summary
This work targets single-machine, out-of-core full-graph inference for billion-scale graph neural networks (GNNs), where existing systems suffer from read amplification, irregular I/O, and memory pressure. The authors propose ATLAS, a disk-based framework that replaces gather-based execution with a broadcast-based model, so that features and embeddings are read sequentially in a single pass per layer. ATLAS combines a tiered memory-disk hierarchy with minimum-pending-message eviction, graph reordering, and a GPU-accelerated pipeline. On out-of-core graphs with up to 4 billion edges and 550 GiB of features, ATLAS improves end-to-end inference time by roughly 12–30× over state-of-the-art out-of-core baselines, while remaining within about 5% when features fit in memory. Existing out-of-core systems primarily target training and perform poorly on full-graph inference, which is the gap ATLAS addresses.
📝 Abstract
Graph Neural Network (GNN) inference on billion-scale graphs is critical for domains like fintech and recommendation systems. Full-graph inference on these large graphs can be challenging due to high communication costs in distributed settings and high I/O costs in disk-backed Out-of-Core (OOC) settings. Existing OOC systems, operating across disk and memory, primarily focus on GNN training and perform poorly for full-graph inference due to massive read amplification, irregular I/O, and memory pressure. We present ATLAS, a disk-based GNN inference framework that enables efficient full-graph, layer-wise inference on graphs whose topologies, features and intermediate embeddings exceed the available memory on single machines. ATLAS replaces gather-based execution with a broadcast-based model that enables sequential, single-pass streaming reads of features and embeddings per layer. A tiered memory-disk hierarchy with minimum-pending-message eviction, graph reordering and a GPU-accelerated pipeline sustains high throughput within $128$ GiB RAM and $2$ TiB SSD. Across out-of-core graphs with up to $4$B edges and $550$ GiB features and multiple GNN architectures, ATLAS improves end-to-end inference time by $\approx12$--$30\times$ over State-of-the-Art (SOTA) OOC baselines on a single workstation, while remaining within $\approx5\%$ when features fit in memory.
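The contrast between gather-based and broadcast-based execution can be illustrated with a toy, in-memory sketch. This is not ATLAS's implementation: the function names (`gather_layer`, `broadcast_layer`), the scalar features, and the mean aggregation are all assumptions chosen for brevity. The point it shows is that a broadcast pass reads each source feature exactly once, in storage order, and uses a per-destination pending-message counter to know when an aggregate is complete.

```python
from collections import defaultdict

def gather_layer(features, in_neighbors):
    """Gather-based: every destination pulls its in-neighbors' features.
    On disk this becomes random reads, and a feature is re-read once per out-edge."""
    return {v: sum(features[u] for u in srcs) / len(srcs)
            for v, srcs in in_neighbors.items() if srcs}

def broadcast_layer(feature_stream, out_neighbors, in_degree):
    """Broadcast-based: read each source once, in order, and push it to its
    out-neighbors. A destination is finalized when no messages are pending."""
    partial = defaultdict(float)   # running sums for unfinished destinations
    pending = dict(in_degree)      # messages still expected per destination
    done = {}
    for u, x in feature_stream:    # single sequential pass over the features
        for v in out_neighbors.get(u, ()):
            partial[v] += x
            pending[v] -= 1
            if pending[v] == 0:    # all messages arrived: emit the mean
                done[v] = partial.pop(v) / in_degree[v]
    return done

# Hypothetical 3-node graph with edges 0->1, 0->2, 1->2, 2->0.
features = {0: 1.0, 1: 2.0, 2: 4.0}
out_neighbors = {0: [1, 2], 1: [2], 2: [0]}
in_neighbors = {0: [2], 1: [0], 2: [0, 1]}
in_degree = {v: len(srcs) for v, srcs in in_neighbors.items()}

gathered = gather_layer(features, in_neighbors)
broadcasted = broadcast_layer(sorted(features.items()), out_neighbors, in_degree)
```

Both functions compute the same per-node mean here. In an out-of-core setting, the `partial` accumulators are the state that may not fit in memory, and the `pending` counters are the kind of information an eviction policy could consult; how ATLAS actually uses them is not specified in the abstract.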
Problem

Research questions and friction points this paper is trying to address.

Graph Neural Networks
Out-of-Core
Billion-scale Graphs
Full-graph Inference
I/O Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Out-of-Core
Graph Neural Networks
Broadcast-based Inference
I/O Optimization
GPU-accelerated Pipeline