🤖 AI Summary
To address the memory bottlenecks, information loss, and redundant computation that arise from subgraph sampling in distributed GNN inference on ultra-large-scale graphs (500M nodes / 22.4B edges), this work proposes the first sampling-free, full-graph-aware distributed GNN inference paradigm. The method introduces a GNN-specific abstract programming interface tightly co-designed with a distributed just-in-time (JIT) compiler, and integrates memory-aware scheduling with cross-node tensor pipelining to enable end-to-end full-graph inference optimization. Evaluated on an industrial-scale graph (500M nodes / 22.4B edges), the system achieves a 27.4× speedup in inference latency and reduces GPU memory consumption by 63% compared to state-of-the-art baselines. It is the first to enable real-time, high-accuracy full-graph GNN inference on graphs of this scale.
📝 Abstract
Graph neural networks (GNNs) have delivered remarkable results in various fields. However, the rapid growth in the scale of graph data has introduced significant performance bottlenecks for GNN inference: both computational complexity and memory usage have risen dramatically, with memory becoming the critical limitation. Although graph sampling-based subgraph learning methods can mitigate computational and memory demands, they suffer from drawbacks such as information loss and high redundant computation across subgraphs. This paper introduces an innovative processing paradigm for distributed graph learning that abstracts GNNs with a new set of programming interfaces and leverages Just-In-Time (JIT) compilation technology to its full potential. This paradigm enables GNNs to fully exploit the computational resources of distributed clusters while eliminating the drawbacks of subgraph learning methods, leading to a more efficient inference process. Our experimental results demonstrate that on industry-scale graphs of up to **500 million nodes and 22.4 billion edges**, our method delivers a performance boost of up to **27.4 times**.