🤖 AI Summary
To address the memory bottlenecks, information loss, and redundant computation that arise from subgraph sampling in distributed GNN inference on ultra-large-scale graphs (500M nodes / 22.4B edges), this work proposes the first sampling-free, full-graph-aware distributed GNN inference paradigm. The method introduces a GNN-specific abstract programming interface tightly co-designed with a distributed just-in-time (JIT) compiler, and integrates memory-aware scheduling with cross-node tensor pipelining to enable end-to-end full-graph inference optimization. Evaluated on an industrial-scale graph (500M nodes / 22.4B edges), the system achieves a 27.4× speedup in inference latency and reduces GPU memory consumption by 63% compared to state-of-the-art baselines. It is the first to enable real-time, high-accuracy full-graph GNN inference on graphs of this scale.
📝 Abstract
Graph neural networks (GNNs) have delivered remarkable results in various fields. However, the rapid growth in the scale of graph data has introduced significant performance bottlenecks for GNN inference: both computational complexity and memory usage have risen dramatically, with memory becoming the critical limitation. Although graph sampling-based subgraph learning methods can mitigate computational and memory demands, they suffer from drawbacks such as information loss and high redundant computation across subgraphs. This paper introduces an innovative processing paradigm for distributed graph learning that abstracts GNNs with a new set of programming interfaces and leverages Just-In-Time (JIT) compilation technology to its full potential. This paradigm enables GNNs to fully exploit the computational resources of distributed clusters while eliminating the drawbacks of subgraph learning methods, leading to a more efficient inference process. Our experimental results demonstrate that on industry-scale graphs of up to **500 million nodes and 22.4 billion edges**, our method delivers a performance boost of up to **27.4 times**.