🤖 AI Summary
This work addresses the inefficiency of graph neural network (GNN) inference on streaming graphs, where runtime embedding computation suffers from frequent and costly multi-hop traversals. To overcome this, the authors propose an efficient incremental computation framework that decouples GNN message passing into fine-grained, generic operators and reorders their execution to update embeddings only within affected subgraphs, thereby eliminating redundant computations. This approach is the first to support general-purpose incremental embedding updates under complex message-passing patterns while preserving model accuracy and substantially reducing computational overhead. Furthermore, it integrates GPU-CPU cooperative memory management and communication scheduling to handle large-scale historical embeddings. Experiments across diverse graph sizes and GNN architectures demonstrate a 64%–99% reduction in computation volume and speedups ranging from 1.7× to 145.8× over existing methods.
📝 Abstract
Graph Neural Networks (GNNs) on streaming graphs have gained increasing popularity. However, their practical deployment remains challenging, as the inference process relies on Runtime Embedding Computation (RTEC) to capture recent graph changes. This process incurs heavyweight multi-hop graph traversal overhead, which significantly undermines computation efficiency. We observe that the intermediate results for large portions of the graph remain unchanged during graph evolution, and thus redundant computations can be effectively eliminated through carefully designed incremental methods. In this work, we propose an efficient framework for incrementalizing RTEC on streaming graphs. The key idea is to decouple GNN computation into a set of generalized, fine-grained operators and safely reorder them, transforming the expensive full-neighbor GNN computation into a more efficient form over the affected subgraph. With this design, our framework preserves the semantics and accuracy of the original full-neighbor computation while supporting a wide range of GNN models with complex message-passing patterns. To further scale to graphs with massive historical results, we develop a GPU-CPU co-processing system that offloads embeddings to CPU memory with communication-optimized scheduling. Experiments across diverse graph sizes and GNN models show that our method reduces computation by 64%–99% and achieves 1.7×–145.8× speedups over existing solutions.
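The core idea of the abstract can be illustrated with a minimal sketch: when an edge arrives, only the embeddings of nodes in the affected subgraph need recomputation, while cached results elsewhere stay valid. The sketch below assumes a toy one-layer GNN with mean aggregation and scalar features; the class name `IncrementalGNN` and all method names are hypothetical illustrations, not the paper's actual framework or API.

```python
# Illustrative sketch of incremental runtime embedding computation (RTEC).
# Assumes a hypothetical one-layer GNN with mean aggregation over neighbors;
# for a 1-layer model, an edge insertion only affects its two endpoints.
from collections import defaultdict


class IncrementalGNN:
    def __init__(self, features):
        self.x = dict(features)       # node -> scalar input feature
        self.adj = defaultdict(set)   # undirected adjacency lists
        self.emb = {}                 # cached historical embeddings

    def _compute(self, v):
        # Toy combine step: average of self feature and neighbor mean.
        nbrs = self.adj[v]
        agg = sum(self.x[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
        return 0.5 * self.x[v] + 0.5 * agg

    def full_pass(self):
        # Expensive baseline: recompute every node's embedding.
        for v in self.x:
            self.emb[v] = self._compute(v)

    def insert_edge(self, u, v):
        # Incremental update: refresh only the affected subgraph,
        # which for a 1-hop model is just the two endpoints.
        self.adj[u].add(v)
        self.adj[v].add(u)
        for w in (u, v):
            self.emb[w] = self._compute(w)
```

For deeper models the affected subgraph grows to the k-hop neighborhood of the inserted edge, but the principle is the same: incremental updates touch only that region, while a full pass touches every node.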