StreamTGN: A GPU-Efficient Serving System for Streaming Temporal Graph Neural Networks

๐Ÿ“… 2026-03-22
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the severe computational redundancy in existing temporal graph neural network (GNN) inference systems, which trigger full-graph embedding updates whenever a new edge arrives. The paper proposes the first efficient inference system tailored to streaming temporal GNNs, leveraging the locality of graph updates to refresh only the affected nodes. Key innovations include a dirty-flag propagation mechanism for identifying impacted nodes, a drift-aware adaptive reconstruction scheduler, and a batching strategy that relaxes strict event-ordering constraints. Built on a GPU-resident node memory and a streaming processing architecture, the system achieves a 4.5–739× speedup for TGN inference and up to 4,207× for TGAT across eight temporal graphs, and end-to-end training-inference co-optimization yields up to 24× acceleration, all without sacrificing model accuracy.

๐Ÿ“ Abstract
Temporal Graph Neural Networks (TGNs) achieve state-of-the-art performance on dynamic graph tasks, yet existing systems focus exclusively on accelerating training: at inference time, every new edge triggers O(|V|) embedding updates even though only a small fraction of nodes are affected. We present **StreamTGN**, the first streaming TGN inference system exploiting the inherent locality of temporal graph updates: in an L-layer TGN, a new edge affects only nodes within L hops of its endpoints, typically less than 0.2% on million-node graphs. StreamTGN maintains persistent GPU-resident node memory and uses dirty-flag propagation to identify the affected set A, reducing per-batch complexity from O(|V|) to O(|A|) with zero accuracy loss. Drift-aware adaptive rebuild scheduling and batched streaming with relaxed ordering further maximize throughput. Experiments on eight temporal graphs (2K–2.6M nodes) show 4.5×–739× speedup for TGN and up to 4,207× for TGAT, with identical accuracy. StreamTGN is orthogonal to training optimizations: combining SWIFT with StreamTGN yields 24× end-to-end speedup across three architectures (TGN, TGAT, DySAT).
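The locality argument above can be sketched as a bounded breadth-first search: in an L-layer model, a new edge can only change the embeddings of nodes within L hops of its endpoints, so the affected set A is found by propagating a dirty flag at most L hops. The snippet below is an illustrative sketch of that idea, not the paper's implementation; the `adj` adjacency dict and the `affected_set` helper are hypothetical names chosen for this example.

```python
from collections import deque

def affected_set(adj, edge, num_layers):
    """Dirty-flag propagation sketch: collect all nodes within
    `num_layers` hops of the new edge's endpoints, instead of
    recomputing embeddings for all |V| nodes."""
    u, v = edge
    dirty = {u, v}                       # endpoints are dirty at hop 0
    frontier = deque([(u, 0), (v, 0)])
    while frontier:
        node, hop = frontier.popleft()
        if hop == num_layers:
            continue                     # beyond L hops, embeddings are unchanged
        for nbr in adj.get(node, ()):
            if nbr not in dirty:
                dirty.add(nbr)
                frontier.append((nbr, hop + 1))
    return dirty                         # the affected set A; recompute only these

# toy graph: chain 0-1-2-3 plus an isolated node 4
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
print(sorted(affected_set(adj, (0, 1), num_layers=1)))  # → [0, 1, 2]
```

On this toy chain, a 1-layer model touches only {0, 1, 2} rather than all five nodes; on million-node graphs the paper reports the analogous set is typically under 0.2% of |V|, which is where the O(|V|) → O(|A|) reduction comes from.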
Problem

Research questions and friction points this paper is trying to address.

Temporal Graph Neural Networks
streaming inference
dynamic graphs
GPU efficiency
embedding updates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Graph Neural Networks
Streaming Inference
Locality-aware Update
GPU-efficient System
Dirty-flag Propagation
๐Ÿ”Ž Similar Papers
No similar papers found.