AI Summary
This work addresses a deployment gap in neural routing: network-wide telemetry arrives late because of communication delays, so existing methods either assume a delay-free global state or restrict routers to purely local telemetry. The authors formulate telemetry-aware routing as a delay-aware closed-loop control problem and introduce a framework for training and evaluating neural routing algorithms that explicitly models both communication and inference delays. On top of this framework they propose LOGGIA, a scalable graph neural routing algorithm that predicts link weights in log space from attributed topology-and-telemetry graphs, trained with a data-driven pre-training stage followed by on-policy reinforcement learning. Across synthetic and real network topologies under unseen mixed TCP/UDP traffic, LOGGIA consistently outperforms shortest-path baselines, whereas other neural baselines fail once realistic delays are enforced. The experiments further suggest that such algorithms perform best when deployed fully locally, with every router observing network state and inferring actions on its own, rather than through centralized decision making.
Abstract
Routing algorithms are crucial for efficient computer network operations, and in many settings they must be able to react to traffic bursts within milliseconds. Live telemetry data can provide informative signals to routing algorithms, and recent work has trained neural networks to exploit such signals for traffic-aware routing. Yet, aggregating network-wide information is subject to communication delays, and existing neural approaches either assume unrealistic delay-free global states, or restrict routers to purely local telemetry. This leaves their deployability in real-world environments unclear. We cast telemetry-aware routing as a delay-aware closed-loop control problem and introduce a framework that trains and evaluates neural routing algorithms, while explicitly modeling communication and inference delays. On top of this framework, we propose LOGGIA, a scalable graph neural routing algorithm that predicts log-space link weights from attributed topology-and-telemetry graphs. It utilizes a data-driven pre-training stage, followed by on-policy Reinforcement Learning. Across synthetic and real network topologies, and unseen mixed TCP/UDP traffic sequences, LOGGIA consistently outperforms shortest-path baselines, whereas neural baselines fail once realistic delays are enforced. Our experiments further suggest that neural routing algorithms like LOGGIA perform best when deployed fully locally, i.e., observing network states and inferring actions at every router individually, as opposed to centralized decision making.
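To make the idea of log-space link weights concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state how the predicted weights are consumed, so the sketch assumes they are exponentiated into strictly positive link costs and passed to an ordinary shortest-path computation. The toy topology, the `log_weights` values, and the `shortest_path` helper are all hypothetical.

```python
import heapq
import math


def shortest_path(weights, source, target):
    """Dijkstra over a dict {(u, v): positive_cost} of directed links."""
    adjacency = {}
    for (u, v), cost in weights.items():
        adjacency.setdefault(u, []).append((v, cost))

    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale queue entry, a shorter route was already found
        for v, cost in adjacency.get(u, []):
            nd = d + cost
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))

    if target not in dist:
        return None  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical log-space outputs of a learned model, one value per directed link.
log_weights = {
    ("a", "b"): 0.0,
    ("b", "d"): 0.0,
    ("a", "c"): -1.0,
    ("c", "d"): 2.0,  # e.g. a link the model considers congested
}

# Assumption: exponentiating maps any real-valued prediction to a positive cost,
# which keeps Dijkstra's non-negativity requirement satisfied by construction.
weights = {link: math.exp(lw) for link, lw in log_weights.items()}

path = shortest_path(weights, "a", "d")
print(path)
```

Under this assumption, a model can output any real number per link without producing an invalid cost, and additive changes in log space correspond to multiplicative changes in link cost.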