🤖 AI Summary
This work addresses routing and wavelength assignment with lightpath reuse (RWA-LR) in fixed-grid optical networks with flex-rate transponders, where new services can be multiplexed onto existing lightpaths. RWA-LR is a long-horizon resource allocation task, a setting in which effective reinforcement learning (RL) policies are difficult to learn.
Method: Graph attention networks (GATs) are used for both the policy (actor) and value (critic) functions of the RL agent, so that both can exploit the graph-structured network data. The work also benchmarks heuristic algorithms for RWA-LR and finds that ordering candidate paths by number of hops, rather than by total path length, increases heuristic throughput by 6%.
Contribution/Results: All code is open-sourced for reproduction. The RL agent outperforms the previous state-of-the-art RL approach by 2.5% (17.4 Tbps mean additional throughput) and the best heuristic by 1.2% (8.5 Tbps mean additional throughput). The authors describe this gain as marginal and take it as evidence of how difficult it is to learn effective RL policies on long-horizon resource allocation tasks.
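To make the graph attention idea concrete, here is a minimal sketch of one single-head graph attention layer in plain Python, following the standard GAT formulation (a shared linear map, a LeakyReLU-scored attention vector, and a softmax over each node's neighbourhood). It is not the authors' architecture: the function name `gat_layer`, the toy graph, and the choice of self-loops and a 0.2 LeakyReLU slope are all assumptions made for illustration.

```python
import math

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(features, neighbours, W, a):
    """One single-head graph attention layer (hypothetical helper).

    features:   dict node -> input feature vector
    neighbours: dict node -> list of adjacent nodes
    W:          shared weight matrix (out_dim rows, in_dim columns)
    a:          attention vector of length 2 * out_dim
    Returns (new_features, attention_weights).
    """
    z = {n: matvec(W, h) for n, h in features.items()}
    out, attn = {}, {}
    for i in features:
        # Assumption: each node also attends to itself (self-loop).
        nbrs = [i] + [j for j in neighbours[i] if j != i]
        # Unnormalised score e_ij = LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(c * v for c, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighbourhood.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attn[i] = dict(zip(nbrs, alphas))
        # New feature: attention-weighted sum of transformed neighbours.
        out[i] = [
            sum(al * z[j][d] for al, j in zip(alphas, nbrs))
            for d in range(len(z[i]))
        ]
    return out, attn

# Toy three-node line graph A - B - C with made-up features.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
neighbours = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.25, 0.25]
new_features, attention = gat_layer(features, neighbours, W, a)
```

In an actor-critic setup, layers of this kind would produce per-node (or per-link) embeddings that separate output heads turn into action scores and a state-value estimate. How the network state is encoded as graph features is not specified in this summary.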
📝 Abstract
Many works have investigated reinforcement learning (RL) for routing and spectrum assignment on flex-grid networks, but only one work to date has examined RL for fixed-grid networks with flex-rate transponders, despite production systems using this paradigm. Flex-rate transponders allow existing lightpaths to accommodate new services, a task we term routing and wavelength assignment with lightpath reuse (RWA-LR). We re-examine this problem and present a thorough benchmarking of heuristic algorithms for RWA-LR, which are shown to achieve 6% higher throughput when candidate paths are ordered by number of hops rather than by total length. We train an RL agent for RWA-LR with graph attention networks for the policy and value functions to exploit the graph-structured data. We provide details of our methodology and open source all of our code for reproduction. We outperform the previous state-of-the-art RL approach by 2.5% (17.4 Tbps mean additional throughput) and the best heuristic by 1.2% (8.5 Tbps mean additional throughput). This marginal gain highlights the difficulty of learning effective RL policies on long-horizon resource allocation tasks.
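The difference between the two candidate-path orderings can be illustrated with a small sketch. The topology, link lengths, and the helper names `simple_paths`, `path_length_km`, and `candidate_paths` below are hypothetical and chosen only so that hop order and length order disagree. Exhaustive path enumeration is used for brevity and would not scale to a realistic network, where a k-shortest-paths algorithm would be the usual choice.

```python
def simple_paths(adj, src, dst):
    """Yield every loop-free path from src to dst via depth-first search."""
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))

def path_length_km(adj, path):
    """Total physical length of a path, summing its link lengths."""
    return sum(adj[u][v] for u, v in zip(path, path[1:]))

def candidate_paths(adj, src, dst, k, order="hops"):
    """Return the first k candidate paths under the chosen ordering.

    order="hops":   sort by hop count, ties broken by length.
    order="length": sort by length, ties broken by hop count.
    """
    paths = list(simple_paths(adj, src, dst))
    if order == "hops":
        key = lambda p: (len(p) - 1, path_length_km(adj, p))
    else:
        key = lambda p: (path_length_km(adj, p), len(p) - 1)
    return sorted(paths, key=key)[:k]

# Hypothetical 4-node topology; values are link lengths in km.
adj = {
    "A": {"B": 200, "C": 500, "D": 900},
    "B": {"A": 200, "C": 200},
    "C": {"A": 500, "B": 200, "D": 200},
    "D": {"A": 900, "C": 200},
}

by_hops = candidate_paths(adj, "A", "D", k=2, order="hops")
by_length = candidate_paths(adj, "A", "D", k=2, order="length")
```

In this toy network, hop ordering prefers the direct 900 km link, while length ordering prefers the 600 km three-hop route, which occupies a wavelength on three links instead of one. A plausible reading of the reported 6% gain is that fewer hops consume fewer link resources per service, though the abstract itself does not state the mechanism.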