🤖 AI Summary
To address the scheduling challenge in O-RAN network slicing under dynamic Service Level Agreements (SLAs), particularly time-varying end-to-end latency constraints, this paper proposes an adaptive deep reinforcement learning (DRL) agent design. The method introduces a retraining-free mechanism that supports online updates of heterogeneous SLAs, integrating state-aware encoding with latency-sensitive reward shaping. Trained and evaluated with the open-source OpenRAN Gym environment, the agent achieves an SLA violation rate 8.3× and 14.4× lower than DQN and Q-learning, respectively, while consuming 0.3× and 0.6× fewer resources, respectively. It does so without any retraining. The core contribution is overcoming the limitations of static policies: the agent keeps scheduling reliable and efficient as SLA thresholds change, a step toward adaptive, SLA-aware RAN orchestration.
📝 Abstract
The Open Radio Access Network (Open RAN) paradigm, and its reference architecture proposed by the O-RAN Alliance, is paving the way toward open, interoperable, observable and truly intelligent cellular networks. Crucial to this evolution is Machine Learning (ML), which will play a pivotal role by providing the necessary tools to realize the vision of self-organizing O-RAN systems. However, to be actionable, ML algorithms need to demonstrate high reliability, effectiveness in delivering high performance, and the ability to adapt to varying network conditions, traffic demands and performance requirements. To address these challenges, in this paper we propose a novel Deep Reinforcement Learning (DRL) agent design for O-RAN applications that can learn control policies under varying Service Level Agreements (SLAs) with heterogeneous minimum performance requirements. We focus on the case of RAN slicing and SLAs specifying maximum tolerable end-to-end latency levels. We use the OpenRAN Gym open-source environment to train a DRL agent that can adapt to varying SLAs and compare it against the state of the art. We show that our agent maintains a low SLA violation rate that is 8.3× and 14.4× lower than approaches based on Deep Q-Learning (DQN) and Q-Learning, respectively, while consuming 0.3× and 0.6× fewer resources, respectively, without the need for retraining.
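As a rough illustration of the quantities the abstract refers to, and not the authors' implementation, the sketch below shows one plausible way to measure an SLA violation rate for a latency SLA and one simple way to expose the current SLA to an agent so that the same policy can be queried under different thresholds. The function names, the definition of the violation rate as the fraction of latency samples above the SLA maximum, and the idea of appending the threshold to the observation vector are all assumptions made for this example.

```python
from typing import List, Sequence


def sla_violation_rate(latencies_ms: Sequence[float], max_latency_ms: float) -> float:
    """Fraction of latency samples exceeding the SLA's maximum tolerable latency.

    Assumed definition for illustration; the paper may define the metric differently.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    violations = sum(1 for latency in latencies_ms if latency > max_latency_ms)
    return violations / len(latencies_ms)


def sla_conditioned_observation(kpis: Sequence[float], max_latency_ms: float) -> List[float]:
    """Append the active SLA threshold to the KPI vector (hypothetical encoding).

    Feeding the threshold as an input is one way a single policy could react
    to SLA changes without being retrained.
    """
    return [*kpis, max_latency_ms]


# Made-up latency samples, used only to exercise the functions.
samples_ms = [10.0, 25.0, 40.0, 55.0]
rate = sla_violation_rate(samples_ms, max_latency_ms=30.0)
observation = sla_conditioned_observation([0.2, 0.7], max_latency_ms=30.0)
```

Under this definition, a statement such as "8.3× lower" corresponds to the baseline's violation rate divided by the agent's violation rate being 8.3.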