🤖 AI Summary
To address dynamic load balancing for Guaranteed Bit Rate (GBR) and Best Effort (BE) traffic under QoS and resource constraints in multi-band O-RAN, this paper formulates the problem as a Markov decision process with graph-structured states. QoS metrics are built into both the graph-based state representation and the reward design. The proposed graph reinforcement learning (GRL) policy combines a GNN-based architecture with an off-policy dueling DQN, which makes it invariant to node ordering, adaptable to different network sizes, and able to model spatial dependencies between nodes. Experiments show a 53% reduction in QoS violations and a fourfold increase in the 5th-percentile rate for BE traffic compared with two baseline methods. The core contribution is a QoS-aware, graph-structured load balancing framework for multi-band O-RAN deployments.
📝 Abstract
Next-generation wireless cellular networks are expected to provide unparalleled Quality-of-Service (QoS) for emerging wireless applications, necessitating strict performance guarantees, e.g., in terms of link-level data rates. A critical challenge in meeting these QoS requirements is the prevention of cell congestion, which involves balancing the load so that each cell has sufficient radio resources to serve its designated User Equipments (UEs). In this work, a novel QoS-aware Load Balancing (LB) approach is developed to optimize the performance of Guaranteed Bit Rate (GBR) and Best Effort (BE) traffic in a multi-band Open Radio Access Network (O-RAN) under QoS and resource constraints. The proposed solution builds on Graph Reinforcement Learning (GRL), a powerful framework at the intersection of Graph Neural Networks (GNNs) and Reinforcement Learning (RL). The QoS-aware LB problem is modeled as a Markov Decision Process, with states represented as graphs. QoS considerations are integrated into both the state representation and the reward signal design. The LB agent is then trained using an off-policy dueling Deep Q Network (DQN) that leverages a GNN-based architecture. This design ensures the LB policy is invariant to the ordering of nodes (UEs or cells), flexible in handling various network sizes, and capable of accounting for spatial node dependencies in LB decisions. The performance of the GRL-based solution is compared with two baseline methods. Results show substantial performance gains, including a $53\%$ reduction in QoS violations and a fourfold increase in the 5th percentile rate for BE traffic.
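The abstract does not spell out the network architecture, so the following is only a minimal sketch of why a GNN-style dueling Q-function can be insensitive to node ordering and network size. It assumes one round of mean-aggregation message passing, a mean readout for the state value, and one candidate action per node. All of these choices, along with the names `mean_aggregate`, `dueling_q`, `q_values`, and `permute_graph` and the toy weights, are hypothetical illustrations rather than the authors' design.

```python
def mean_aggregate(features, neighbors):
    """One message-passing round: each node averages its own feature
    vector with those of its neighbors (an assumed aggregator)."""
    out = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in neighbors[i]]
        out.append([sum(v[d] for v in group) / len(group)
                    for d in range(len(feat))])
    return out


def dueling_q(value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def q_values(features, neighbors, w_value, w_adv):
    """Q-value per node, using weights shared across all nodes.

    Sharing the weights is what lets the same parameters be applied to
    graphs of any size (hypothetical linear heads, for illustration only).
    """
    h = mean_aggregate(features, neighbors)
    dim = len(w_value)
    # Graph-level readout: averaging over nodes does not depend on node order.
    pooled = [sum(node[d] for node in h) / len(h) for d in range(dim)]
    value = sum(w * x for w, x in zip(w_value, pooled))
    # Assumption: one candidate action per node, scored by a shared head.
    advantages = [sum(w * x for w, x in zip(w_adv, node)) for node in h]
    return dueling_q(value, advantages)


def permute_graph(features, neighbors, perm):
    """Relabel nodes so that new node k is old node perm[k]."""
    inv = {old: new for new, old in enumerate(perm)}
    new_features = [features[old] for old in perm]
    new_neighbors = [[inv[j] for j in neighbors[old]] for old in perm]
    return new_features, new_neighbors


if __name__ == "__main__":
    # Toy graph with 4 nodes and 2 features per node (made-up numbers).
    features = [[1.0, 0.5], [0.2, 0.8], [0.9, 0.1], [0.4, 0.4]]
    neighbors = [[1, 2], [0], [0, 3], [2]]
    w_value, w_adv = [0.3, -0.2], [0.5, 0.1]

    q = q_values(features, neighbors, w_value, w_adv)
    perm = [2, 0, 3, 1]
    pf, pn = permute_graph(features, neighbors, perm)
    q_perm = q_values(pf, pn, w_value, w_adv)
    # Relabeling the nodes only reorders the Q-values; it does not change them.
    print(all(abs(q_perm[k] - q[perm[k]]) < 1e-9 for k in range(len(perm))))
```

In this sketch, relabeling the nodes reorders the Q-values in the same way but leaves each node's value unchanged up to floating-point rounding, so the greedy action refers to the same physical node regardless of ordering. Because the weights are shared per node, the same `w_value` and `w_adv` can also be applied to a graph with more or fewer nodes.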