Deep Reinforcement Learning Approach to QoS-Aware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address QoS fluctuations in 5G ultra-dense cellular networks caused by high-speed user mobility and uncertain channel state information, this paper proposes an end-to-end deep reinforcement learning (DRL) framework for load balancing. Methodologically, we design a multi-objective reward function jointly optimizing throughput, latency, jitter, packet loss rate, fairness, and handover frequency; employ the Proximal Policy Optimization (PPO) algorithm within an Actor-Critic architecture; and model Gauss-Markov mobility and stochastic measurement noise in a lightweight Python-based simulator. Clipped policy updates and generalized advantage estimation are incorporated to enhance training stability and generalization. Experimental results demonstrate rapid, stable convergence over 500+ training episodes under high traffic load, with significant improvements over state-of-the-art baselines (ReBuHa, A3, and CDQL) across all key metrics. The proposed framework achieves autonomous, robust, and efficient load balancing in highly dynamic wireless environments.
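The Gauss-Markov mobility model mentioned above updates each user's speed and heading as a weighted mix of its previous value, a long-run mean, and Gaussian noise. A minimal sketch of one update step is shown below; the parameter names (`alpha`, `sigma_s`, `sigma_d`) and default values are illustrative assumptions, not taken from the paper.

```python
import math
import random

def gauss_markov_step(speed, direction, mean_speed, mean_dir,
                      alpha=0.75, sigma_s=1.0, sigma_d=0.3):
    """One Gauss-Markov update.

    alpha controls temporal memory: alpha -> 1 gives near-constant
    velocity, alpha -> 0 gives memoryless (Brownian-like) motion.
    sigma_s / sigma_d are the std. devs. of the speed and direction noise.
    """
    root = math.sqrt(1.0 - alpha * alpha)
    speed = (alpha * speed
             + (1.0 - alpha) * mean_speed
             + root * random.gauss(0.0, sigma_s))
    direction = (alpha * direction
                 + (1.0 - alpha) * mean_dir
                 + root * random.gauss(0.0, sigma_d))
    # Speeds cannot go negative; clamp instead of reflecting for simplicity.
    return max(speed, 0.0), direction
```

In a simulator loop, this step would be applied per user per control interval, with observation noise added separately when the agent reads out channel measurements.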

📝 Abstract
Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) for autonomous, QoS-aware load balancing implemented end-to-end in a lightweight, pure-Python simulation environment. The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations. A multi-objective reward captures key performance indicators (aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count), so the learned policy explicitly balances efficiency and stability under user mobility and noisy observations. The PPO agent uses an actor-critic neural network trained from trajectories generated by the Python simulator with configurable mobility (e.g., Gauss-Markov) and stochastic measurement noise. Across 500+ training episodes and stress tests with increasing user density, the PPO policy consistently improves KPI trends (higher throughput and fairness, lower delay, jitter, packet loss, and handovers) and exhibits rapid, stable convergence. Comparative evaluations show that PPO outperforms rule-based ReBuHa and A3 as well as the learning-based CDQL baseline across all KPIs while maintaining smoother learning dynamics and stronger generalization as load increases. These results indicate that PPO's clipped policy updates and advantage-based training yield robust, deployable control for next-generation RAN load balancing using an entirely Python-based toolchain.
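The multi-objective reward described in the abstract combines the listed KPIs into a single scalar, rewarding throughput and Jain's fairness while penalizing delay, jitter, loss, and handovers. The sketch below assumes a simple weighted sum over normalized KPIs; the weight values and the `kpi` field names are hypothetical, chosen only to illustrate the structure.

```python
def jain_fairness(rates):
    """Jain's fairness index: 1.0 when all user rates are equal."""
    s = sum(rates)
    if s == 0.0:
        return 0.0
    return (s * s) / (len(rates) * sum(r * r for r in rates))

# Illustrative weights; the paper tunes its own trade-off coefficients.
WEIGHTS = dict(thr=1.0, fair=0.7, delay=0.5, jitter=0.3, loss=0.5, ho=0.2)

def reward(kpi, w=WEIGHTS):
    """Scalarize per-step KPIs: positive terms encourage efficiency and
    fairness, negative terms penalize QoS degradation and ping-ponging."""
    return (w['thr'] * kpi['throughput_norm']
            + w['fair'] * jain_fairness(kpi['rates'])
            - w['delay'] * kpi['delay_norm']
            - w['jitter'] * kpi['jitter_norm']
            - w['loss'] * kpi['loss_rate']
            - w['ho'] * kpi['handover_norm'])
```

Because every term is normalized to a comparable scale before weighting, the learned policy can trade throughput against stability (handover count) without any single KPI dominating the gradient signal.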
Problem

Research questions and friction points this paper is trying to address.

Optimizing QoS-aware load balancing in 5G networks
Addressing user mobility and observation uncertainty challenges
Improving network efficiency and stability using deep reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses deep reinforcement learning for load balancing
Applies Proximal Policy Optimization with actor-critic network
Adjusts Cell Individual Offset values autonomously
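The CIO mechanism in the last bullet biases user-cell association: a UE attaches to the cell whose measured RSRP plus its per-cell offset is largest, so raising a cell's CIO pulls traffic toward it. A minimal sketch of that selection rule, with hypothetical dBm/dB values:

```python
def serving_cell(ue_rsrp_dbm, cio_db):
    """Pick the serving cell maximizing RSRP + CIO.

    ue_rsrp_dbm: {cell_id: measured RSRP in dBm} for one UE.
    cio_db: {cell_id: CIO offset in dB} as set by the load-balancing agent;
            cells without an entry get a 0 dB offset.
    """
    return max(ue_rsrp_dbm,
               key=lambda c: ue_rsrp_dbm[c] + cio_db.get(c, 0.0))
```

For example, with RSRPs of -80 dBm (cell A) and -78 dBm (cell B), a +3 dB CIO on A flips the association to A; the agent's action space in the paper is periodic adjustment of these offsets.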
Mehrshad Eskandarpour
School of Electrical Engineering, Iran University of Science & Technology (IUST), Tehran, Iran
Hossein Soleimani
Assistant professor at Iran University of Science and Technology
Cellular networks, 5G, LTE, Sensor Networks, Deep learning