🤖 AI Summary
To address the challenges of dynamic sleep control under non-stationary traffic and the limited co-optimization of energy efficiency (EE) and quality-of-service (QoS) in 5G millimeter-wave (mmWave) networks, this paper proposes MARL-DDQN, a distributed optimization framework based on multi-agent deep reinforcement learning (MARL). MARL-DDQN integrates Double DQN, a realistic base station power consumption model, 3D beamforming, and community-aware user mobility modeling, enabling low-overhead collaborative decision-making while mitigating state-action space explosion and scalability bottlenecks. Experiments demonstrate an EE of 0.60 Mbit/Joule, a 10th-percentile user throughput of 8.5 Mbps, and QoS satisfaction in 95% of time slots—substantially outperforming baselines such as All On and IT-QoS-LB. The key contribution is the first application of MARL to dynamic sleep control for mmWave base stations, achieving joint EE and QoS guarantees in highly dynamic environments.
📝 Abstract
Dynamic sleep mode optimization (SMO) in millimeter-wave (mmWave) networks is essential for maximizing energy efficiency (EE) under stringent quality-of-service (QoS) constraints. However, existing optimization and reinforcement learning (RL) approaches rely on aggregated, static base station (BS) traffic models that fail to capture non-stationary traffic dynamics and suffer from large state-action spaces, limiting real-world deployment. To address these challenges, this paper proposes a multi-agent deep reinforcement learning (MARL) framework using a Double Deep Q-Network (DDQN), referred to as MARL-DDQN, for adaptive SMO in a 3D urban environment with a time-varying and community-based user equipment (UE) mobility model. Unlike conventional single-agent RL, MARL-DDQN enables scalable, distributed decision-making with minimal signaling overhead. A realistic BS power consumption model and beamforming are integrated to accurately quantify EE, while QoS is defined in terms of throughput. The method adapts SMO policies to maximize EE while mitigating inter-cell interference and ensuring throughput fairness. Simulations show that MARL-DDQN outperforms state-of-the-art strategies, including All On, iterative QoS-aware load-based (IT-QoS-LB), MARL-DDPG, and MARL-PPO, achieving up to 0.60 Mbit/Joule EE, 8.5 Mbps 10th-percentile throughput, and meeting QoS constraints 95% of the time under dynamic scenarios.
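The abstract's central algorithmic ingredient is the Double DQN update, where each base-station agent's online network *selects* the next action and a slowly updated target network *evaluates* it, reducing the overestimation bias of vanilla Q-learning. A minimal sketch of that target computation (tabular Q-values as stand-ins for the networks; all sizes, names, and the EE/QoS reward shape are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: each BS agent picks among sleep levels (actions)
# given a local traffic/load observation (state).
n_states, n_actions, gamma = 4, 3, 0.9

# Stand-ins for the online and target Q-networks (tabular for brevity).
q_online = rng.random((n_states, n_actions))
q_target = rng.random((n_states, n_actions))

def ddqn_target(reward, next_state, done):
    """Double DQN bootstrap target: the online net selects the
    greedy action, the target net evaluates it."""
    if done:
        return reward
    a_star = int(np.argmax(q_online[next_state]))          # selection
    return reward + gamma * q_target[next_state, a_star]   # evaluation

# In the paper's setting the reward would combine EE with a QoS
# (throughput) penalty; here it is just a placeholder scalar.
y = ddqn_target(reward=1.0, next_state=2, done=False)
```

Each agent regresses its online Q-values toward `y`, which is what lets the multi-agent scheme run with per-BS state-action spaces instead of one joint, exponentially large one.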