🤖 AI Summary
This work addresses the trade-off between energy efficiency (EE) and delay violation rate in cell-free MIMO systems by proposing an offline pretraining framework built on a virtual constrained Markov decision process (CMDP). The state transitions of the virtual CMDP are modeled with an Evidence-Aware Conditional Gaussian Mixture Model (EA-CGMM), which mitigates data sparsity and distribution drift. The agent itself is trained with Proximal Policy Optimization (PPO) combined with a primal-dual method. Compared with a non-pretrained baseline, the pretrained agent achieves twice the initial EE, maintains a low delay violation rate of 1%, and converges to an EE that is 4.7% higher with 50% fewer exploration steps. The framework also performs comparably to a diffusion-model-based implementation while reducing computational complexity by a factor of 14.
📝 Abstract
The cell-free multiple-input multiple-output (CF-MIMO) architecture significantly enhances wireless network performance, offering a promising solution for delay-sensitive applications. This paper investigates the resource allocation problem in CF-MIMO systems, aiming to maximize energy efficiency (EE) while satisfying a delay violation rate constraint. We design a Proximal Policy Optimization (PPO) algorithm with a primal-dual method to solve it. To address the low sample efficiency and safety risks caused by the cold start of this safe deep reinforcement learning (DRL) method, we propose a novel offline pretraining framework based on virtual constrained Markov decision process (CMDP) modeling. The virtual CMDP consists of a reward and cost prediction module, an initial-state distribution module, and a state transition module. Notably, we propose an evidence-aware conditional Gaussian Mixture Model (EA-CGMM) inference approach to mitigate data sparsity and distribution drift in state transition modeling. Simulation results demonstrate the effectiveness of the CMDP modeling and validate the safety and efficiency of the proposed pretraining framework. Specifically, compared with a non-pretrained baseline, the agent pretrained through our framework achieves twice the initial EE and maintains a low delay constraint violation rate of $1\%$, while ultimately converging to an EE that is $4.7\%$ higher with a $50\%$ reduction in exploration steps. Additionally, our implementation of the pretraining framework performs comparably to the state-of-the-art diffusion-model-based implementation while achieving a $14$-fold reduction in computational complexity.
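The abstract does not spell out the primal-dual update, so the following is only a generic sketch of how a Lagrange multiplier is commonly adapted alongside a PPO policy in constrained RL, not the paper's implementation. The function names, the step size, and the use of 0.01 as the constraint limit are illustrative assumptions.

```python
def dual_update(lam, observed_violation_rate, limit=0.01, step_size=0.5):
    """One projected dual-ascent step on the Lagrange multiplier.

    The multiplier grows when the observed delay violation rate exceeds
    the limit and shrinks otherwise; it is clipped so it never goes negative.
    `limit` and `step_size` are assumed values, not taken from the paper.
    """
    return max(0.0, lam + step_size * (observed_violation_rate - limit))


def lagrangian_reward(ee_reward, delay_cost, lam):
    """Penalized reward that the policy (e.g. PPO) would maximize.

    `ee_reward` stands for the energy-efficiency reward and `delay_cost`
    for the per-step delay violation cost (both hypothetical names).
    """
    return ee_reward - lam * delay_cost


# Example: the multiplier rises while the constraint is violated
# and stays at zero once the policy is comfortably within the limit.
lam = 0.0
for rate in (0.05, 0.03, 0.0):
    lam = dual_update(lam, rate)
```

In this scheme the policy update and the multiplier update alternate: the policy maximizes the penalized reward for a fixed multiplier, then the multiplier is adjusted from the measured violation rate.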