🤖 AI Summary
To address the high power consumption of cell-free massive MIMO systems, this paper integrates stacked intelligent metasurfaces (SIMs) into the architecture and jointly optimizes access point power allocation and SIM phase shifts to maximize the sum spectral efficiency. The authors propose NVR-MAPPO, a multi-agent reinforcement learning algorithm that combines a noisy value method with recurrent policies to encourage diverse exploration and improve convergence robustness under a centralized training–decentralized execution framework. Compared with baseline methods, the proposed approach achieves higher sum spectral efficiency across diverse user distributions and channel conditions, while also improving energy efficiency and demonstrating generalization and robustness. The work points toward co-optimizing reconfigurable electromagnetic surfaces and wireless resource allocation.
📝 Abstract
Cell-free (CF) massive multiple-input multiple-output (mMIMO) systems offer high spectral efficiency (SE) through multiple distributed access points (APs). However, the large number of antennas increases power consumption. We propose incorporating stacked intelligent metasurfaces (SIMs) into CF mMIMO systems as a cost-effective, energy-efficient solution. This paper jointly optimizes the power allocation of the APs and the phase shifts of the SIMs to maximize the sum SE. To address this complex problem, we introduce a fully distributed multi-agent reinforcement learning (MARL) algorithm. Our algorithm, the noisy value method with a recurrent policy in multi-agent policy optimization (NVR-MAPPO), enhances performance by encouraging diverse exploration under centralized training and decentralized execution. Simulations demonstrate that NVR-MAPPO significantly improves sum SE and robustness across various scenarios.
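The optimization target named in the abstract, the sum SE, can be illustrated with a minimal sketch. It assumes the textbook definition SE_k = log2(1 + SINR_k) with no pre-log factor, and it collapses the AP and SIM processing into a hypothetical matrix of effective power gains; the function names, the `gains` layout, and the single power value per user are illustrative assumptions, not the paper's system model.

```python
import math

def sinr_per_user(gains, powers, noise_power):
    """Per-user SINR from effective power gains (illustrative model only).

    gains[k][j] is a hypothetical effective power gain with which the
    signal intended for user j arrives at user k; powers[j] is the
    transmit power allocated to user j.
    """
    sinrs = []
    for k, row in enumerate(gains):
        received = [g * p for g, p in zip(row, powers)]
        signal = received[k]                      # desired signal power
        interference = sum(received) - signal     # power from other users' signals
        sinrs.append(signal / (interference + noise_power))
    return sinrs

def sum_spectral_efficiency(sinrs):
    """Sum SE in bit/s/Hz, assuming SE_k = log2(1 + SINR_k)."""
    return sum(math.log2(1.0 + s) for s in sinrs)

# Two users with no cross-interference: SINRs are 1 and 3.
sinrs = sinr_per_user([[1.0, 0.0], [0.0, 1.0]], [1.0, 3.0], 1.0)
print(sum_spectral_efficiency(sinrs))  # 3.0
```

In this simplified picture, the power allocation enters through `powers` and the SIM phase shifts would enter through the entries of `gains`, which is why the two are optimized jointly.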