🤖 AI Summary
In multi-user multiple-input single-output (MISO) systems, conventional precoding suffers from high computational complexity and low energy efficiency, particularly when the number of radio-frequency (RF) chains is limited. Method: This paper proposes an architecture that combines stacked intelligent metasurfaces (SIMs) with deep reinforcement learning (DRL). It introduces multi-layer SIMs into MISO systems for the first time, constructs a parameterized state space for the wireless environment, and jointly optimizes the SIM phase responses and the base station transmit power allocation. Proximal Policy Optimization (PPO) is employed with pre-designed states, data whitening, and hyperparameter tuning to enable lightweight, real-time control under stringent RF-chain and transmit-power constraints. Contribution/Results: Experiments demonstrate a 32% improvement in sum rate over conventional precoding, stable training convergence, and strong generalization capability, validating the effectiveness of SIM-DRL co-optimization in reducing hardware cost while enhancing spectral efficiency.
📝 Abstract
Stacked intelligent metasurfaces (SIMs) represent a novel signal processing paradigm that enables over-the-air processing of electromagnetic waves at the speed of light. Compared with conventional single-layer reconfigurable intelligent surfaces and metasurface lenses, their multi-layer architecture offers customizable computational capabilities. In this paper, we deploy a SIM to improve the performance of multi-user multiple-input single-output (MISO) wireless systems in a low-complexity manner with a reduced number of transmit radio-frequency chains. In particular, we formulate the joint design of the SIM phase shifts and the transmit power allocation as an optimization problem and tackle it efficiently with a customized deep reinforcement learning (DRL) approach that systematically explores pre-designed states of the SIM-parametrized smart wireless environment. The performance evaluation shows that the proposed method learns effectively from the wireless environment and consistently outperforms conventional precoding schemes under low transmit power conditions. Furthermore, hyperparameter tuning and a whitening process significantly enhance the robustness of the proposed DRL framework.
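To make the joint design of SIM phase shifts and transmit power allocation more concrete, the following is a minimal NumPy sketch of a sum-rate objective that a DRL agent could use as its reward. It is an illustration under stated assumptions, not the paper's exact system model: the function names `sim_response` and `sum_rate`, the cascaded model (each layer applies a diagonal phase-shift matrix after an inter-layer propagation matrix), and the mapping of one data stream per transmit antenna without digital precoding are all hypothetical choices made for this example.

```python
import numpy as np


def sim_response(phases, inter_layer):
    """Compose the end-to-end SIM response (assumed cascaded model).

    phases:      (L, N) real array, phase shift in radians of each of the
                 N meta-atoms on each of the L layers.
    inter_layer: list of L complex matrices; inter_layer[0] has shape (N, M)
                 (M transmit antennas to the first layer), the rest (N, N).
    Returns the (N, M) complex matrix mapping antenna signals to the
    output of the last layer.
    """
    response = None
    for theta, propagation in zip(phases, inter_layer):
        # diag(exp(j * theta)) @ propagation, done via broadcasting
        layer = np.exp(1j * theta)[:, None] * propagation
        response = layer if response is None else layer @ response
    return response


def sum_rate(phases, powers, inter_layer, user_channels, noise_power):
    """Sum rate in bit/s/Hz, assuming stream k is sent from antenna k.

    powers:        (K,) transmit power allocated to each user's stream.
    user_channels: (K, N) complex channels from the last SIM layer to users.
    """
    effective = user_channels @ sim_response(phases, inter_layer)  # (K, K)
    # received[k, j]: power of stream j as received by user k
    received = np.abs(effective) ** 2 * powers[None, :]
    signal = np.diag(received)
    interference = received.sum(axis=1) - signal
    sinr = signal / (interference + noise_power)
    return float(np.sum(np.log2(1.0 + sinr)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, N, K = 2, 16, 4  # layers, meta-atoms per layer, users (= antennas)

    def random_complex(shape):
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    inter_layer = [random_complex((N, K))] + [random_complex((N, N)) for _ in range(L - 1)]
    user_channels = random_complex((K, N))
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(L, N))  # one candidate action
    powers = np.full(K, 1.0 / K)                         # equal split of unit power
    print(sum_rate(phases, powers, inter_layer, user_channels, noise_power=0.1))
```

In a DRL setup of the kind the abstract describes, `phases` and `powers` would be derived from the agent's action and the returned value would serve as the reward; how the paper actually encodes states and actions is not specified in the text above.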