🤖 AI Summary
This work addresses two challenges in secure communications assisted by stacked intelligent metasurfaces (SIMs): the high-dimensional, strongly coupled optimization space created by the large number of meta-atoms, and the slow convergence and performance degradation of existing deep reinforcement learning (DRL) methods in dynamic wireless environments with imperfect knowledge of passive eavesdroppers. To overcome these limitations, the paper proposes a hybrid quantum proximal policy optimization (Q-PPO) framework that embeds a parameterized quantum circuit into the actor network, forming a hybrid classical-quantum policy architecture. The framework jointly optimizes transmit power allocation and SIM phase shifts to maximize the average secrecy rate under power and quality-of-service constraints. Simulation results show that, under imperfect eavesdropper channel state information, Q-PPO achieves approximately 15% higher secrecy rates and 30% faster convergence than DRL baselines.
📝 Abstract
Stacked intelligent metasurfaces (SIMs) have recently emerged as a powerful wave-domain technology that enables multi-stage manipulation of electromagnetic signals through multilayer programmable architectures. While SIMs offer unprecedented degrees of freedom for enhancing physical-layer security, their extremely large number of meta-atoms leads to a high-dimensional and strongly coupled optimization space, making conventional design approaches inefficient and difficult to scale. Moreover, existing deep reinforcement learning (DRL) techniques suffer from slow convergence and performance degradation in dynamic wireless environments with imperfect knowledge of passive eavesdroppers. To overcome these challenges, we propose a hybrid quantum proximal policy optimization (Q-PPO) framework for SIM-assisted secure communications, which jointly optimizes transmit power allocation and SIM phase shifts to maximize the average secrecy rate under power and quality-of-service constraints. Specifically, a parameterized quantum circuit is embedded into the actor network, forming a hybrid classical-quantum policy architecture that enhances policy representation capability and exploration efficiency in high-dimensional continuous action spaces. Extensive simulations demonstrate that the proposed Q-PPO scheme consistently outperforms DRL baselines, achieving approximately 15% higher secrecy rates and 30% faster convergence under imperfect eavesdropper channel state information. These results establish Q-PPO as a powerful optimization paradigm for SIM-enabled secure wireless networks.
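To make the optimization objective concrete, the sketch below computes an average secrecy rate from per-slot signal-to-noise ratios. It assumes the standard wiretap-channel definition, in which the secrecy rate is the legitimate user's rate minus the eavesdropper's rate, floored at zero. The abstract does not state the paper's exact system model, so the function names, the single-user setup, and the SNR values are illustrative assumptions rather than the authors' formulation.

```python
import numpy as np


def secrecy_rate(snr_user, snr_eve):
    """Per-slot secrecy rate in bit/s/Hz (assumed standard wiretap form).

    Computes [log2(1 + snr_user) - log2(1 + snr_eve)]^+, where [x]^+ = max(x, 0).
    SNRs are linear (not dB) values.
    """
    snr_user = np.asarray(snr_user, dtype=float)
    snr_eve = np.asarray(snr_eve, dtype=float)
    rate_gap = np.log2(1.0 + snr_user) - np.log2(1.0 + snr_eve)
    # A negative gap means the eavesdropper has the better channel,
    # so no positive secrecy rate is achievable in that slot.
    return np.maximum(rate_gap, 0.0)


def average_secrecy_rate(snr_user, snr_eve):
    """Mean secrecy rate over time slots (or channel realizations)."""
    return float(np.mean(secrecy_rate(snr_user, snr_eve)))


# Hypothetical SNR values for three time slots (illustration only).
snr_user = np.array([15.0, 7.0, 3.0])
snr_eve = np.array([3.0, 1.0, 7.0])

per_slot = secrecy_rate(snr_user, snr_eve)
avg = average_secrecy_rate(snr_user, snr_eve)
print(per_slot, avg)
```

In a reinforcement learning setup such as the one described, a quantity of this kind would typically serve as the reward signal, with the agent's actions (transmit power allocation and SIM phase shifts) determining the SNRs. That mapping depends on the channel model and is not shown here.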