Stacked Intelligent Metasurfaces for Multi-Modal Semantic Communications

📅 2025-06-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high bandwidth overhead and low reconstruction fidelity in multimodal semantic transmission under complex scenarios, this paper proposes a multimodal semantic communication system empowered by stacked intelligent metasurfaces (SIMs). The system introduces a novel SIM-enabled wave-domain semantic architecture: SIMs jointly enable wave-domain visual imaging and amplitude-phase joint modulation for text semantic transmission; a meta-atom gradient optimization method is proposed to enhance semantic imaging accuracy; and visual-text semantic fusion drives a conditional generative adversarial network (cGAN) for collaborative multimodal reconstruction. Experimental results on a uniform planar array demonstrate high-precision semantic imaging, achieving an 8.2 dB improvement in peak signal-to-noise ratio (PSNR) for multimodal reconstruction and a semantic fidelity of 92.7%, while significantly reducing bandwidth requirements.

Technology Category

Application Category

📝 Abstract
Semantic communication (SemCom) powered by generative artificial intelligence enables highly efficient and reliable information transmission. However, it still necessitates the transmission of substantial amounts of data when dealing with complex scene information. In contrast, the stacked intelligent metasurface (SIM), leveraging wave-domain computing, provides a cost-effective solution for directly imaging complex scenes. Building on this concept, we propose an innovative SIM-aided multi-modal SemCom system. Specifically, an SIM is positioned in front of the transmit antenna for transmitting visual semantic information of complex scenes via imaging on the uniform planar array at the receiver. Furthermore, the simple scene description that contains textual semantic information is transmitted via amplitude-phase modulation over electromagnetic waves. To simultaneously transmit multi-modal information, we optimize the amplitude and phase of meta-atoms in the SIM using a customized gradient descent algorithm. The optimization aims to gradually minimize the mean squared error between the normalized energy distribution on the receiver array and the desired pattern corresponding to the visual semantic information. By combining the textual and visual semantic information, a conditional generative adversarial network is used to recover the complex scene accurately. Extensive numerical results verify the effectiveness of the proposed multi-modal SemCom system in reducing bandwidth overhead as well as the capability of the SIM for imaging the complex scene.
Problem

Research questions and friction points this paper is trying to address.

Enabling efficient multi-modal semantic communication for complex scenes
Reducing bandwidth overhead in semantic communication systems
Achieving accurate complex scene imaging via stacked metasurfaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stacked metasurfaces enable direct wave-domain scene imaging
Gradient descent optimizes metasurface for multi-modal transmission
Generative network fuses text and visual semantic data
G
Guojun Huang
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China
Jiancheng An
Jiancheng An
Nanyang Technological University
Stacked Intelligent MetasurfaceFlexible Intelligent MetasurfaceSIMFIM
L
Lu Gan
School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, China; Yibin Institute of UESTC, Yibin 644000, China
D
Dusit Niyato
College of Computing and Data Science at Nanyang Technological University (NTU), Singapore
M
M'erouane Debbah
KU 6G Research Center, Department of Computer and Information Engineering, Khalifa University, Abu Dhabi 127788, UAE; CentraleSupelec, University Paris-Saclay, 91192 Gif-sur-Yvette, France
Tie Jun Cui
Tie Jun Cui
Southeast University, China
MetamaterialsComputational Electromagnetics