Over-the-Air Semantic Alignment with Stacked Intelligent Metasurfaces

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
In semantic communication, heterogeneous encoders and decoders produce misaligned latent-space representations, and existing alignment methods rely on additional on-device digital processing, which increases system complexity. This paper proposes an over-the-air semantic alignment framework based on stacked intelligent metasurfaces (SIMs), the first to perform latent-space alignment directly in the electromagnetic wave domain, substantially reducing the computation required at the device. The SIM is modeled as a trainable linear operator whose transfer function is optimized by gradient-based learning to match a desired semantic mapping, emulating both supervised linear aligners and zero-shot Parseval-frame-based equalizers. Experiments with heterogeneous Vision Transformer (ViT) encoders achieve up to 90% task accuracy at high SNR and remain robust at low SNR, showing that SIMs can accurately reproduce semantic equalizers.

📝 Abstract
Semantic communication systems aim to transmit task-relevant information between AI-capable devices, but their performance can degrade when heterogeneous transmitter-receiver models produce misaligned latent representations. Existing semantic alignment methods typically rely on additional digital processing at the transmitter or receiver, increasing overall device complexity. In this work, we introduce the first over-the-air semantic alignment framework based on stacked intelligent metasurfaces (SIMs), which enables latent-space alignment directly in the wave domain, substantially reducing the computational burden at the device level. We model SIMs as trainable linear operators capable of emulating both supervised linear aligners and zero-shot Parseval-frame-based equalizers. To realize these operators physically, we develop a gradient-based optimization procedure that tailors the metasurface transfer function to a desired semantic mapping. Experiments with heterogeneous vision transformer (ViT) encoders show that SIMs can accurately reproduce both supervised and zero-shot semantic equalizers, achieving up to 90% task accuracy in high signal-to-noise ratio (SNR) regimes while remaining robust at low SNR.
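
To make the "supervised linear aligner" in the abstract concrete, here is a minimal sketch of how such a map could be fitted from paired latent vectors by least squares. It is an illustration under stated assumptions, not the paper's procedure: the latents are taken to be real-valued, the function name `fit_linear_aligner` and the synthetic data are hypothetical, and the channel and the metasurface are left out entirely.

```python
import numpy as np

def fit_linear_aligner(X_tx, Y_rx):
    """Return A minimizing ||X_tx @ A.T - Y_rx||_F.

    X_tx: (n, d_tx) latents from the transmitter's encoder.
    Y_rx: (n, d_rx) latents the receiver's model expects for the same inputs.
    """
    # lstsq solves X_tx @ W ~= Y_rx for W of shape (d_tx, d_rx); A is its transpose.
    W, *_ = np.linalg.lstsq(X_tx, Y_rx, rcond=None)
    return W.T

# Synthetic paired data (assumption): the two latent spaces differ by an
# unknown linear map A_true, with no noise.
rng = np.random.default_rng(0)
A_true = rng.standard_normal((6, 8))
X = rng.standard_normal((200, 8))
Y = X @ A_true.T

A_hat = fit_linear_aligner(X, Y)
aligned = X @ A_hat.T  # transmitter latents mapped into the receiver's space
```

In the framework described above, a matrix of this kind would be the target that the metasurface is trained to reproduce in the wave domain rather than something applied digitally on the device.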
Problem

Research questions and friction points this paper is trying to address.

Aligns latent representations in semantic communication systems
Reduces device complexity by performing alignment over the air with metasurfaces
Enables both supervised and zero-shot semantic equalization in wave domain
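
The zero-shot equalizer is described as Parseval-frame-based. The sketch below only illustrates the defining property of a Parseval frame: a matrix F with F^T F = I, so that frame coefficients preserve energy and the input is recovered by applying F^T. How the paper actually builds its frame is not stated in this summary, so the random `anchors` matrix and the function name `parseval_frame` are hypothetical.

```python
import numpy as np

def parseval_frame(anchors):
    """Turn a full-column-rank (m, d) matrix into a Parseval frame F with F.T @ F = I.

    Uses the reduced SVD anchors = U S Vh and returns U @ Vh, which equals
    anchors @ inv(sqrtm(anchors.T @ anchors)).
    """
    U, _, Vh = np.linalg.svd(anchors, full_matrices=False)
    return U @ Vh

rng = np.random.default_rng(0)
anchors = rng.standard_normal((12, 5))  # 12 hypothetical anchor vectors in a 5-dim latent space
F = parseval_frame(anchors)

x = rng.standard_normal(5)
coeffs = F @ x          # redundant frame coefficients (12 numbers)
x_rec = F.T @ coeffs    # exact reconstruction, because F.T @ F = I
```

Because the frame is redundant (12 coefficients for a 5-dimensional vector here), reconstruction stays exact while the representation is spread over more measurements than dimensions.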
Innovation

Methods, ideas, or system contributions that make the work stand out.

Over-the-air semantic alignment using stacked intelligent metasurfaces
SIMs are modeled as trainable linear operators that emulate supervised and zero-shot aligners
Gradient-based optimization tailors metasurface transfer functions
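
The idea of tailoring a metasurface transfer function by gradient descent can be sketched as follows. This is a toy model under loud assumptions, not the paper's algorithm: the SIM is taken to be a cascade G = Φ_L W_L ... Φ_1 W_1 with tunable phase-only layers Φ_l = diag(exp(jθ_l)), the fixed propagation matrices `Ws` are random stand-ins, the loss is the squared Frobenius distance to a target matrix `T`, and the function names and step-halving rule are hypothetical choices.

```python
import numpy as np

def sim_response(thetas, Ws):
    """End-to-end matrix G = Phi_L W_L ... Phi_1 W_1, with Phi_l = diag(exp(1j * theta_l))."""
    G = np.eye(Ws[0].shape[1], dtype=complex)
    for theta, W in zip(thetas, Ws):
        G = np.exp(1j * theta)[:, None] * (W @ G)
    return G

def loss_and_grad(thetas, Ws, T):
    """Squared Frobenius error ||G - T||^2 and its analytic gradient w.r.t. every phase."""
    prefix, G = [], np.eye(Ws[0].shape[1], dtype=complex)
    for theta, W in zip(thetas, Ws):
        A = W @ G                      # field just before this layer's phase shifts
        prefix.append(A)
        G = np.exp(1j * theta)[:, None] * A
    E = G - T
    loss = float(np.sum(np.abs(E) ** 2))
    grads = [None] * len(Ws)
    B = np.eye(T.shape[0], dtype=complex)  # product of everything applied after layer l
    for l in reversed(range(len(Ws))):
        s = np.sum((B.T @ E.conj()) * prefix[l], axis=1)
        grads[l] = 2.0 * np.real(1j * np.exp(1j * thetas[l]) * s)
        B = (B * np.exp(1j * thetas[l])[None, :]) @ Ws[l]
    return loss, grads

def fit_sim(T, Ws, steps=300, lr=0.1, seed=0):
    """Gradient descent on the phases; the step is halved whenever a trial does not improve."""
    rng = np.random.default_rng(seed)
    thetas = [rng.uniform(0, 2 * np.pi, W.shape[0]) for W in Ws]
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grad(thetas, Ws, T)
        history.append(loss)
        trial = [t - lr * g for t, g in zip(thetas, grads)]
        if loss_and_grad(trial, Ws, T)[0] < loss:
            thetas = trial
        else:
            lr *= 0.5
    return thetas, history

# Toy setup (assumption): 3 layers of 8 elements, random propagation matrices,
# and a target generated by the same model so that it is physically realizable.
rng = np.random.default_rng(1)
N, L = 8, 3
Ws = [(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
      for _ in range(L)]
theta_true = [rng.uniform(0, 2 * np.pi, N) for _ in range(L)]
T = sim_response(theta_true, Ws)

thetas, history = fit_sim(T, Ws)
```

In practice the target `T` would be a semantic aligner such as a least-squares map or a frame-based equalizer, and an arbitrary target may only be approximated, since phase-only layers cannot realize every linear operator. An automatic-differentiation framework could replace the hand-written gradient.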