🤖 AI Summary
Existing self-supervised learning methods naively adapt image and text paradigms to wireless channel representation, overlooking intrinsic channel properties, including spatiotemporal-frequency correlation, hardware constraints, and noise sensitivity. To address this, we propose WiMAE, a Transformer-based masked autoencoder foundation model for wireless channels. We further extend it into ContraWiMAE, a multi-task framework that unifies masked reconstruction with noise-augmented contrastive learning. Our method introduces a channel-structure-aware masking strategy and a pretraining paradigm tailored to multi-antenna systems, enhancing representation discriminability and cross-scenario generalization. Evaluated on diverse downstream tasks, ContraWiMAE achieves a 12.3% improvement in linear separability and reduces few-shot adaptation error by 19.7%, significantly improving data efficiency and environmental robustness.
📝 Abstract
Current applications of self-supervised learning to wireless channel representation often borrow paradigms developed for text and image processing, without fully addressing the unique characteristics and constraints of wireless communications. Aiming to fill this gap, we first propose WiMAE (Wireless Masked Autoencoder), a transformer-based encoder-decoder foundation model pretrained on a realistic open-source multi-antenna wireless channel dataset. Building upon this foundation, we develop ContraWiMAE, which enhances WiMAE by incorporating a contrastive learning objective alongside the reconstruction task in a unified multi-task framework. By warm-starting from pretrained WiMAE weights and generating positive pairs via noise injection, the contrastive component enables the model to capture both structural and discriminative features, enhancing representation quality beyond what reconstruction alone can achieve. Through extensive evaluation on unseen scenarios, we demonstrate the effectiveness of both approaches across multiple downstream tasks, with ContraWiMAE showing further improvements in linear separability and adaptability in diverse wireless environments. Comparative evaluations against a state-of-the-art wireless channel foundation model confirm the superior performance and data efficiency of our models, highlighting their potential as powerful baselines for future research in self-supervised wireless channel representation learning.
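The abstract describes a multi-task objective: MAE-style masked reconstruction combined with a contrastive term whose positive pairs come from noise injection. Below is a minimal numpy sketch of that idea, not the paper's implementation: the embedding dimensions, noise level, temperature `tau`, and loss-balancing weight `alpha` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_mse(pred, target, mask):
    # MAE-style reconstruction loss, computed only on the masked patches.
    return float(np.mean((pred[mask] - target[mask]) ** 2))

def info_nce(z, z_pos, tau=0.1):
    # Contrastive (InfoNCE) loss: each embedding's positive is its
    # noise-augmented view; other samples in the batch act as negatives.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_pos = z_pos / np.linalg.norm(z_pos, axis=1, keepdims=True)
    logits = z @ z_pos.T / tau                      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))       # positives on the diagonal

# Toy stand-ins (hypothetical shapes): B channel samples, D-dim embeddings.
B, D = 8, 16
H = rng.standard_normal((B, D))                     # clean channel embeddings
H_noisy = H + 0.05 * rng.standard_normal((B, D))    # noise-injected positive view
recon = H + 0.1 * rng.standard_normal((B, D))       # stand-in decoder output
mask = rng.random((B, D)) < 0.6                     # random patch mask

alpha = 0.5                                         # assumed balancing weight
loss = masked_mse(recon, H, mask) + alpha * info_nce(H, H_noisy)
```

In the actual framework the contrastive branch is warm-started from pretrained WiMAE weights; this sketch only illustrates how the two objectives combine into a single scalar loss.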