AI Summary
To address the lack of a general-purpose channel representation model in MIMO wireless communications, this paper introduces CSI-CLIP, which the authors present as the first self-supervised foundation model for wireless channels. It treats time-domain channel impulse responses (CIR) and frequency-domain channel state information (CSI) as naturally aligned multimodal signals and builds a CIR-CSI cross-modal consistency pretraining paradigm on that pairing. Using contrastive learning to align the two representations, the model can be pretrained on large-scale channel data without labels. In downstream evaluations it reduces the mean localization error by 22% and raises beam management accuracy by 1%, and it generalizes well across scenarios. The work jointly models channel sensing and communication and offers a transferable representation framework for wireless intelligence.
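The claim that CIR and CSI are "naturally aligned" can be illustrated with a small sketch. It assumes an idealized sampled-tap channel in which the per-subcarrier CSI is the discrete Fourier transform of the CIR taps. The array dimensions, the random taps, and the helper name `cir_to_csi` are hypothetical and are not taken from the paper, which may obtain its CIR/CSI pairs differently.

```python
import numpy as np

def cir_to_csi(cir: np.ndarray, n_subcarriers: int) -> np.ndarray:
    """Map time-domain taps (last axis) to a frequency response on n_subcarriers bins."""
    # np.fft.fft zero-pads the tap axis up to n_subcarriers when it is longer than the taps.
    return np.fft.fft(cir, n=n_subcarriers, axis=-1)

# Hypothetical MIMO dimensions: receive antennas, transmit antennas, taps, subcarriers.
n_rx, n_tx, n_taps, n_sc = 2, 4, 8, 64

rng = np.random.default_rng(0)
cir = (rng.standard_normal((n_rx, n_tx, n_taps))
       + 1j * rng.standard_normal((n_rx, n_tx, n_taps)))

# Frequency-domain view of the same channel: one (CIR, CSI) pair.
csi = cir_to_csi(cir, n_sc)

# The inverse DFT recovers the original taps, so both views carry the same information.
recovered = np.fft.ifft(csi, axis=-1)[..., :n_taps]
```

Because each view can be derived from the other under this model, every channel sample yields a matched pair without any manual labeling, which is what makes the pairing usable for self-supervised pretraining.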
Abstract
In artificial intelligence, self-supervised learning has demonstrated strong generalization by pretraining on large-scale unlabeled datasets, a property that is especially important for wireless communication models that must adapt to a variety of scenarios. This paper treats Channel State Information (CSI) and Channel Impulse Response (CIR) as naturally aligned multi-modal data and proposes the first MIMO wireless channel foundation model, named CSI-CLIP. By capturing the joint representations of CIR and CSI, CSI-CLIP adapts well across scenarios and extracts robust features. Experimental results show that CSI-CLIP reduces the mean error distance by 22% in the positioning task and increases accuracy by 1% over traditional supervised methods in the beam management task; it also shows gains in the channel identification task. These improvements highlight the potential of CSI-CLIP for integrating sensing and communication and demonstrate its advantages over existing techniques. Moreover, viewing CSI and CIR as multi-modal pairs and applying contrastive learning to a wireless channel foundation model open up new research directions in MIMO wireless communications.
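A minimal sketch of a CLIP-style symmetric contrastive objective over paired embeddings may help make the pretraining idea concrete. It is an illustration under stated assumptions, not the paper's implementation: random arrays stand in for the outputs of the CIR and CSI encoders, and the function names, batch size, embedding width, and temperature value are all hypothetical.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(cir_emb: np.ndarray, csi_emb: np.ndarray,
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: row i of each input is assumed to be a matched pair."""
    cir_emb = l2_normalize(cir_emb)
    csi_emb = l2_normalize(csi_emb)
    logits = cir_emb @ csi_emb.T / temperature   # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])             # matched pairs sit on the diagonal
    loss_cir_to_csi = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_csi_to_cir = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_cir_to_csi + loss_csi_to_cir))

# Stand-in embeddings (hypothetical): 8 samples, 16-dimensional features.
rng = np.random.default_rng(0)
cir_emb = rng.standard_normal((8, 16))
noise = 0.01 * rng.standard_normal((8, 16))

# Nearly identical paired embeddings give a low loss ...
aligned_loss = clip_style_loss(cir_emb, cir_emb + noise)
# ... while shifting the pairing by one row puts true matches off the diagonal.
shuffled_loss = clip_style_loss(cir_emb, np.roll(cir_emb, 1, axis=0))
```

Minimizing this loss pulls the two embeddings of the same channel together and pushes apart embeddings of different channels in the batch, which is the sense in which the two modalities are aligned without labels.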