🤖 AI Summary
To address the task-specificity and poor generalizability of AI models in communication systems, this paper proposes the first multimodal foundation model tailored for wireless communications. Built upon the Transformer architecture, it processes raw communication signals directly and tackles four domain-specific challenges: physics-aware adaptive signal tokenization, channel-driven positional encoding, cross-domain (time/frequency/space) feature alignment, and physical-domain normalization. The model jointly estimates multiple critical parameters—including transmission rank, precoding matrix, Doppler spread, and delay spectrum—enabling unified physical-layer inference. Evaluated on real-world channel measurements, it achieves high-accuracy prediction and strong generalization across diverse scenarios. This work points toward a shift in communication AI: from narrow, task-specific models toward general-purpose foundation models capable of broad physical-layer understanding.
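To make the tokenization and normalization ideas above concrete, here is a minimal sketch of one plausible preprocessing step: splitting a raw complex baseband signal into fixed-length tokens and applying a per-token power normalization. The function name, token length, and real/imag feature layout are illustrative assumptions, not the paper's actual scheme.

```python
import numpy as np

def tokenize_iq(signal: np.ndarray, token_len: int = 16) -> np.ndarray:
    """Split a complex baseband signal into fixed-length tokens
    (hypothetical scheme, not the paper's exact tokenizer)."""
    n_tokens = len(signal) // token_len
    trimmed = signal[: n_tokens * token_len].reshape(n_tokens, token_len)
    # Physical-domain normalization: scale each token to unit average power,
    # so the model sees signals on a comparable scale regardless of gain.
    power = np.sqrt(np.mean(np.abs(trimmed) ** 2, axis=1, keepdims=True))
    normalized = trimmed / np.maximum(power, 1e-12)
    # Stack real and imaginary parts as a (n_tokens, 2 * token_len) matrix,
    # one row per token, ready for an embedding layer.
    return np.concatenate([normalized.real, normalized.imag], axis=1)

rng = np.random.default_rng(0)
sig = (rng.standard_normal(1000) + 1j * rng.standard_normal(1000)) * 3.0
tokens = tokenize_iq(sig, token_len=16)
print(tokens.shape)  # (62, 32): 62 tokens, 16 real + 16 imaginary features
```

Each row then plays the role a patch plays in a vision transformer; positional encoding and cross-domain alignment would be applied downstream.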
📝 Abstract
Artificial Intelligence (AI) has demonstrated unprecedented performance across various domains, and its application to communication systems is an active area of research. While current methods focus on task-specific solutions, the broader trend in AI is shifting toward large general models capable of supporting multiple applications. In this work, we take a step toward a foundation model for communication data: a transformer-based, multimodal model designed to operate directly on communication signals. We propose methodologies to address key challenges, including tokenization, positional embedding, multimodality, variable feature sizes, and normalization. Furthermore, we empirically demonstrate that such a model can successfully estimate multiple features, including transmission rank, selected precoder, Doppler spread, and delay profile.