🤖 AI Summary
This work addresses the limited generalization of existing wireless foundation models to cross-scale, cross-scenario heterogeneous channel state information (CSI), a limitation that stems from fixed input dimensions or scale-isolated training. The authors propose HeterCSI, a channel-adaptive pretraining framework for heterogeneous CSI, and reveal that scale heterogeneity induces destructive gradient interference, whereas scenario diversity promotes gradient alignment. To mitigate the interference, they design a scale-aware adaptive batching strategy and a double-masking mechanism that disentangles valid signals from padding artifacts. The resulting model achieves strong zero-shot performance across 12 datasets without fine-tuning, outperforming the state-of-the-art zero-shot benchmark WiFo by 7.19 dB, 4.08 dB, and 5.27 dB in NMSE for CSI reconstruction, time-domain prediction, and frequency-domain prediction, respectively. It also reduces training latency by 53% and improves average generalization performance by 1.53 dB.
📝 Abstract
Wireless foundation models promise transformative capabilities for channel state information (CSI) processing across diverse 6G network applications, yet face fundamental challenges due to the inherent dual heterogeneity of CSI across both scale and scenario dimensions. Current pretraining approaches either constrain inputs to fixed dimensions or isolate training by scale, limiting the generalization and scalability of wireless foundation models. In this paper, we propose HeterCSI, a channel-adaptive pretraining framework that reconciles training efficiency with robust cross-scenario generalization via a new understanding of gradient dynamics in heterogeneous CSI pretraining. Our key insight is that CSI scale heterogeneity primarily causes destructive gradient interference, while scenario diversity actually promotes constructive gradient alignment when properly managed. Specifically, we formulate heterogeneous CSI batch construction as a partitioning optimization problem that minimizes zero-padding overhead while preserving scenario diversity. To solve this, we develop a scale-aware adaptive batching strategy that aligns CSI samples of similar scales, and design a double-masking mechanism to isolate valid signals from padding artifacts. Extensive experiments on 12 datasets demonstrate that HeterCSI establishes a generalized foundation model without scenario-specific fine-tuning, achieving superior average performance over full-shot baselines. Compared to the state-of-the-art zero-shot benchmark WiFo, it reduces NMSE by 7.19 dB, 4.08 dB, and 5.27 dB for CSI reconstruction, time-domain prediction, and frequency-domain prediction, respectively. The proposed HeterCSI framework also reduces training latency by 53% compared to existing approaches while improving generalization performance by 1.53 dB on average.
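To make the batching idea concrete, here is a minimal illustrative sketch of how grouping CSI samples by similar scale reduces zero-padding, together with a validity mask that separates real entries from padding. This is not the paper's actual algorithm (which formulates batch construction as a partitioning optimization and applies the mask in a double-masking scheme); the function names and the greedy sort-by-volume heuristic are assumptions for illustration only.

```python
import numpy as np

def scale_aware_batches(shapes, batch_size):
    # Greedy sketch: sort samples by their total size so each batch
    # groups similar scales, keeping zero-padding overhead small.
    # (The paper instead solves a partitioning optimization problem.)
    order = sorted(range(len(shapes)), key=lambda i: int(np.prod(shapes[i])))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def pad_and_mask(batch):
    # Zero-pad a list of 2-D CSI arrays to a common shape and return a
    # validity mask marking real entries (1.0) vs padded entries (0.0).
    # Such a mask can be applied both at the model input and in the
    # loss, so that padding artifacts never contribute to training.
    H = max(x.shape[0] for x in batch)
    W = max(x.shape[1] for x in batch)
    padded = np.zeros((len(batch), H, W))
    valid = np.zeros((len(batch), H, W))
    for k, x in enumerate(batch):
        h, w = x.shape
        padded[k, :h, :w] = x
        valid[k, :h, :w] = 1.0
    return padded, valid
```

For example, given sample shapes `[(4, 4), (32, 32), (4, 8), (32, 16)]` and a batch size of 2, the two small-scale samples end up in one batch and the two large-scale samples in the other, so neither batch pads a small array up to a large one.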