🤖 AI Summary
This work addresses the challenge that existing foundation models for wireless channels rely on high-dimensional, fully observed channel state information (CSI), which incurs prohibitive overhead and latency in practical deployments. To overcome this, the authors propose PilotWiMAE, a self-supervised channel representation learning framework that learns channel features directly from highly sparse pilot signals, shrinking the observation space by up to two orders of magnitude. Leveraging attention factorized into temporal and joint space-frequency processing, patch-normalized reconstruction, an auxiliary scale loss for large-scale fading, and AWGN curriculum learning, PilotWiMAE enables efficient pretraining under a 99% masking ratio. A decoder-centric pretraining stage that follows the joint encoder-decoder pretraining further weakens the coupling between decoder capacity and representation quality. Pretrained at 3.5 GHz and evaluated at 28 GHz, the model outperforms supervised baselines in cross-frequency beam selection and channel characterization while reducing observation overhead and latency. The pretrained weights, training pipeline, channel datasets, and the Sionna-based CSIGen channel-generation tool are publicly released.
📝 Abstract
Channel foundation models assume access to fully observed channels, an assumption that fails in deployment. We introduce PilotWiMAE, a self-supervised framework whose encoder ingests noisy pilot observations directly and whose attention is factorized into temporal processing and joint space-frequency processing, an inductive bias inspired by the physics of the problem. Pilot input shrinks the observation space by up to two orders of magnitude, removes the unrealistic assumption of full-CSI availability, and incurs lower latency. The factorized design generates robust representations by exploiting the separable channel structure and allows a pretraining mask ratio of 99%. We pair patch-normalized reconstruction, which captures small-scale fading structure, with an auxiliary scale loss that recovers the large-scale fading features, and use an AWGN curriculum to match pilot noise at pretraining and deployment. Pretrained solely at 3.5 GHz and evaluated at 28 GHz across in-distribution and out-of-distribution settings, PilotWiMAE beats supervised baselines in cross-frequency beam selection and channel characterization despite operating on a smaller observation space. To weaken the coupling between decoder capacity and representation quality, we further propose a decoder-centric pretraining stage that follows the joint encoder-decoder pretraining and allows PilotWiMAE to deliver competitive channel estimation without sacrificing representation quality. To foster further work in this direction, we release the PilotWiMAE pretrained weights and training pipeline, together with CSIGen, our Sionna-based ray-tracing channel-generation tool, and the channel datasets used in this work.
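The pairing of patch-normalized reconstruction with an auxiliary scale loss can be illustrated with a small NumPy sketch. This is not the released PilotWiMAE code: the function names `patch_normalize` and `combined_loss`, the use of log10 mean power as the scale target, the flattened real-valued patch layout, and the weight `scale_weight` are all assumptions made for illustration. The idea it shows is that normalizing each target patch removes its absolute power, so the reconstruction term sees only small-scale structure, while a separate term has to recover the scale that normalization discarded.

```python
import numpy as np


def patch_normalize(patches, eps=1e-12):
    """Normalize each patch to zero mean and unit variance.

    patches: real array of shape (num_patches, patch_dim), e.g. stacked
    real and imaginary parts of channel coefficients (assumed layout).
    Returns the normalized patches and a per-patch log-power value that
    serves as a hypothetical large-scale target.
    """
    mean = patches.mean(axis=-1, keepdims=True)
    var = patches.var(axis=-1, keepdims=True)
    normalized = (patches - mean) / np.sqrt(var + eps)
    log_power = np.log10(np.mean(patches ** 2, axis=-1) + eps)
    return normalized, log_power


def combined_loss(pred_patches, pred_log_power, target_patches, mask,
                  scale_weight=0.1):
    """Masked MSE on normalized patches plus an MSE on per-patch log-power.

    mask: array of shape (num_patches,), 1 where the patch was masked and
    must be reconstructed, 0 elsewhere. At least one entry must be 1.
    scale_weight is an illustrative value, not taken from the paper.
    """
    target_norm, target_log_power = patch_normalize(target_patches)
    per_patch = ((pred_patches - target_norm) ** 2).mean(axis=-1)
    recon = (per_patch * mask).sum() / mask.sum()
    scale = ((pred_log_power - target_log_power) ** 2).mean()
    return recon + scale_weight * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 100 patches with widely varying power, mimicking large-scale fading.
    gains = 10.0 ** rng.uniform(-4, -1, size=(100, 1))
    target = gains * rng.normal(size=(100, 16))
    mask = (rng.random(100) < 0.99).astype(float)  # 99% of patches masked
    mask[0] = 1.0  # guarantee a non-empty mask
    pred = rng.normal(size=(100, 16))
    pred_scale = np.zeros(100)
    print(combined_loss(pred, pred_scale, target, mask))
```

Because the reconstruction target is normalized per patch, patches with very different received power contribute comparably to the first term; without the second term the model would have no training signal for absolute power.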