Scale What Counts, Mask What Matters: Evaluating Foundation Models for Zero-Shot Cross-Domain Wi-Fi Sensing

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Wi-Fi sensing suffers from poor cross-environment, cross-device, and cross-user generalization due to domain shift, a problem exacerbated by the scarcity and fragmentation of existing small-scale datasets. To address this, we propose the first large-scale foundation model for Wi-Fi Channel State Information (CSI), pretrained with a Masked Autoencoder (MAE) objective on 1.3 million heterogeneous samples spanning diverse devices, frequency bands, and bandwidths. We systematically demonstrate that data scale, not model size, dominates cross-domain generalization: performance improves log-linearly with dataset size, while model scaling yields diminishing returns. Evaluated on human activity recognition, gesture recognition, and user identification, our method achieves zero-shot cross-domain accuracy gains of 2.2–15.7% over supervised baselines, the first empirical validation that large-scale self-supervised pretraining can substantially overcome the generalization bottleneck in Wi-Fi sensing.
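
The summary describes the pretraining objective only at a high level. As a rough illustration, below is a minimal PyTorch sketch of MAE-style pretraining on CSI patches; the CSIMaskedAutoencoder class, tensor shapes, layer sizes, and the 75% mask ratio are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CSIMaskedAutoencoder(nn.Module):
    """Minimal MAE-style model for CSI; all sizes are illustrative assumptions."""

    def __init__(self, n_patches=64, patch_dim=120, d_model=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, d_model)                 # patch -> token
        self.pos = nn.Parameter(torch.zeros(1, n_patches, d_model))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
        enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, num_layers=4)
        dec = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec, num_layers=2)
        self.head = nn.Linear(d_model, patch_dim)                  # token -> patch

    def forward(self, patches):
        # patches: (batch, n_patches, patch_dim), e.g. CSI split along time.
        B, N, _ = patches.shape
        n_keep = int(N * (1 - self.mask_ratio))
        perm = torch.rand(B, N, device=patches.device).argsort(dim=1)
        keep = perm[:, :n_keep]                                    # visible indices

        tokens = self.embed(patches) + self.pos
        idx = keep.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        encoded = self.encoder(torch.gather(tokens, 1, idx))

        # Scatter encoded tokens back; masked slots keep the learned mask token.
        full = self.mask_token.expand(B, N, -1).clone()
        full.scatter_(1, idx, encoded)
        recon = self.head(self.decoder(full + self.pos))

        # Reconstruction loss only on masked patches, as in standard MAE.
        masked = torch.ones(B, N, device=patches.device, dtype=torch.bool)
        masked.scatter_(1, keep, False)
        return ((recon - patches) ** 2)[masked].mean()

# One pretraining step on a fake batch: 8 samples, 64 time patches, 120 features.
model = CSIMaskedAutoencoder()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
opt.zero_grad()
loss = model(torch.randn(8, 64, 120))
loss.backward()
opt.step()
```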

📝 Abstract
While Wi-Fi sensing offers a compelling, privacy-preserving alternative to cameras, its practical utility has been fundamentally undermined by a lack of robustness across domains. Models trained in one setup fail to generalize to new environments, hardware, or users, a critical "domain shift" problem exacerbated by modest, fragmented public datasets. We depart from this limited paradigm and apply a foundation model approach, leveraging Masked Autoencoding (MAE) style pretraining on the largest and most heterogeneous collection of Wi-Fi CSI datasets assembled to date. Our study pretrains and evaluates models on over 1.3 million samples extracted from 14 datasets, collected using 4 distinct devices across the 2.4/5/6 GHz bands and bandwidths from 20 to 160 MHz. Our large-scale evaluation is the first to systematically disentangle the impact of data diversity from that of model capacity on cross-domain performance. The results establish scaling trends for Wi-Fi CSI sensing. First, our experiments show log-linear improvements in unseen-domain performance as the amount of pretraining data increases, suggesting that data scale and diversity are key to domain generalization. Second, at the current data volume, larger models provide only marginal gains in cross-domain performance, indicating that data, rather than model capacity, is the current bottleneck for Wi-Fi sensing generalization. Finally, we conduct a series of cross-domain evaluations on human activity recognition, human gesture recognition, and user identification tasks. The results show that large-scale pretraining improves cross-domain accuracy by 2.2% to 15.7% compared to the supervised learning baseline. Overall, our findings provide an insightful direction for designing future Wi-Fi sensing systems that can eventually be robust enough for real-world deployment.
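
To make the reported log-linear trend concrete, the sketch below shows how such a trend is typically quantified: fit accuracy ≈ a + b · log10(N) by least squares over (pretraining-set size, accuracy) pairs. The data points here are made-up placeholders to illustrate the fit, not numbers from the paper.

```python
import numpy as np

# Hypothetical (made-up) points: pretraining-set size vs. unseen-domain accuracy.
n_samples = np.array([1e4, 5e4, 2e5, 6e5, 1.3e6])
accuracy  = np.array([0.58, 0.63, 0.67, 0.70, 0.72])

# Log-linear trend: accuracy ~= a + b * log10(N), fit by least squares.
b, a = np.polyfit(np.log10(n_samples), accuracy, deg=1)
print(f"accuracy ~= {a:.3f} + {b:.3f} * log10(N)")

# Extrapolate (with the usual caveats) to a 10x larger corpus.
print(f"predicted at 1.3e7 samples: {a + b * np.log10(1.3e7):.3f}")
```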
Problem

Research questions and friction points this paper is trying to address.

Addressing domain shift in Wi-Fi sensing across environments, hardware, and users
Evaluating foundation models for zero-shot cross-domain Wi-Fi sensing performance
Investigating data scale versus model capacity for Wi-Fi CSI generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretraining with Masked Autoencoding on heterogeneous datasets (see the harmonization sketch after this list)
Pretraining and evaluating cross-domain performance on over 1.3 million samples
Leveraging data scale over model capacity for generalization
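
Pooling 14 datasets across the 2.4/5/6 GHz bands and 20–160 MHz bandwidths means samples arrive with different subcarrier counts. As one plausible way to handle this (an assumption, not necessarily the paper's preprocessing), the sketch below resamples each capture's amplitude onto a fixed subcarrier grid before patching.

```python
import numpy as np

def to_fixed_grid(csi, n_out=256):
    """Resample a CSI amplitude matrix (time, subcarriers) to n_out subcarriers.

    Linear interpolation along the subcarrier axis; a hypothetical helper for
    feeding mixed 20-160 MHz captures to one model, not the authors' pipeline.
    """
    t, n_in = csi.shape
    src = np.linspace(0.0, 1.0, n_in)   # original subcarrier positions
    dst = np.linspace(0.0, 1.0, n_out)  # target grid
    return np.stack([np.interp(dst, src, row) for row in csi])

# A 20 MHz capture (~52 usable subcarriers) and a 160 MHz capture (~1992)
# both map onto the same 256-bin grid.
print(to_fixed_grid(np.abs(np.random.randn(100, 52))).shape)    # (100, 256)
print(to_fixed_grid(np.abs(np.random.randn(100, 1992))).shape)  # (100, 256)
```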