AI Summary
Existing WiFi-based human activity recognition methods predominantly rely on single-modal signals, which limits their capacity to model complex channel dynamics and leads to poor cross-domain generalization. To address this, we propose a novel dual-modal collaborative sensing framework that, for the first time, jointly leverages WiFi phase and Doppler shift signals to construct a human-activity-driven dynamic channel representation. Methodologically, we design a dual-branch self-attention architecture to capture intra-modal temporal dependencies and introduce a grouped gating attention mechanism to enable robust cross-modal feature fusion and information entropy optimization. Evaluated on the Widar3.0 and XRF55 datasets, our approach achieves a 4.2% improvement in intra-domain accuracy over state-of-the-art methods and a 7.8% gain in cross-domain accuracy, significantly enhancing both the precision and generalizability of contactless activity recognition.
Abstract
WiFi-based human behavior recognition aims to recognize gestures and activities by analyzing variations in wireless signals. However, existing methods typically focus on a single type of data, neglecting the interaction and fusion of multiple features. To this end, we propose a novel multimodal collaborative awareness method. By leveraging phase data, which reflects changes in dynamic path length, and Doppler frequency shift (DFS) data, which captures frequency changes related to the speed of gesture movement, we enable efficient interaction and fusion of these features to improve recognition accuracy. Specifically, we first introduce a dual-branch self-attention module to capture spatial-temporal cues within each modality. Then, a group attention mechanism is applied to the concatenated phase and DFS features to mine the key group features critical for behavior recognition. Through a gating mechanism, the combined features are further divided into PD-strengthen and PD-weaken branches, optimizing information entropy and promoting cross-modal collaborative awareness. Extensive in-domain and cross-domain experiments on two large publicly available datasets, Widar3.0 and XRF55, demonstrate the superior performance of our method.
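The pipeline described above (per-modality self-attention, concatenation, group-level gating into two branches) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify layer sizes, the number of groups, or how the gate is computed, so the single-head attention, the mean-pooled group summary, the sigmoid gate, and the use of `1 - gate` for the PD-weaken branch are all assumptions made for illustration. The names `self_attention`, `group_gate`, and `forward` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention over time; x is (T, D).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores, axis=-1) @ v

def group_gate(fused, n_groups, w_gate):
    # ASSUMPTION: channels are split into equal groups, each group is
    # summarized by its mean, and a sigmoid gate weights the groups.
    T, C = fused.shape
    groups = fused.reshape(T, n_groups, C // n_groups)
    summary = groups.mean(axis=(0, 2))                 # (n_groups,)
    gate = 1.0 / (1.0 + np.exp(-(summary @ w_gate)))   # values in (0, 1)
    g = gate[None, :, None]
    strengthen = (groups * g).reshape(T, C)            # "PD-strengthen" (assumed form)
    weaken = (groups * (1.0 - g)).reshape(T, C)        # "PD-weaken" (assumed form)
    return strengthen, weaken, gate

def forward(phase, dfs, params, n_groups=4):
    # Dual branch: one self-attention module per modality.
    p = self_attention(phase, *params["phase"])
    d = self_attention(dfs, *params["dfs"])
    fused = np.concatenate([p, d], axis=-1)            # (T, 2D)
    return group_gate(fused, n_groups, params["gate"])

# Toy inputs: T time steps, D features per modality, G groups (all hypothetical).
T, D, G = 16, 8, 4
params = {
    "phase": [rng.normal(scale=0.1, size=(D, D)) for _ in range(3)],
    "dfs": [rng.normal(scale=0.1, size=(D, D)) for _ in range(3)],
    "gate": rng.normal(scale=0.1, size=(G, G)),
}
phase = rng.normal(size=(T, D))
dfs = rng.normal(size=(T, D))
strengthen, weaken, gate = forward(phase, dfs, params, n_groups=G)
print(strengthen.shape, weaken.shape, gate.shape)  # (16, 16) (16, 16) (4,)
```

In this sketch the two branches sum back to the fused features, so the gate only redistributes information between them. A trained model would presumably process each branch further before classification, which the abstract does not detail.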