🤖 AI Summary
This work addresses temporal activity detection in WiFi channel state information (CSI) time series, where human activity signatures can be weak and span long durations. We propose a Dual Pyramid Network that jointly models global temporal semantics and local signal fluctuations. The method introduces two key components: (i) a Signed Mask-Attention mechanism that enhances activity-relevant temporal dependencies while suppressing noise, and (ii) a fusion scheme in which ContraNorm merges high- and low-frequency features and a new cross-attention mechanism fuses the two feature pyramids, strengthening multi-scale feature interaction. We also release CSI-Activity, an annotated CSI time-series activity dataset of 2,114 labeled segments across 553 CSI samples. On this benchmark, the approach achieves an average 8.7% F1-score improvement over state-of-the-art baselines, with notably better robustness at low signal-to-noise ratios and for long-duration activities. The work advances contactless, fine-grained human activity sensing.
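The two mechanisms in (i) and (ii) can be made concrete with the minimal NumPy sketch below. It is not the authors' implementation, and every specific in it is an assumption.

- **Signed mask.** It is modelled as a per-time-step score in [-1, 1], added to the attention logits with a hypothetical strength `alpha`. Positive entries draw attention toward a time step and negative entries push attention away from it.
- **ContraNorm.** It follows its published shape: subtract `scale` times a softmax-similarity-weighted average of the rows from each row, then apply LayerNorm. Computing the similarity on L2-normalized rows with a temperature `temp` is an assumption of this sketch.
- **Fusion.** `fuse_branches` applies ContraNorm to the sum of two feature branches. This fusion rule was chosen only for illustration.

All function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def signed_mask_attention(x, w_q, w_k, w_v, signed_mask, alpha=2.0):
    """Self-attention over time steps with a signed bias on the attention logits.

    x:           (T, d) features, one row per time step.
    signed_mask: (T,) values in [-1, 1]; positive entries raise the weight of
                 that time step as a key, negative entries lower it
                 (hypothetical parameterization).
    alpha:       hypothetical strength of the mask bias.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    logits = q @ k.T / np.sqrt(k.shape[-1])
    logits = logits + alpha * signed_mask[None, :]  # same bias for every query
    weights = softmax(logits, axis=-1)
    return weights @ v, weights


def layer_norm(x, eps=1e-5):
    # LayerNorm over the feature axis, without learnable gain or bias.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def contra_norm(x, scale=0.1, temp=1.0):
    # ContraNorm-style update: subtract a similarity-weighted average of the
    # rows (pushing representations apart), then apply LayerNorm.
    norms = np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)
    xn = x / norms
    sim = softmax(xn @ xn.T / temp, axis=-1)
    return layer_norm(x - scale * (sim @ x))


def fuse_branches(high, low, scale=0.1):
    # Hypothetical fusion rule: sum the two (T, d) branches, then ContraNorm.
    return contra_norm(high + low, scale=scale)
```

Compare a mask with `+1` at one time step and `-1` at another against an all-zero mask. Every query's weight on the first step rises and its weight on the second falls. This is the emphasize/suppress behavior the summary describes. With `scale=0.0`, `contra_norm` reduces to plain LayerNorm.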
📝 Abstract
We address the challenge of WiFi-based temporal activity detection and propose an efficient Dual Pyramid Network that integrates Temporal Signal Semantic Encoders and Local Sensitive Response Encoders. The Temporal Signal Semantic Encoder splits feature learning into high- and low-frequency components. It uses a novel Signed Mask-Attention mechanism to emphasize important regions and downplay unimportant ones, and fuses the resulting features with ContraNorm. The Local Sensitive Response Encoder captures signal fluctuations in a learning-free manner. The two feature pyramids are then combined by a new cross-attention fusion mechanism. We also introduce a dataset of 2,114 activity segments across 553 WiFi CSI samples, each about 85 seconds long. Extensive experiments show that our method outperforms strong baselines.
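The overall data flow described in the abstract can be sketched end to end with NumPy. Every concrete choice below is an assumption for illustration, not the paper's design:

- **Frequency split.** A centered moving average with a hypothetical `window` gives the low-frequency band, and the residual gives the high-frequency band.
- **Semantic stream.** The two bands are simply placed side by side. The learned encoder layers, including Signed Mask-Attention and ContraNorm, are omitted.
- **Learning-free local response.** This is taken to be the absolute first and second differences of the CSI along time.
- **Pyramids.** Each pyramid is built by stride-2 average pooling.
- **Cross-pyramid fusion.** At every level, scaled dot-product cross-attention lets semantic features (queries) attend to local-response features (keys and values), with a residual connection.

All function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def frequency_split(x, window=5):
    """Split (T, C) CSI features into low- and high-frequency parts.

    Low band: centered moving average along time (hypothetical `window`).
    High band: the residual, so low + high reconstructs x exactly.
    """
    kernel = np.ones(window) / window
    before = window // 2
    padded = np.pad(x, ((before, window - 1 - before), (0, 0)), mode="edge")
    low = np.stack(
        [np.convolve(padded[:, c], kernel, mode="valid") for c in range(x.shape[1])],
        axis=1,
    )
    return low, x - low


def local_response(x):
    """Learning-free fluctuation features for (T, C) input, T >= 3.

    Absolute first and second differences along time, zero-padded at the
    start so the output keeps T rows; returns shape (T, 2C).
    """
    c = x.shape[1]
    d1 = np.vstack([np.zeros((1, c)), np.abs(np.diff(x, n=1, axis=0))])
    d2 = np.vstack([np.zeros((2, c)), np.abs(np.diff(x, n=2, axis=0))])
    return np.concatenate([d1, d2], axis=1)


def build_pyramid(x, levels=3):
    """Stride-2 average pooling along time; finest level first."""
    pyramid = [x]
    for _ in range(levels - 1):
        t = pyramid[-1].shape[0] // 2 * 2  # drop a trailing odd time step
        pyramid.append(pyramid[-1][:t].reshape(t // 2, 2, -1).mean(axis=1))
    return pyramid


def cross_attention(queries, context):
    """Scaled dot-product cross-attention with a residual connection."""
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d), axis=-1)
    return queries + weights @ context


def dual_pyramid_features(csi, levels=3, window=5):
    # Semantic stream stand-in: the two frequency bands side by side, (T, 2C).
    low, high = frequency_split(csi, window)
    semantic = np.concatenate([low, high], axis=1)
    # Local stream: learning-free fluctuation features, also (T, 2C).
    local = local_response(csi)
    # Fuse level by level: semantic features query the local-response features.
    return [
        cross_attention(sem, loc)
        for sem, loc in zip(build_pyramid(semantic, levels), build_pyramid(local, levels))
    ]
```

The residual connection in `cross_attention` keeps the semantic features intact when the local stream carries little signal. A trained model would typically add learned query, key, and value projections rather than attending directly on raw features.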