🤖 AI Summary
To address challenges in wearable-sensor-based human activity recognition, including multimodal data entanglement, high activity heterogeneity, and difficulties in edge deployment, this paper proposes a cross-modal spatiotemporal disentangled representation framework coupled with gradient modulation. The method employs a modality decomposition–alignment–fusion strategy that integrates spatiotemporal attention with cross-modal disentangled representation learning to achieve feature disentanglement, enhanced generalization, and computational efficiency. A gradient modulation mechanism is further introduced to optimize multi-task joint training. Additionally, a lightweight edge-deployment simulation system is developed. Extensive experiments on multiple mainstream public datasets show that the proposed approach improves recognition accuracy by 2.3% on average, reduces model parameters by 37%, and decreases inference latency by 41%, supporting its effectiveness, robustness, and practicality in real-world edge scenarios.
📝 Abstract
Human Activity Recognition (HAR) is a fundamental technology for numerous human-centered intelligent applications. Although deep learning methods have been used to accelerate feature extraction, issues such as multimodal data mixing, activity heterogeneity, and complex model deployment remain largely unresolved. This paper aims to address these issues in sensor-based human activity recognition. We propose a modality decomposition, alignment, and fusion strategy based on spatiotemporal attention to tackle the mixed distribution of sensor data. Key discriminative features of activities are captured through cross-modal spatiotemporal disentangled representation, and gradient modulation is applied to alleviate data heterogeneity. In addition, a wearable deployment simulation system is constructed. Experiments on a large number of public datasets demonstrate the effectiveness of the model.