CMD-HAR: Cross-Modal Disentanglement for Wearable Human Activity Recognition

📅 2025-03-27
📈 Citations: 0
Influential: 0

🤖 AI Summary
To address challenges in wearable-sensor-based human activity recognition (multimodal data entanglement, high activity heterogeneity, and difficult edge deployment), this paper proposes a cross-modal spatiotemporal disentangled representation framework coupled with gradient modulation. The method follows a modality decomposition–alignment–fusion strategy that integrates spatiotemporal attention with cross-modal disentangled representation learning, targeting feature disentanglement, better generalization, and computational efficiency. A gradient modulation mechanism is further introduced to optimize multi-task joint training. Additionally, a lightweight edge-deployment simulation system is developed. Extensive experiments on multiple mainstream public datasets show that the proposed approach improves recognition accuracy by an average of 2.3%, reduces model parameters by 37%, and decreases inference latency by 41%, supporting its effectiveness, robustness, and practicality in real-world edge scenarios.

📝 Abstract
Human Activity Recognition (HAR) is a fundamental technology for numerous human-centered intelligent applications. Although deep learning methods have been used to accelerate feature extraction, multimodal data mixing, activity heterogeneity, and complex model deployment remain largely unresolved in sensor-based HAR, and this paper aims to address them. We propose a spatiotemporal-attention-based modal decomposition, alignment, and fusion strategy to tackle the mixed distribution of sensor data. Key discriminative features of activities are captured through cross-modal spatiotemporal disentangled representation, combined with gradient modulation to alleviate data heterogeneity. In addition, a wearable deployment simulation system is constructed. Experiments on a large number of public datasets demonstrate the effectiveness of the model.

Problem

Research questions and friction points this paper is trying to address.

Address multimodal data mixing in wearable activity recognition
Resolve activity heterogeneity via disentangled representation
Simplify complex model deployment for wearable systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatiotemporal attention modal decomposition alignment fusion
Cross-modal spatiotemporal disentangled representation
Wearable deployment simulation system
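
As a rough illustration of what a decomposition, alignment, and fusion pipeline can look like, the sketch below splits the features of two sensor modalities into a shared part and a modality-specific part, measures how well the shared parts agree, and concatenates the result. It is a generic NumPy toy under stated assumptions, not the paper's architecture: the function names, the linear projections, the two loss terms, and the accelerometer/gyroscope pairing are all hypothetical, and spatiotemporal attention and gradient modulation are not modeled.

```python
import numpy as np


def decompose(features, w_shared, w_private):
    """Project one modality's features into shared and modality-specific parts.

    Linear projections are an assumption made for this sketch.
    """
    return features @ w_shared, features @ w_private


def alignment_loss(shared_a, shared_b):
    """Mean squared distance between the two modalities' shared components."""
    return float(np.mean((shared_a - shared_b) ** 2))


def orthogonality_loss(shared, private):
    """Squared Frobenius norm of shared^T @ private.

    It is zero when every shared dimension is orthogonal to every
    private dimension across the batch.
    """
    return float(np.sum((shared.T @ private) ** 2))


def fuse(shared_a, shared_b, private_a, private_b):
    """Average the shared parts and concatenate them with both private parts."""
    shared = 0.5 * (shared_a + shared_b)
    return np.concatenate([shared, private_a, private_b], axis=1)


# Toy batch: 8 windows, 16 features per modality (hypothetical sizes).
rng = np.random.default_rng(0)
acc_feats = rng.normal(size=(8, 16))   # e.g. accelerometer features
gyro_feats = rng.normal(size=(8, 16))  # e.g. gyroscope features

# Randomly initialised projections; in practice these would be learned.
w_s_acc, w_p_acc = rng.normal(size=(16, 4)), rng.normal(size=(16, 4))
w_s_gyro, w_p_gyro = rng.normal(size=(16, 4)), rng.normal(size=(16, 4))

s_acc, p_acc = decompose(acc_feats, w_s_acc, w_p_acc)
s_gyro, p_gyro = decompose(gyro_feats, w_s_gyro, w_p_gyro)

fused = fuse(s_acc, s_gyro, p_acc, p_gyro)
total_aux_loss = (
    alignment_loss(s_acc, s_gyro)
    + orthogonality_loss(s_acc, p_acc)
    + orthogonality_loss(s_gyro, p_gyro)
)
print(fused.shape, total_aux_loss)
```

In a trained model, an auxiliary term like `total_aux_loss` would typically be added to the classification loss so that the shared parts converge across modalities while the private parts stay distinct; how the paper weights or formulates these terms is not stated in the abstract.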