🤖 AI Summary
To address distribution shift, catastrophic forgetting, and scarce labeled data in cross-user continual activity recognition with wearable multi-sensor systems, this paper proposes CLAD-Net. The framework integrates a self-supervised Transformer that builds long-term semantic memory, couples it with a lightweight CNN classifier trained via knowledge distillation, and introduces a cross-sensor attention mechanism, enabling label-free learning of general representations while retaining historical knowledge. Evaluated on the PAMAP2 dataset, CLAD-Net achieves a final accuracy of 91.36% with a forgetting rate of only 8.78%, significantly outperforming baselines such as experience replay and Elastic Weight Consolidation (EWC). Moreover, it remains robust even when trained with only 10–20% labeled samples. This work establishes an effective paradigm for few-shot, cross-user continual learning in resource-constrained wearable sensing scenarios.
📝 Abstract
The rise of deep learning has greatly advanced human behavior monitoring using wearable sensors, particularly human activity recognition (HAR). While deep models have been widely studied, most assume stationary data distributions, an assumption often violated in real-world scenarios. For example, sensor data from one subject may differ significantly from another's, leading to distribution shifts. In continual learning, this shift is framed as a sequence of tasks, each corresponding to a new subject. Such settings suffer from catastrophic forgetting, where prior knowledge deteriorates as new tasks are learned. This challenge is compounded by the scarcity and inconsistency of labeled data in human studies. To address these issues, we propose CLAD-Net (Continual Learning with Attention and Distillation), a framework enabling wearable-sensor models to be updated continuously without sacrificing performance on past tasks. CLAD-Net integrates a self-supervised Transformer, acting as long-term memory, with a supervised Convolutional Neural Network (CNN) trained via knowledge distillation for activity classification. The Transformer captures global activity patterns through cross-attention across body-mounted sensors, learning generalizable representations without labels. Meanwhile, the CNN leverages knowledge distillation to retain prior knowledge during subject-wise fine-tuning. On PAMAP2, CLAD-Net achieves 91.36 percent final accuracy with only 8.78 percent forgetting, surpassing memory-based and regularization-based baselines such as Experience Replay and Elastic Weight Consolidation. In semi-supervised settings with only 10–20 percent labeled data, CLAD-Net still delivers strong performance, demonstrating robustness to label scarcity. Ablation studies further validate each module's contribution.
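The two mechanisms the abstract names, cross-attention between body-mounted sensor streams and a distillation loss that anchors the CNN to the previous-subject model, can be illustrated with a minimal NumPy sketch. This is a hedged illustration of the generic techniques only: the shapes, temperature `T`, and weight `alpha` are assumptions for demonstration, not values or code from the paper.

```python
# Minimal sketch of (a) scaled dot-product cross-attention between two sensor
# feature streams and (b) a knowledge-distillation loss combining hard-label
# cross-entropy with a KL term toward a frozen teacher's softened outputs.
# All hyperparameters here are illustrative assumptions, not from the paper.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_sensor_attention(q, k, v):
    """q: (n_q, d) features from one sensor; k, v: (n_k, d) from another sensor."""
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_q, n_k) similarity matrix
    return softmax(scores) @ v               # attention-weighted mix of v rows

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on hard labels + T^2-scaled KL to the teacher's soft targets."""
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()
    p_t = softmax(teacher_logits, T)  # softened teacher distribution
    p_s = softmax(student_logits, T)  # softened student distribution
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean()
    return alpha * ce + (1 - alpha) * T**2 * kl
```

In a continual setup of this kind, the teacher logits would come from the model snapshot taken before fine-tuning on a new subject, so the KL term penalizes drift away from previously learned behavior.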