🤖 AI Summary
This study addresses the limited generalization of human activity recognition (HAR) models in real-world wearable sensor applications, which stems primarily from data heterogeneity and scarce annotations. To this end, the authors introduce BenchHAR, a unified evaluation framework that systematically assesses the generalization of eight self-supervised learning (SSL) methods across twelve encoder-classifier architectures on approximately 258,000 samples, evaluated on unseen target distributions. The findings show that a hybrid pretraining strategy (combining reconstruction and contrastive objectives) achieves the best overall performance and that CNN encoders learn the most generalizable representations; that adding pretraining data from downstream activity classes consistently improves generalization, whereas unlabeled data from non-downstream classes does not; and that data collected from consumer-grade devices transfers better than data from research-grade sensors. This work characterizes key generalization patterns of SSL in sensor-based activity recognition and offers practical guidance for real-world deployment.
📝 Abstract
Human Activity Recognition (HAR) from wearable sensors supports a broad range of healthcare and behavioral science applications. However, data heterogeneity and the scarcity of labeled data limit its real-world generalization. Recent advances in self-supervised learning (SSL) in the vision and language domains have shown a strong capability for learning generalizable representations from unlabeled data. Yet few studies have systematically compared the generalization performance of SSL methods or explored how to adapt them for generalizable HAR. To address these gaps, we present BenchHAR, a unified framework for evaluating the generalization capability of SSL methods for sensor-based HAR on unseen target distributions. BenchHAR curates a large-scale dataset (~258K samples) and evaluates eight representative SSL methods across 12 encoder-classifier architectures. Our results reveal that existing SSL methods struggle to achieve satisfactory generalization performance. We find that: (1) For HAR models, the hybrid paradigm (combining reconstruction and contrastive pretraining) achieves the best overall performance. The CNN encoder exhibits the strongest ability to learn generalizable representations, while more expressive classifier architectures further improve generalization. (2) For data scale, increasing the amount of pretraining data from downstream activity classes consistently improves generalization, while adding more labeled data yields limited gains. Interestingly, incorporating unlabeled data from non-downstream activity classes does not improve generalization. (3) For data sources, sensor data collected from consumer-grade devices generalizes better than data from research-grade devices, and data from limb positions transfers more effectively to trunk positions than the reverse. BenchHAR provides a unified benchmark and actionable insights for generalizable sensor-based HAR systems. Our code is available at https://github.com/saiketa/HAR-Bench.