🤖 AI Summary
This work addresses the semantic ambiguity that arises when similar or complex daily activities are recognized from a single sensing modality. To reduce this ambiguity, the authors introduce HARMES, a multimodal wearable dataset that, to their knowledge, is the first to combine wrist-worn inertial measurement unit (IMU) data, environmental sensors (temperature, humidity, and barometric pressure), and audio. It contains more than 80 hours of recordings from 20 participants performing household activities in their own homes, including approximately three hours of labeled data per participant across 15 activity classes. It is nearly six times larger than the previously largest wrist-inertial-acoustic HAR dataset. The dataset is freely available on Zenodo, with example code for loading the data and training models on GitHub. The benchmark evaluates cross-subject generalization, and an ablation study shows that modality contributions are activity-dependent and can be complementary, particularly for activities that are ambiguous from motion data alone.
📝 Abstract
With each sensing modality exhibiting inherent strengths and limitations, multi-modal approaches for wearable Human Activity Recognition (HAR) are becoming increasingly relevant, particularly for recognizing Activities of Daily Living (ADLs), where individual modalities often produce ambiguous signals for similar or complex activities. This work introduces HARMES, a multi-modal wearable dataset combining three wrist-recorded modalities: motion sensing via an Inertial Measurement Unit (IMU), atmospheric environmental sensors (humidity, temperature, and pressure), and audio. Collected from 20 participants performing household activities in their own homes, HARMES totals over 80 hours of recorded data, with approximately three hours of labeled activity data per participant across 15 ADL classes. To the best of our knowledge, HARMES is the first dataset to combine this particular sensor trio, and it is nearly six times larger than the previously largest wrist-inertial-acoustic HAR dataset. In an extensive benchmark, we evaluate cross-subject generalization and conduct an ablation study revealing that modality contributions are activity-dependent and can provide complementary value, particularly for activities that are ambiguous from motion data alone. HARMES is freely available on Zenodo, alongside example code for loading the dataset and training models on GitHub.
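To make the evaluation design concrete, here is a minimal sketch of two ingredients such a benchmark typically needs: the seven non-empty combinations of the three HARMES modalities that a full modality ablation could compare, and leave-one-subject-out (LOSO) folds as one common way to measure cross-subject generalization. The abstract does not state the exact split protocol or ablation grid, so LOSO, the modality names (`imu`, `environment`, `audio`), and the helper functions here are assumptions for illustration, not the authors' released loading or training code.

```python
from itertools import combinations

# Hypothetical modality identifiers; the released HARMES loader may name them differently.
MODALITIES = ("imu", "environment", "audio")


def modality_subsets(modalities=MODALITIES):
    """Return every non-empty subset of modalities (candidate ablation configurations)."""
    return [
        subset
        for size in range(1, len(modalities) + 1)
        for subset in combinations(modalities, size)
    ]


def leave_one_subject_out(subject_ids):
    """Yield (train_subjects, test_subject) pairs; the test subject never appears in training.

    LOSO is an assumed protocol here, used only to illustrate cross-subject evaluation.
    """
    subjects = sorted(set(subject_ids))
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, held_out


if __name__ == "__main__":
    participants = list(range(1, 21))  # hypothetical IDs for the 20 HARMES participants
    folds = list(leave_one_subject_out(participants))
    subsets = modality_subsets()
    print(len(folds), len(subsets))  # 20 folds, 7 modality configurations
```

Under these assumptions, a full grid would train one model per fold and modality subset (20 × 7 = 140 runs); holding out whole participants rather than random windows keeps a person's motion and acoustic habits from leaking between training and test data.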