🤖 AI Summary
Existing methods for head-worn inertial measurement units (IMUs) struggle to capture the high-level behavioral context required by augmented reality (AR) smart glasses. To address this gap, this work presents the first systematic definition of five behavior categories and eight contextual scenarios suitable for head-mounted IMUs, along with the Ego4D-IMU dataset comprising 160,000 samples, built with a four-tier data quality assurance framework. The authors propose HiT-HAR, a lightweight hierarchical temporal model with only 703K parameters, and use per-class separability analysis to delineate the observability boundaries of the behavioral classes. Experimental results show that the proposed approach significantly outperforms existing head-worn IMU methods in both behavior and scene recognition. The separability analysis distinguishes behaviors that are reliably observable, those that depend on temporal context, and those challenged by signal overlap.
📝 Abstract
AR smart glasses need continuous behavioral context to offer proactive assistance, yet their most practical always-on sensor, the head-mounted Inertial Measurement Unit (IMU), detects only motion primitives such as walking or standing. We push beyond motion primitives to behavioral-level recognition, defining five categories that balance AR application needs with sensor observability. To this end, we construct a 160K-sample Ego4D dataset spanning eight activity scenarios, curated with a four-tier quality assurance framework, and propose HiT-HAR, a 703K-parameter hierarchical model that outperforms prior head-mounted IMU models on five-class action and eight-class scenario recognition. We further map the observability frontier of the head-mounted IMU through per-class separability analysis, identifying which behavioral categories are reliably observable (Locomotion), which benefit from temporal context (Object Transfer, Task Operation), and where scenario-dependent signal overlap poses remaining challenges. Our results indicate that architectural choices exploiting temporal context and scenario structure outperform simply scaling model size. The code and dataset are publicly available at https://github.com/Harvard-AI-and-Robotics-Lab/HiT-HAR.