Beyond Motion Primitives: Behavioral Activity Recognition from Head-Mounted IMU

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing recognition methods for head-worn inertial measurement units (IMUs) detect only motion primitives and miss the high-level behavioral context required by augmented reality (AR) smart glasses. To address this gap, this work presents the first systematic definition of five behavior categories and eight contextual scenarios suitable for head-mounted IMUs, along with the Ego4D-IMU dataset comprising 160,000 samples and built under a four-tier data quality assurance framework. The authors propose HiT-HAR, a lightweight hierarchical temporal model with only 703K parameters, and use per-class separability analysis to delineate the observability boundaries of behavioral classes. In the reported experiments, the approach outperforms prior head-mounted IMU models in both behavior and scenario recognition, and the analysis distinguishes between behaviors that are reliably observable, those that depend on temporal context, and those challenged by signal overlap.

📝 Abstract

AR smart glasses need continuous behavioral context to offer proactive assistance, yet their most practical always-on sensor, the head-mounted Inertial Measurement Unit (IMU), detects only motion primitives such as walking or standing. We push beyond motion primitives to behavioral-level recognition, defining five categories that balance AR application needs with sensor observability. To this end, we construct a 160K-sample Ego4D dataset spanning eight activity scenarios, built with a four-tier quality assurance framework, and propose HiT-HAR, a 703K-parameter hierarchical model that outperforms prior head-mounted IMU models on five-class action and eight-class scenario recognition. We further map the observability frontier of head-mounted IMUs through per-class separability analysis, identifying which behavioral categories are reliably observable (Locomotion), which benefit from temporal context (Object Transfer, Task Operation), and where scenario-dependent signal overlap poses remaining challenges. Our results indicate that architectural choices exploiting temporal context and scenario structure outperform simply scaling model size. The code and dataset are publicly available at https://github.com/Harvard-AI-and-Robotics-Lab/HiT-HAR.
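
The abstract does not say which metric the per-class separability analysis uses. As an illustration only, the sketch below assumes a simple centroid-based score: the distance from a class centroid to the nearest other centroid, divided by the mean distance of that class's samples to their own centroid. The function name `per_class_separability`, the class labels, and the toy 2-D features are hypothetical and are not taken from the paper or its repository.

```python
import math
from collections import defaultdict


def per_class_separability(features, labels):
    """Return {label: score}; a higher score means the class is easier to isolate.

    Assumed metric (not the paper's): distance to the nearest other class
    centroid divided by the mean within-class distance to the own centroid.
    """
    # Group feature vectors by class label.
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)

    # Per-class centroid: coordinate-wise mean of the class's samples.
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()
    }

    scores = {}
    for y, xs in groups.items():
        spread = sum(math.dist(x, centroids[y]) for x in xs) / len(xs)
        nearest = min(
            math.dist(centroids[y], c) for k, c in centroids.items() if k != y
        )
        scores[y] = nearest / spread if spread > 0 else math.inf
    return scores


# Toy 2-D features for three made-up classes (hypothetical data).
features = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0), (1.0, 0.0), (1.0, 2.0)]
labels = ["A", "A", "B", "B", "C", "C"]
scores = per_class_separability(features, labels)
for label in sorted(scores):
    print(label, round(scores[label], 2))
```

In this toy data, class B sits far from the others and scores high, while A and C overlap and score low, which mirrors the abstract's distinction between reliably observable categories and those affected by signal overlap. Real features would come from a trained model's embeddings or hand-crafted IMU statistics rather than raw 2-D points.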

Problem

Research questions and friction points this paper is trying to address.

behavioral activity recognition

head-mounted IMU

motion primitives

AR smart glasses

sensor observability

Innovation

Methods, ideas, or system contributions that make the work stand out.

behavioral activity recognition

head-mounted IMU

hierarchical modeling

temporal context

observability analysis

🔎 Similar Papers

Information Fusion in Multimodal IoT Systems for physical activity level monitoring

2024-03-17 · Citations: 0

Enhancing Screen Time Identification in Children with a Multi-View Vision Language Model and Screen Time Tracker

2024-10-02 · arXiv.org · Citations: 0