Beyond Motion Primitives: Behavioral Activity Recognition from Head-Mounted IMU

📅 2026-05-26
🤖 AI Summary
Existing methods for head-worn inertial measurement units (IMUs) recognize only low-level motion primitives and struggle to capture the high-level behavioral context required by augmented reality (AR) smart glasses. To address this gap, this work presents the first systematic definition of five behavior categories and eight contextual scenarios suitable for head-mounted IMUs, along with the Ego4D-IMU dataset comprising 160,000 samples. The authors propose HiT-HAR, a lightweight hierarchical temporal model with only 703K parameters, supported by a four-tier data quality assurance framework and a per-class separability analysis that delineates the observability boundaries of the behavioral classes. Experimental results show that the proposed approach outperforms prior head-mounted IMU models in both behavior and scenario recognition. The separability analysis distinguishes behaviors that are reliably observable (Locomotion), those that benefit from temporal context (Object Transfer, Task Operation), and those still challenged by scenario-dependent signal overlap.
📝 Abstract
AR smart glasses need continuous behavioral context to offer proactive assistance, yet their most practical always-on sensor, the head-mounted Inertial Measurement Unit (IMU), detects only motion primitives such as walking or standing. We push beyond motion primitives to behavioral-level recognition, defining five categories that balance AR application need with sensor observability. To this end, we construct a 160K-sample Ego4D dataset spanning eight activity scenarios, curated with a four-tier quality assurance framework, and propose HiT-HAR, a 703K-parameter hierarchical model that outperforms prior head-mounted IMU models on five-class action and eight-class scenario recognition. We further map the observability frontier of the head-mounted IMU through per-class separability analysis, identifying which behavioral categories are reliably observable (Locomotion), which benefit from temporal context (Object Transfer, Task Operation), and where scenario-dependent signal overlap poses remaining challenges. Our results indicate that architectural choices exploiting temporal context and scenario structure outperform simply scaling model size. The code and dataset are publicly available at https://github.com/Harvard-AI-and-Robotics-Lab/HiT-HAR.
Problem

Research questions and friction points this paper is trying to address.

behavioral activity recognition
head-mounted IMU
motion primitives
AR smart glasses
sensor observability
Innovation

Methods, ideas, or system contributions that make the work stand out.

behavioral activity recognition
head-mounted IMU
hierarchical modeling
temporal context
observability analysis
Chung-Ta Huang
Harvard AI and Robotics Lab, Harvard University
Léopold Das
Harvard AI and Robotics Lab, Harvard University
Jeffrey Zhou
Harvard AI and Robotics Lab, Harvard University
Faizaan Siddique
Harvard AI and Robotics Lab, Harvard University
Julia Seungjoo Baek
Harvard AI and Robotics Lab, Harvard University
Serena Liu
Harvard AI and Robotics Lab, Harvard University
Andrew Rusli
Harvard AI and Robotics Lab, Harvard University
Todd Y. Zhou
Harvard AI and Robotics Lab, Harvard University
Freddy Yu
Harvard AI and Robotics Lab, Harvard University
Sinclair Hansen
Harvard AI and Robotics Lab, Harvard University
Ziling Hu
Harvard AI and Robotics Lab, Harvard University
Arnav Sharma
Harvard AI and Robotics Lab, Harvard University
Mengyu Wang
Assistant Professor, Harvard Medical School
Artificial Intelligence, Machine Learning, Ophthalmology, Glaucoma, Computational Mechanics