HiSync: Spatio-Temporally Aligning Hand Motion from Wearable IMU and On-Robot Camera for Command Source Identification in Long-Range HRI

📅 2026-03-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of identifying command sources in long-range multi-user human-robot interaction, where sensor ambiguity often degrades performance. To resolve this, the authors propose a novel approach that fuses optical flow from the robot's camera with hand motion signals captured by wearable inertial measurement units (IMUs). By leveraging frequency-domain feature extraction, spatiotemporal alignment, and a distance-aware multi-window cross-modal fusion mechanism, the method uniquely exploits hand motion as a user-binding cue. A learned CSINet denoising network and cross-modal similarity computation further enhance robustness under long-range conditions. Evaluated in a real-world setting with three users at distances up to 34 meters, the system achieves a command-source identification accuracy of 92.32%, outperforming the state of the art by 48.44%, and demonstrates practical efficacy on a physical robotic platform.
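The pipeline the summary describes - spectral hand-motion features from camera optical flow and wrist IMUs, temporal alignment, then per-user similarity scoring - can be illustrated compactly. The sketch below is not the authors' CSINet: it is a minimal NumPy/SciPy rendering of the matching idea, and the function names, FFT length, and cosine-similarity scoring are all assumptions for illustration.

```python
import numpy as np
from scipy.signal import correlate


def spectral_feature(sig, n_fft=256):
    """L2-normalized magnitude spectrum of a mean-removed 1-D motion signal."""
    sig = np.asarray(sig, dtype=float)
    spec = np.abs(np.fft.rfft(sig - sig.mean(), n=n_fft))
    return spec / (np.linalg.norm(spec) + 1e-8)


def temporal_align(flow_mag, imu_mag):
    """Circularly shift the IMU stream by the lag that maximizes its
    cross-correlation with the optical-flow magnitude (a simplification
    of a proper, non-circular alignment)."""
    a = flow_mag - flow_mag.mean()
    b = imu_mag - imu_mag.mean()
    lag = int(np.argmax(correlate(a, b, mode="full"))) - (len(b) - 1)
    return np.roll(imu_mag, lag)


def identify_source(flow_mag, imu_streams):
    """Pick the user whose wrist IMU best matches the camera's motion cue.

    flow_mag    : 1-D optical-flow magnitude sampled near the gesturing hand
    imu_streams : one 1-D acceleration-magnitude array per candidate user
    Returns (index of the best-matching user, list of per-user scores).
    """
    flow_mag = np.asarray(flow_mag, dtype=float)
    cam_feat = spectral_feature(flow_mag)
    scores = []
    for imu in imu_streams:
        aligned = temporal_align(flow_mag, np.asarray(imu, dtype=float))
        scores.append(float(cam_feat @ spectral_feature(aligned)))
    return int(np.argmax(scores)), scores
```

A real system would additionally need per-user hand localization in the image and a rejection threshold for when no IMU matches (e.g., a gesturing bystander); the paper's denoising and distance-aware fusion target exactly those failure modes.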

📝 Abstract
Long-range Human-Robot Interaction (HRI) remains underexplored. Within it, Command Source Identification (CSI) - determining who issued a command - is especially challenging due to multi-user and distance-induced sensor ambiguity. We introduce HiSync, an optical-inertial fusion framework that treats hand motion as a binding cue by aligning robot-mounted camera optical flow with hand-worn IMU signals. We first elicit a user-defined (N=12) gesture set and collect a multimodal command gesture dataset (N=38) in long-range multi-user HRI scenarios. Next, HiSync extracts frequency-domain hand motion features from both camera and IMU data, and a learned CSINet denoises IMU readings, temporally aligns the modalities, and performs distance-aware multi-window fusion to compute cross-modal similarity of subtle, natural gestures, enabling robust CSI. In three-person scenes at up to 34 m, HiSync achieves 92.32% CSI accuracy, outperforming the prior SOTA by 48.44%. HiSync is also validated in a real-robot deployment. By making CSI reliable and natural, HiSync provides a practical primitive and design guidance for public-space HRI.
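Of the steps above, the distance-aware multi-window fusion is the easiest to make concrete: score each candidate over several window lengths, then blend the scores with distance-dependent weights. The abstract does not give the weighting, so the scheme below - longer windows gaining weight as the user approaches the 34 m maximum range - is a hypothetical sketch, with window sizes chosen arbitrarily.

```python
import numpy as np


def multi_window_scores(sim_fn, cam, imu, windows=(64, 128, 256)):
    """Similarity over the trailing w samples of both streams, per window."""
    scores = []
    for w in windows:
        n = min(w, len(cam), len(imu))
        scores.append(sim_fn(cam[-n:], imu[-n:]))
    return np.array(scores)


def distance_aware_fuse(scores, distance_m, windows=(64, 128, 256),
                        max_range_m=34.0):
    """Blend per-window scores; farther users lean on longer windows."""
    r = np.clip(distance_m / max_range_m, 0.0, 1.0)
    # r = 0 gives uniform weights; r = 1 weights each window by its length.
    raw = np.array(windows, dtype=float) ** r
    weights = raw / raw.sum()
    return float(weights @ scores)
```

At close range the weights stay near uniform; at 34 m they shift toward the 256-sample window, matching the intuition that per-pixel optical flow grows noisier with distance, so longer evidence windows are needed there.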
Problem

Research questions and friction points this paper is trying to address.

Command Source Identification
Long-range Human-Robot Interaction
Multi-user Ambiguity
Sensor Fusion
Gesture Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optical-Inertial Fusion
Command Source Identification
Hand Motion Alignment
Long-Range HRI
Multimodal Gesture Recognition
🔎 Similar Papers
No similar papers found.
Chengwen Zhang
Department of Computer Science and Technology, BNRist, Tsinghua University
Chun Yu
Department of Computer Science and Technology, BNRist, College of AI, Tsinghua University
Borong Zhuang
Department of Computer Science and Technology, Tsinghua University
Haopeng Jin
Beijing University of Posts and Telecommunications
Qingyang Wan
Academy of Arts & Design, Tsinghua University
Zhuojun Li
Tsinghua University
Human Computer Interaction
Zhe He
University of Macau
deep learning, reinforcement learning, POMDPs
Zhoutong Ye
Department of Computer Science and Technology, Tsinghua University
Yu Mei
Michigan State University
Soft Robotics, Control
Chang Liu
Tsinghua University
HCI
Weinan Shi
Tsinghua University
HCI
Yuanchun Shi
Professor
human computer interaction