ARMimic: Learning Robotic Manipulation from Passive Human Demonstrations in Augmented Reality

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional imitation learning for robot skills is hampered by bulky hardware, complex calibration, and, for existing passive-observation approaches, specialized XR setups with constrained recording conditions. To address this, the paper proposes ARMimic: a robot-free, low-cost framework for passive demonstration acquisition that uses only a consumer-grade XR headset and a stationary external camera. Its core contribution is the integration of egocentric hand tracking, a real-time AR robot overlay, depth-aware perception, and unified trajectory modeling across embodiments, which lets a single policy learn from human manipulation and virtual robot motion alike, without additional sensors or intricate calibration. In real-world experiments, ARMimic reduces demonstration time by 50% compared to teleoperation and improves the success rate on a long-horizon bowl-stacking task by 11% over an ACT baseline trained on teleoperated data. The framework substantially improves the scalability, safety, and practical deployability of imitation learning data collection.
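
Neither the summary nor the abstract specifies how the depth-aware collision check works, so the following is only a minimal sketch of one plausible mechanism, assuming the AR overlay can render a per-pixel depth map of the virtual robot: a pixel where the rendered robot lies behind the measured scene surface indicates penetration of real geometry. The function name virtual_robot_collides, the 1 cm margin, and the pixel threshold are all hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a depth-based collision check for the AR robot
# overlay. Assumes the renderer provides a per-pixel depth map of the
# virtual robot (NaN/inf where the robot is not drawn) and the workspace
# camera provides a metric scene depth map. Purely illustrative.
import numpy as np

def virtual_robot_collides(robot_depth: np.ndarray,
                           scene_depth: np.ndarray,
                           margin_m: float = 0.01,
                           min_pixels: int = 50) -> bool:
    """Flag a frame as colliding when enough virtual-robot pixels lie
    deeper than the real scene, i.e., inside real geometry."""
    rendered = np.isfinite(robot_depth)  # pixels covered by the overlay
    penetrating = rendered & (robot_depth > scene_depth + margin_m)
    return int(penetrating.sum()) >= min_pixels
```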

📝 Abstract
Imitation learning is a powerful paradigm for robot skill acquisition, yet conventional demonstration methods--such as kinesthetic teaching and teleoperation--are cumbersome, hardware-heavy, and disruptive to workflows. Recently, passive observation using extended reality (XR) headsets has shown promise for egocentric demonstration collection, yet current approaches require additional hardware, complex calibration, or constrained recording conditions that limit scalability and usability. We present ARMimic, a novel framework that overcomes these limitations with a lightweight and hardware-minimal setup for scalable, robot-free data collection using only a consumer XR headset and a stationary workplace camera. ARMimic integrates egocentric hand tracking, augmented reality (AR) robot overlays, and real-time depth sensing to ensure collision-aware, kinematically feasible demonstrations. A unified imitation learning pipeline is at the core of our method, treating both human and virtual robot trajectories as interchangeable, which enables policies that generalize across different embodiments and environments. We validate ARMimic on two manipulation tasks, including challenging long-horizon bowl stacking. In our experiments, ARMimic reduces demonstration time by 50% compared to teleoperation and improves task success by 11% over ACT, a state-of-the-art baseline trained on teleoperated data. Our results demonstrate that ARMimic enables safe, seamless, and in-the-wild data collection, offering great potential for scalable robot learning in diverse real-world settings.
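
As a rough illustration of the egocentric hand tracking described above, the sketch below shows how a tracked hand pose could be retargeted into an end-effector command in the workspace-camera frame, assuming known homogeneous transforms from a one-time calibration. The function retarget_hand_to_ee, the frame names, and the pinch-based gripper heuristic are assumptions for illustration, not the authors' implementation.

```python
# Minimal, hypothetical sketch: retarget a tracked human hand pose from
# the XR headset frame into an end-effector command in the stationary
# camera's frame, assuming 4x4 homogeneous transforms from a one-time
# calibration.
import numpy as np

def retarget_hand_to_ee(T_cam_headset: np.ndarray,
                        T_headset_hand: np.ndarray,
                        pinch_distance_m: float,
                        close_threshold_m: float = 0.03):
    """Map an egocentric hand pose to (end-effector pose, gripper closed).

    T_cam_headset:    headset pose in the camera frame (4x4)
    T_headset_hand:   tracked hand pose in the headset frame (4x4)
    pinch_distance_m: thumb-to-index distance from hand tracking
    """
    # Chain transforms so the hand pose is expressed in the workspace frame.
    T_cam_hand = T_cam_headset @ T_headset_hand
    # A simple pinch heuristic stands in for a gripper open/close command.
    gripper_closed = pinch_distance_m < close_threshold_m
    return T_cam_hand, gripper_closed
```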
Problem

Research questions and friction points this paper is trying to address.

Overcoming cumbersome hardware requirements in robot imitation learning
Enabling scalable robot-free data collection using consumer XR headsets
Creating collision-aware demonstrations that generalize across environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses consumer XR headset and stationary camera
Integrates hand tracking, AR overlays, and depth sensing
Treats human and virtual robot trajectories interchangeably (see the sketch after this list)
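
The unified pipeline treats human and virtual robot trajectories as interchangeable; below is a minimal sketch of what such a shared demonstration format could look like, assuming a common end-effector action space. The types Embodiment, Step, and Demonstration and the function to_training_pairs are hypothetical names, not the authors' code.

```python
# Hypothetical sketch of a unified trajectory format: human-hand and
# AR virtual-robot demonstrations share one schema, so a policy can be
# trained on both without distinguishing the embodiment.
from dataclasses import dataclass
from enum import Enum
import numpy as np

class Embodiment(Enum):
    HUMAN_HAND = "human_hand"        # egocentric XR hand tracking
    VIRTUAL_ROBOT = "virtual_robot"  # AR-overlaid robot arm

@dataclass
class Step:
    rgb: np.ndarray      # workspace-camera frame, e.g. (H, W, 3)
    depth: np.ndarray    # depth map used for collision-aware checks
    ee_pose: np.ndarray  # 7-D end-effector pose: xyz + quaternion
    gripper: float       # 0.0 = open ... 1.0 = closed

@dataclass
class Demonstration:
    embodiment: Embodiment
    steps: list

def to_training_pairs(demo: Demonstration) -> list:
    """Convert any demonstration, human or virtual-robot, into
    (observation, action) pairs in a single shared action space."""
    pairs = []
    for prev, nxt in zip(demo.steps, demo.steps[1:]):
        obs = {"rgb": prev.rgb, "depth": prev.depth}
        # The action is the next end-effector pose plus gripper command,
        # identical regardless of which embodiment produced the trajectory.
        action = np.concatenate([nxt.ee_pose, [nxt.gripper]])
        pairs.append((obs, action))
    return pairs
```

Because both embodiments emit identical (observation, action) pairs under this scheme, human and virtual-robot demonstrations can be pooled into one training set for a policy such as ACT.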
Rohan Walia
Department of Computer Engineering; Learning Systems and Robotics Lab; Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Germany
Yusheng Wang
Department of Precision Engineering; Mobile Robotics Lab; Research into Artifacts, Center for Engineering (RACE), University of Tokyo, Japan
Ralf Römer
Technical University of Munich
Machine Learning, Robotics, Embodied AI, VLA, Control
Masahiro Nishio
Toyota Motor Corporation, Japan
Angela P. Schoellig
Department of Computer Engineering; Learning Systems and Robotics Lab; Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Germany
Jun Ota
Research into Artifacts, Center for Engineering (RACE), School of Engineering, The University of Tokyo
Robotics, Production Engineering