ARMimic: Learning Robotic Manipulation from Passive Human Demonstrations in Augmented Reality

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional imitation learning for robot skills is hampered by bulky hardware, complex calibration, and, for existing passive-observation approaches, specialized XR setups with constrained recording conditions. To address this, the paper proposes ARMimic: a robot-free, low-cost framework for passive demonstration acquisition that uses only a consumer-grade XR headset and a stationary external camera. Its core contribution is the integration of egocentric hand tracking, a real-time AR robot overlay, depth-aware perception, and unified trajectory modeling across embodiments, which lets a single policy learn from human manipulation and virtual robot motion alike, without additional sensors or intricate calibration. In real-world experiments, ARMimic reduces demonstration time by 50% compared to teleoperation and improves the success rate on a long-horizon bowl-stacking task by 11% over an ACT baseline trained on teleoperated data. The framework substantially improves the scalability, safety, and practical deployability of imitation learning data collection.
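
Neither the summary nor the abstract specifies how the depth-aware collision check works, so the following is only a minimal sketch of one plausible mechanism, assuming the AR overlay can render a per-pixel depth map of the virtual robot: a pixel where the rendered robot lies behind the measured scene surface indicates penetration of real geometry. The function name virtual_robot_collides, the 1 cm margin, and the pixel threshold are all hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a depth-based collision check for the AR robot
# overlay. Assumes the renderer provides a per-pixel depth map of the
# virtual robot (NaN/inf where the robot is not drawn) and the workspace
# camera provides a metric scene depth map. Purely illustrative.
import numpy as np

def virtual_robot_collides(robot_depth: np.ndarray,
                           scene_depth: np.ndarray,
                           margin_m: float = 0.01,
                           min_pixels: int = 50) -> bool:
    """Flag a frame as colliding when enough virtual-robot pixels lie
    deeper than the real scene, i.e., inside real geometry."""
    rendered = np.isfinite(robot_depth)  # pixels covered by the overlay
    penetrating = rendered & (robot_depth > scene_depth + margin_m)
    return int(penetrating.sum()) >= min_pixels
```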

📝 Abstract
Imitation learning is a powerful paradigm for robot skill acquisition, yet conventional demonstration methods--such as kinesthetic teaching and teleoperation--are cumbersome, hardware-heavy, and disruptive to workflows. Recently, passive observation using extended reality (XR) headsets has shown promise for egocentric demonstration collection, yet current approaches require additional hardware, complex calibration, or constrained recording conditions that limit scalability and usability. We present ARMimic, a novel framework that overcomes these limitations with a lightweight and hardware-minimal setup for scalable, robot-free data collection using only a consumer XR headset and a stationary workplace camera. ARMimic integrates egocentric hand tracking, augmented reality (AR) robot overlays, and real-time depth sensing to ensure collision-aware, kinematically feasible demonstrations. A unified imitation learning pipeline is at the core of our method, treating both human and virtual robot trajectories as interchangeable, which enables policies that generalize across different embodiments and environments. We validate ARMimic on two manipulation tasks, including challenging long-horizon bowl stacking. In our experiments, ARMimic reduces demonstration time by 50% compared to teleoperation and improves task success by 11% over ACT, a state-of-the-art baseline trained on teleoperated data. Our results demonstrate that ARMimic enables safe, seamless, and in-the-wild data collection, offering great potential for scalable robot learning in diverse real-world settings.
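
As a rough illustration of the egocentric hand tracking described above, the sketch below shows how a tracked hand pose could be retargeted into an end-effector command in the workspace-camera frame, assuming known homogeneous transforms from a one-time calibration. The function retarget_hand_to_ee, the frame names, and the pinch-based gripper heuristic are assumptions for illustration, not the authors' implementation.

```python
# Minimal, hypothetical sketch: retarget a tracked human hand pose from
# the XR headset frame into an end-effector command in the stationary
# camera's frame, assuming 4x4 homogeneous transforms from a one-time
# calibration.
import numpy as np

def retarget_hand_to_ee(T_cam_headset: np.ndarray,
                        T_headset_hand: np.ndarray,
                        pinch_distance_m: float,
                        close_threshold_m: float = 0.03):
    """Map an egocentric hand pose to (end-effector pose, gripper closed).

    T_cam_headset:    headset pose in the camera frame (4x4)
    T_headset_hand:   tracked hand pose in the headset frame (4x4)
    pinch_distance_m: thumb-to-index distance from hand tracking
    """
    # Chain transforms so the hand pose is expressed in the workspace frame.
    T_cam_hand = T_cam_headset @ T_headset_hand
    # A simple pinch heuristic stands in for a gripper open/close command.
    gripper_closed = pinch_distance_m < close_threshold_m
    return T_cam_hand, gripper_closed
```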
Problem

Research questions and friction points this paper is trying to address.

Overcoming cumbersome hardware requirements in robot imitation learning
Enabling scalable robot-free data collection using consumer XR headsets
Creating collision-aware demonstrations that generalize across environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses consumer XR headset and stationary camera
Integrates hand tracking, AR overlays, and depth sensing
Treats human and virtual robot trajectories interchangeably (see the sketch after this list)
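
The unified pipeline treats human and virtual robot trajectories as interchangeable; below is a minimal sketch of what such a shared demonstration format could look like, assuming a common end-effector action space. The types Embodiment, Step, and Demonstration and the function to_training_pairs are hypothetical names, not the authors' code.

```python
# Hypothetical sketch of a unified trajectory format: human-hand and
# AR virtual-robot demonstrations share one schema, so a policy can be
# trained on both without distinguishing the embodiment.
from dataclasses import dataclass
from enum import Enum
import numpy as np

class Embodiment(Enum):
    HUMAN_HAND = "human_hand"        # egocentric XR hand tracking
    VIRTUAL_ROBOT = "virtual_robot"  # AR-overlaid robot arm

@dataclass
class Step:
    rgb: np.ndarray      # workspace-camera frame, e.g. (H, W, 3)
    depth: np.ndarray    # depth map used for collision-aware checks
    ee_pose: np.ndarray  # 7-D end-effector pose: xyz + quaternion
    gripper: float       # 0.0 = open ... 1.0 = closed

@dataclass
class Demonstration:
    embodiment: Embodiment
    steps: list

def to_training_pairs(demo: Demonstration) -> list:
    """Convert any demonstration, human or virtual-robot, into
    (observation, action) pairs in a single shared action space."""
    pairs = []
    for prev, nxt in zip(demo.steps, demo.steps[1:]):
        obs = {"rgb": prev.rgb, "depth": prev.depth}
        # The action is the next end-effector pose plus gripper command,
        # identical regardless of which embodiment produced the trajectory.
        action = np.concatenate([nxt.ee_pose, [nxt.gripper]])
        pairs.append((obs, action))
    return pairs
```

Because both embodiments emit identical (observation, action) pairs under this scheme, human and virtual-robot demonstrations can be pooled into one training set for a policy such as ACT.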
Rohan Walia
Department of Computer Engineering; Learning Systems and Robotics Lab; Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Germany
Yusheng Wang
Department of Precision Engineering; Mobile Robotics Lab; Research into Artifacts, Center for Engineering (RACE), University of Tokyo, Japan
Ralf Römer
Technical University of Munich
Machine Learning, Robotics, Embodied AI, VLA, Control
Masahiro Nishio
Toyota Motor Corporation, Japan
Angela P. Schoellig
Department of Computer Engineering; Learning Systems and Robotics Lab; Munich Institute of Robotics and Machine Intelligence (MIRMI), Technical University of Munich, Germany
Jun Ota
Research into Artifacts, Center for Engineering (RACE), School of Engineering, The University of Tokyo
Robotics, Production Engineering