HumanEgo: Zero-Shot Robot Learning from Minutes of Human Egocentric Videos

📅 2026-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses zero-shot transfer of manipulation skills from human first-person (egocentric) videos to robots. Direct imitation is hindered by the embodiment gap, since humans and robots differ in both visual appearance and kinematics. The authors propose a method that requires no robot data. Each human demonstration is lifted to an entity-level representation of hand–object interaction, and a flow matching policy is trained with dense auxiliary objectives that extract more supervision from every trajectory. With about 30 minutes of human video per task, the approach reaches 92.5% average success across four real-world robotic tasks (75% with 15 minutes). It also outperforms robot teleoperation collected over the same amount of time by 41%. The policy transfers zero-shot to novel robots, cameras, and environments without specialized hardware or any robot training data.
📝 Abstract
Human egocentric video captures rich manipulation demonstrations without any robot hardware, yet transferring these skills to robots remains challenging due to the embodiment gap between human and robot in both visual appearance and kinematics. We present HumanEgo, a framework that bridges the embodiment gap by lifting each human demonstration to an entity-level representation of hand-object interaction, and training a flow matching policy with dense auxiliary objectives that amplify supervision from every trajectory. HumanEgo is robot-data-free, hardware-agnostic, data-efficient, and zero-shot human-to-robot transferable. With only 30 minutes of human videos per task, HumanEgo achieves 92.5% average success across four real-world tasks (75% with just 15 minutes), outperforms matched-time robot teleoperation by 41%, and robustly transfers zero-shot across novel robots, cameras, and environments.
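The abstract says only that HumanEgo trains a flow matching policy. It does not specify the probability path, the network, the action format, or the dense auxiliary objectives, so none of those are reproduced here. The following is a minimal NumPy sketch of generic conditional flow matching with a straight-line (rectified-flow) path. It assumes actions are real-valued arrays. The function names `interpolate`, `target_velocity`, `flow_matching_loss`, and `euler_sample` are hypothetical illustrations, not the authors' code. The observation-conditioned velocity network is left abstract as `velocity_fn`.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to action x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Regression target for the velocity network; constant along the straight path."""
    return x1 - x0

def flow_matching_loss(pred_v, x0, x1):
    """Mean squared error between a predicted velocity and the straight-path target."""
    return float(np.mean((pred_v - target_velocity(x0, x1)) ** 2))

def euler_sample(velocity_fn, x0, steps=10):
    """Generate an action by integrating dx/dt = velocity_fn(x, t) from t=0 to t=1
    with forward Euler, starting from noise x0. In a real policy, velocity_fn would
    be a trained network that is also conditioned on the observation (assumption)."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the final step never evaluates t = 1
        x = x + dt * velocity_fn(x, t)
    return x
```

A typical training step would proceed as follows. Sample `t` uniformly in [0, 1], draw Gaussian noise `x0`, form `interpolate(x0, x1, t)` for a demonstrated action `x1`, and regress the network's output onto `target_velocity(x0, x1)` with `flow_matching_loss`. At inference, `euler_sample` starts from fresh noise. Any auxiliary losses would be added to this objective, but their form is not given in the abstract.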
Problem

Research questions and friction points this paper is trying to address.

embodiment gap
human-to-robot transfer
egocentric video
robot learning
zero-shot transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot transfer
egocentric video
embodiment gap
flow matching
robot learning