🤖 AI Summary
This work addresses the problem of generalizing robot mobile manipulation tasks from a single, unlabeled human demonstration video. We propose an end-to-end framework comprising four modules: (1) first-person video capture via an AR headset; (2) visual understanding and 3D hand pose estimation to extract human hand trajectories; (3) a hand-to-robot-end-effector mapping mechanism; and (4) trajectory optimization in configuration space for cross-task transferability. Our key contribution is enabling one-shot imitation learning across diverse object layouts: only a single unannotated human demonstration is required, without additional teleoperation, environment resetting, or task-specific fine-tuning, while achieving full-stack vision-to-motion-planning transfer. Experiments demonstrate strong robustness and environmental adaptability across multiple mobile manipulation tasks, significantly enhancing the practicality and deployment efficiency of one-shot imitation learning in real-world robotic settings.
📝 Abstract
We introduce a novel system for human-to-robot trajectory transfer that enables robots to manipulate objects by learning from human demonstration videos. The system consists of four modules. The first is a data collection module designed to collect human demonstration videos from the point of view of a robot using an AR headset. The second is a video understanding module that detects objects and extracts 3D human-hand trajectories from the demonstration videos. The third module transfers a human-hand trajectory into a reference trajectory of a robot end-effector in 3D space. The last module uses a trajectory optimization algorithm to solve for a trajectory in the robot configuration space that follows the end-effector trajectory transferred from the human demonstration. Together, these modules enable a robot to watch a human demonstration video once and then repeat the same mobile manipulation task in different environments, even when objects are placed differently than in the demonstration. Experiments on different manipulation tasks are conducted on a mobile manipulator to verify the effectiveness of our system.
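The role of the last module, solving for a configuration-space trajectory that tracks a given end-effector trajectory, can be illustrated with a minimal sketch. This is not the paper's actual optimizer: it uses damped least-squares inverse kinematics on a hypothetical 2-link planar arm, and all link lengths, function names, and parameters below are illustrative assumptions.

```python
import numpy as np

# Hypothetical 2-link planar arm; link lengths are illustrative.
L1, L2 = 0.5, 0.4

def fk(q):
    """Forward kinematics: joint angles -> end-effector position (x, y)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    """Analytic Jacobian of the planar arm's end-effector position."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def track(waypoints, q0, damping=1e-2, iters=50):
    """Damped least-squares IK: follow end-effector waypoints in joint space.

    Returns the sequence of joint configurations (the configuration-space
    trajectory) that tracks the reference end-effector trajectory.
    """
    q = np.array(q0, dtype=float)
    path = [q.copy()]
    for target in waypoints:
        for _ in range(iters):
            err = target - fk(q)          # Cartesian tracking error
            if np.linalg.norm(err) < 1e-4:
                break
            J = jacobian(q)
            # Damped least-squares step: (J^T J + lambda I) dq = J^T err
            dq = np.linalg.solve(J.T @ J + damping * np.eye(2), J.T @ err)
            q = q + dq
        path.append(q.copy())
    return np.array(path)
```

For example, `track([np.array([0.6, 0.2]), np.array([0.5, 0.4])], q0=[0.3, 0.5])` returns a joint-space path whose final configuration places the end-effector at the last waypoint. The damping term keeps the step well-conditioned near singularities, which is one reason optimization-based tracking is preferred over naive Jacobian inversion.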