Human Preference Modeling Using Visual Motion Prediction Improves Robot Skill Learning from Egocentric Human Video

πŸ“… 2026-02-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of learning robotic skills from first-person human videos, where existing methods often rely on strong assumptions about the long-term value of visual states and struggle to generalize across embodiments and environments. The authors model human preferences with a vision-based motion prediction approach: by tracking point correspondences across consecutive frames and measuring the agreement between predicted and observed object motion, they construct per-step rewards without requiring knowledge of demonstration endpoints. Combining this reward signal with a modified Soft Actor-Critic (SAC) algorithm and initializing the policy from only ten on-robot demonstrations effectively bridges the domain gap between human videos and robot execution. Experiments show that the learned policies match or outperform prior state-of-the-art methods across multiple tasks in both simulation and real-world robotic settings, confirming the method's efficacy and generalization capability.

πŸ“ Abstract
We present an approach to robot learning from egocentric human videos by modeling human preferences in a reward function and optimizing robot behavior to maximize this reward. Prior work on reward learning from human videos attempts to measure the long-term value of a visual state as the temporal distance between it and the terminal state in a demonstration video. These approaches make assumptions that limit performance when learning from video. They must also transfer the learned value function across the embodiment and environment gap. Our method models human preferences by learning to predict the motion of tracked points between subsequent images and defines a reward function as the agreement between predicted and observed object motion in a robot's behavior at each step. We then use a modified Soft Actor-Critic (SAC) algorithm initialized with 10 on-robot demonstrations to estimate a value function from this reward and optimize a policy that maximizes this value function, all on the robot. Our approach is capable of learning on a real robot, and we show that policies learned with our reward model match or outperform prior work across multiple tasks in both simulation and on the real robot.
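The abstract defines the reward as the per-step agreement between predicted and observed motion of tracked points. The paper's exact formulation is not given here, so the sketch below is only an illustrative assumption: it scores agreement as the mean cosine similarity between hypothetical per-point displacement vectors, where `predicted_flow` and `observed_flow` are invented names for the outputs of a motion predictor and a point tracker.

```python
import numpy as np

def motion_agreement_reward(predicted_flow: np.ndarray,
                            observed_flow: np.ndarray,
                            eps: float = 1e-8) -> float:
    """Per-step reward sketch: mean cosine similarity between the predicted
    and observed displacement of each tracked point.

    predicted_flow, observed_flow: (N, 2) arrays, one 2D displacement per
    tracked point between consecutive frames. This is an assumed agreement
    measure, not necessarily the one used in the paper.
    """
    dot = np.sum(predicted_flow * observed_flow, axis=1)
    norms = (np.linalg.norm(predicted_flow, axis=1)
             * np.linalg.norm(observed_flow, axis=1))
    cos_sim = dot / (norms + eps)          # 1 = same direction, -1 = opposite
    return float(np.mean(cos_sim))
```

Under this choice, a robot step whose observed object motion matches the predicted motion yields a reward near 1, and motion in the opposite direction yields a reward near -1, giving a dense per-step signal without needing the demonstration's terminal state.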
Problem

Research questions and friction points this paper is trying to address.

robot learning
human video
reward modeling
embodiment gap
preference modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

human preference modeling
visual motion prediction
reward learning from video
egocentric video
robot skill learning
πŸ”Ž Similar Papers
No similar papers found.