🤖 AI Summary
In robotic reinforcement learning, hand-designing dense reward functions is costly and scales poorly. To address this, we propose TimeRewarder, the first method that automatically generates step-wise dense rewards solely from unlabeled passive videos (including human demonstration videos). Our approach builds a temporally consistent embedding space via self-supervised contrastive learning, jointly modeling forward and backward temporal distances between frame pairs; the estimated temporal progression is then expressed as a progress score and converted into a reward signal. Evaluated on 10 Meta-World tasks, TimeRewarder achieves near-100% success rates on 9 of the 10 tasks using only 200K environment interactions per task, substantially outperforming hand-crafted rewards and prior methods. It sets a new state of the art in both sample efficiency and final task success rate.
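To make the idea of forward and backward temporal distances between frame pairs concrete, the following is a minimal sketch of how training targets could be built from a single unlabeled video. The function name `sample_frame_pair_targets`, the uniform pair sampling, and the normalization by video length are illustrative assumptions, not details confirmed by the summary.

```python
import random


def sample_frame_pair_targets(num_frames, num_pairs, seed=0):
    """Sample (i, j, target) triples from one video with `num_frames` frames.

    Assumption (hypothetical, for illustration): the regression target is the
    signed, length-normalized temporal distance (j - i) / (num_frames - 1).
    It is positive for forward pairs, negative for backward pairs, and always
    lies in [-1, 1].
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    rng = random.Random(seed)
    triples = []
    for _ in range(num_pairs):
        i = rng.randrange(num_frames)  # index of the first frame in the pair
        j = rng.randrange(num_frames)  # index of the second frame in the pair
        target = (j - i) / (num_frames - 1)
        triples.append((i, j, target))
    return triples


if __name__ == "__main__":
    for i, j, target in sample_frame_pair_targets(num_frames=50, num_pairs=5):
        print(f"frames ({i:2d}, {j:2d}) -> target {target:+.3f}")
```

A model trained to regress such targets from the two frames alone needs no action or reward labels, which is what would make passive videos usable as supervision under these assumptions.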
📝 Abstract
Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. This approach outperforms previous methods and even the manually designed environment dense reward on both final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable path to rich reward signals from diverse video sources.
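As a rough illustration of how a progress estimate could supply step-wise proxy rewards, the sketch below scores each transition by the predicted temporal distance between consecutive observations. The names `stepwise_rewards`, `predict_distance`, and `success_bonus` are hypothetical, and adding the proxy reward to a sparse success bonus is an assumption made for this example rather than a detail stated in the abstract.

```python
def stepwise_rewards(frames, predict_distance, success=False, success_bonus=1.0):
    """Turn a rollout's frames into one proxy reward per transition.

    `predict_distance(a, b)` stands in for a learned model (hypothetical) that
    estimates the signed temporal distance from frame `a` to frame `b`:
    positive when `b` looks later in the task than `a`, negative otherwise.
    """
    rewards = [
        predict_distance(frames[t], frames[t + 1])
        for t in range(len(frames) - 1)
    ]
    # Assumption: a sparse task-success bonus is added on the final transition.
    if success and rewards:
        rewards[-1] += success_bonus
    return rewards


if __name__ == "__main__":
    # Toy stand-in: each "frame" is just a scalar progress value, so the
    # distance between two frames is their difference.
    def toy_distance(a, b):
        return b - a

    rollout = [0.0, 0.25, 0.5, 0.25]  # the last step moves backward
    print(stepwise_rewards(rollout, toy_distance))
```

In this toy rollout the final transition receives a negative reward because it undoes earlier progress, which is the behavior a progress-based dense reward is meant to discourage.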