TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In robotic reinforcement learning, manually designing dense reward functions is costly and scales poorly. To address this, we propose TimeRewarder, a method that automatically generates step-wise dense rewards solely from unlabeled passive videos (including robot demonstrations and human videos). Our approach constructs a time-consistent embedding space via self-supervised contrastive learning, jointly modeling forward and backward temporal distances between frame pairs; temporal progression is then quantified as a progress score and converted into a reward signal. Evaluated on 10 Meta-World tasks, TimeRewarder achieves near-100% success rates on 9 of the 10 tasks using only 200K environment interactions per task, outperforming both the manually designed dense environment reward and prior methods in sample efficiency and final task success rate.

📝 Abstract

Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. This approach outperforms previous methods, and even the manually designed dense environment reward, on both final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable path to rich reward signals from diverse video sources.
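To make the idea of a step-wise proxy reward concrete, here is a minimal Python sketch. It assumes, for illustration only, that the reward for a step is the predicted temporal distance between consecutive frames; the abstract does not spell out this formula. The names `stepwise_rewards` and `toy_predict_distance` are hypothetical, and the toy predictor stands in for a learned model by treating each frame as a scalar progress value.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def stepwise_rewards(
    frames: Sequence[Frame],
    predict_distance: Callable[[Frame, Frame], float],
) -> List[float]:
    """Return one proxy reward per transition in a rollout.

    Assumption: the reward for step t is the predicted temporal distance
    from frame t to frame t + 1 (positive means progress, negative means
    regress). This is an illustrative choice, not the paper's stated rule.
    """
    return [
        predict_distance(frames[t], frames[t + 1])
        for t in range(len(frames) - 1)
    ]


def toy_predict_distance(frame_a: float, frame_b: float) -> float:
    """Hypothetical stand-in for a learned temporal-distance model.

    Frames here are scalars in [0, 1] that directly encode task progress,
    so the "prediction" is just their difference, clipped to [-1, 1].
    """
    return max(-1.0, min(1.0, frame_b - frame_a))


# A rollout that advances, slips back once, then recovers.
rollout = [0.0, 0.25, 0.5, 0.25, 0.75]
rewards = stepwise_rewards(rollout, toy_predict_distance)
print(rewards)
```

Under this assumption the rewards sum to the net progress of the rollout, so a policy gains nothing by oscillating back and forth: every step backward is penalized by as much as the step forward earned.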

Problem

Research questions and friction points this paper is trying to address.

Automating dense reward design for reinforcement learning

Estimating task progress from passive video demonstrations

Providing step-wise proxy rewards for sparse-reward tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns dense reward from passive videos

Models temporal distances between frame pairs

Uses progress estimation for reinforcement learning
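The frame-pair idea in the points above can be sketched as a data-sampling routine. This is a hedged illustration, not the paper's training pipeline: it assumes the supervision target for a pair of frames is the signed difference of their indices, normalized by video length so that it lies in [-1, 1]. The function name `sample_frame_pairs` and the uniform sampling scheme are hypothetical.

```python
import random
from typing import List, Tuple


def sample_frame_pairs(
    num_frames: int,
    num_pairs: int,
    rng: random.Random,
) -> List[Tuple[int, int, float]]:
    """Sample (i, j, target) triples from a single video.

    Assumption: the target is the signed, length-normalized temporal
    distance (j - i) / (num_frames - 1), so forward pairs get positive
    targets and backward pairs get negative ones. No action or reward
    labels are needed, only frame order.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    pairs = []
    for _ in range(num_pairs):
        i = rng.randrange(num_frames)
        j = rng.randrange(num_frames)
        target = (j - i) / (num_frames - 1)
        pairs.append((i, j, target))
    return pairs


# Example: draw four training pairs from a 50-frame video.
pairs = sample_frame_pairs(num_frames=50, num_pairs=4, rng=random.Random(0))
for i, j, target in pairs:
    print(f"frames ({i:2d}, {j:2d}) -> target distance {target:+.3f}")
```

Normalizing by video length is one way to make targets comparable across videos of different durations; a model trained on such pairs could then serve as the distance predictor that produces step-wise rewards.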

🔎 Similar Papers

Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment