TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
In robotic reinforcement learning, manually designing dense reward functions is costly and scales poorly. To address this, we propose TimeRewarder, the first method that automatically generates step-wise dense rewards solely from unlabeled passive videos (including human demonstration videos). Our approach constructs a time-consistent embedding space via self-supervised contrastive learning, jointly modeling forward and backward temporal distances between frame pairs; temporal progression is then quantified as a progress score and converted into a reward signal. Evaluated on 10 Meta-World tasks, TimeRewarder achieves near-100% success rates on 9 of 10 tasks using only 200K environment interactions, substantially outperforming hand-crafted rewards and prior methods, and setting a new state of the art in both sample efficiency and final task success rate.
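
To make the frame-pair temporal distance idea concrete, below is a minimal PyTorch sketch of a pairwise temporal-distance regressor. The architecture, dimensions, and training target here are illustrative assumptions, not the authors' implementation; it simply shows one plausible way to supervise a model on signed temporal gaps between frames of the same video.

```python
# Minimal sketch (not the authors' code): a frame-pair temporal-distance
# model in PyTorch. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn

class TemporalDistanceModel(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # Small CNN encoder mapping an RGB frame to an embedding.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Head regressing the signed temporal distance between two frames.
        self.head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, frame_a: torch.Tensor, frame_b: torch.Tensor) -> torch.Tensor:
        za, zb = self.encoder(frame_a), self.encoder(frame_b)
        return self.head(torch.cat([za, zb], dim=-1)).squeeze(-1)

model = TemporalDistanceModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One toy training step: frames i and j are drawn from the same video and
# supervised with their normalized signed index gap (j - i) / (T - 1).
frame_i = torch.randn(8, 3, 64, 64)
frame_j = torch.randn(8, 3, 64, 64)
target = torch.rand(8) * 2 - 1          # signed gap in [-1, 1]
loss = nn.functional.mse_loss(model(frame_i, frame_j), target)
opt.zero_grad(); loss.backward(); opt.step()
```

Regressing a signed gap covers both the forward and backward temporal distances the summary mentions in a single head; positive outputs mean the second frame looks "later" in the task.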

📝 Abstract
Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success on 9 of 10 tasks with only 200,000 environment interactions per task. This approach outperforms previous methods, and even the manually designed environment dense reward, in both final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable path to rich reward signals from diverse video sources.
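
One plausible reading of "step-wise proxy rewards" is to reward per-step temporal progress. The sketch below continues the hypothetical TemporalDistanceModel above, using the predicted signed distance from the current to the next observation as the reward; the paper's exact conversion from progress to reward may differ.

```python
# Minimal sketch, continuing the hypothetical TemporalDistanceModel above:
# the proxy reward is the model's predicted signed temporal distance from
# the current to the next observation, so moving toward completion earns
# positive reward and regressing earns negative reward. Illustrative only.
import torch

@torch.no_grad()
def proxy_reward(model: torch.nn.Module,
                 obs_t: torch.Tensor,
                 obs_t1: torch.Tensor) -> float:
    """Step-wise reward for the transition obs_t -> obs_t1, each (C, H, W)."""
    d = model(obs_t.unsqueeze(0), obs_t1.unsqueeze(0))
    return d.item()

# Usage inside an RL rollout, replacing a sparse environment reward
# (`env`, `obs`, and `action` are hypothetical):
# obs_next, _, done, info = env.step(action)
# r = proxy_reward(model, obs, obs_next)
```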
Problem

Research questions and friction points this paper is trying to address.

Automating dense reward design for reinforcement learning
Estimating task progress from passive video demonstrations
Providing step-wise proxy rewards for sparse-reward tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns dense reward from passive videos
Models temporal distances between frame pairs
Uses progress estimation for reinforcement learning
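
A supporting piece implied by "models temporal distances between frame pairs" is the sampling of supervised pairs from unlabeled videos. The sketch below is one assumed pipeline, where the label is the normalized signed index gap matching the regression target in the model sketch above; the authors' actual pair-sampling scheme is not specified here.

```python
# Minimal sketch (an assumption about the data pipeline, not the authors'
# code): sampling supervised frame pairs from an unlabeled passive video.
# Each video is a (T, C, H, W) tensor; the label is the normalized signed
# index gap, which is what the temporal-distance head regresses.
import random
import torch

def sample_pair(video: torch.Tensor):
    T = video.shape[0]
    i, j = random.randrange(T), random.randrange(T)
    gap = (j - i) / max(T - 1, 1)       # signed, in [-1, 1]
    return video[i], video[j], torch.tensor(gap, dtype=torch.float32)

# Usage with a toy "video" of 50 frames:
video = torch.randn(50, 3, 64, 64)
frame_a, frame_b, label = sample_pair(video)
```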
Authors

Yuyang Liu · Institute for Interdisciplinary Information Sciences, Tsinghua University
Chuan Wen · Shanghai Jiao Tong University (Robotics, Machine Learning, Computer Vision)
Yihang Hu · IIIS, Tsinghua University
Dinesh Jayaraman · Assistant Professor, University of Pennsylvania (robot learning, computer vision, robotics, machine learning)
Yang Gao · Institute for Interdisciplinary Information Sciences, Tsinghua University