ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of enabling robots to learn novel manipulation tasks from language instructions alone, without handcrafted reward functions or task-specific demonstrations. Methodologically, we propose a language-conditioned reward modeling framework that jointly learns a mapping from language instructions to reward signals and an offline-pretrained policy from a small set of initial demonstrations; offline reinforcement learning is performed via Conservative Q-Learning (CQL), and reward generalization is achieved through multimodal representation alignment. The approach supports zero-shot task transfer and fine-tuning with minimal online interaction. Our key contribution is the first formulation of natural language instructions as directly generalizable, cross-task reward signals—eliminating reliance on task-specific demonstrations or manual reward engineering. Experiments demonstrate a 2.4× improvement in reward generalization, 2× higher sample efficiency in simulation over baselines, and a 5× speedup in real-world dual-arm policy adaptation.
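The core idea in the summary, a reward model that maps a language instruction and an observation to a scalar reward via aligned multimodal representations, can be sketched minimally. Everything below is a hypothetical stand-in: `embed` replaces the pretrained language/vision encoders the paper would actually use, and the hash-based vectors exist only to make the sketch runnable.

```python
import numpy as np

def embed(text, dim=8):
    # Hypothetical stand-in encoder: maps a string to a deterministic
    # unit vector. A real system would use pretrained encoders with
    # aligned multimodal (language/vision) representations.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def language_reward(instruction, observation_text):
    """Score how well an observation matches a language instruction,
    as cosine similarity of their embeddings (range [-1, 1])."""
    return float(embed(instruction) @ embed(observation_text))
```

With real encoders, this score would be used to label offline trajectories with rewards before policy pretraining.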

📝 Abstract
We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-efficient, language-conditioned reward function that labels the dataset with rewards, and (2) a language-conditioned policy pre-trained with offline RL using these rewards. Given an unseen task variation, ReWiND fine-tunes the pre-trained policy using the learned reward function, requiring minimal online interaction. We show that ReWiND's reward model generalizes effectively to unseen tasks, outperforming baselines by up to 2.4x in reward generalization and policy alignment metrics. Finally, we demonstrate that ReWiND enables sample-efficient adaptation to new tasks, beating baselines by 2x in simulation and improving real-world pretrained bimanual policies by 5x, taking a step towards scalable, real-world robot learning. See website at https://rewind-reward.github.io/.
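The adaptation step the abstract describes, fine-tuning the pretrained policy on an unseen task by labeling online rollouts with the learned reward function, can be outlined as a loop. This is a sketch only: `policy_update`, `rollout`, and `reward_fn` are hypothetical callables standing in for the paper's actual components.

```python
def finetune(policy_update, rollout, reward_fn, instruction, n_iters=3):
    """Adapt a pretrained policy to an unseen instruction by labeling
    online rollouts with a learned language-conditioned reward, then
    applying any RL update (sketch; all callables are hypothetical)."""
    episode_returns = []
    for _ in range(n_iters):
        trajectory = rollout()  # e.g. a list of observations
        rewards = [reward_fn(instruction, obs) for obs in trajectory]
        policy_update(trajectory, rewards)  # e.g. a policy-gradient step
        episode_returns.append(sum(rewards))
    return episode_returns
```

The point of the structure is that no new demonstrations appear anywhere in the loop; supervision comes entirely from the reward model.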
Problem

Research questions and friction points this paper is trying to address.

Learning robot tasks without new demonstrations
Generalizing rewards to unseen task variations
Improving sample efficiency for policy adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-conditioned reward function learning
Offline RL pre-trained language-conditioned policy
Minimal online interaction for task adaptation
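The AI summary states that offline RL pretraining uses Conservative Q-Learning (CQL). Its defining ingredient, a regularizer added to the TD loss that suppresses Q-values on out-of-distribution actions, can be written in a few lines. This is a generic sketch of the CQL penalty, not the paper's implementation.

```python
import numpy as np

def cql_regularizer(q_all_actions, q_dataset_actions):
    """Conservative Q-Learning penalty (sketch): logsumexp of Q over
    all actions minus Q on the dataset actions. Added to the TD loss,
    it pushes Q down on out-of-distribution actions and up on actions
    actually seen in the offline data."""
    lse = np.log(np.sum(np.exp(q_all_actions), axis=-1))
    return float(np.mean(lse - q_dataset_actions))
```

In practice the logsumexp is computed in a numerically stable form and the penalty is weighted against the standard Bellman error.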
Jiahui Zhang
Thomas Lord Department of Computer Science, University of Southern California
Yusen Luo
University of Southern California
Robot learning
Abrar Anwar
University of Southern California
Robotics, Human-Robot Interaction, Natural Language Processing
S. Sontakke
Amazon Robotics
Joseph J. Lim
Kim Jaechul School of Artificial Intelligence, KAIST
Jesse Thomason
Assistant Professor, University of Southern California
Natural Language Processing, Artificial Intelligence, Robotics
Erdem Biyik
Thomas Lord Department of Computer Science, University of Southern California
Jesse Zhang
Thomas Lord Department of Computer Science, University of Southern California