🤖 AI Summary
This work addresses the limitations of sparse or delayed rewards and handcrafted, task-specific reward functions in reinforcement learning, which hinder sample efficiency and generalization. The authors propose a general, task-agnostic implicit reward mechanism that generates dense, semantically aligned reward signals by comparing the language embeddings of natural language task descriptions with those of the agent's interaction experiences. Leveraging pretrained language models, this approach constructs a semantic-aware measure of task progress, significantly improving training efficiency and cross-task generalization. Experimental results demonstrate that the framework accelerates convergence, increases success rates across diverse complex tasks, and effectively handles scenarios where conventional hand-designed rewards fail. The authors also release an open-source language-based task evaluation benchmark to support further research.
📝 Abstract
We introduce Reward-Zero, a general-purpose implicit reward mechanism that transforms natural-language task descriptions into dense, semantically grounded progress signals for reinforcement learning (RL). Reward-Zero serves as a simple yet effective universal reward function that leverages language embeddings for efficient RL training. By comparing the embedding of a task specification with embeddings derived from an agent's interaction experience, Reward-Zero produces a continuous, semantically aligned sense-of-completion signal. This reward supplements sparse or delayed environmental feedback without requiring task-specific engineering. When integrated into standard RL frameworks, it accelerates exploration, stabilizes training, and enhances generalization across diverse tasks. Empirically, agents trained with Reward-Zero converge faster and achieve higher final success rates than conventional methods such as PPO with common reward-shaping baselines, solving some complex tasks that hand-designed rewards could not. In addition, we develop a mini benchmark for evaluating sense-of-completion signals during task execution via language embeddings. These results highlight the promise of language-driven implicit reward functions as a practical path toward more sample-efficient, generalizable, and scalable RL for embodied agents. Code will be released after peer review.
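The core mechanism described above—embedding both the task specification and the agent's experience, then scoring their similarity as a dense reward—can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words `embed` function stands in for the pretrained language-model encoder the paper assumes, and all names (`implicit_reward`, `experience_summary`) are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words embedding used as a placeholder; Reward-Zero would
    # use a pretrained language-model encoder here instead.
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def implicit_reward(task_description: str, experience_summary: str) -> float:
    # Dense "sense-of-completion" signal: similarity between the task
    # specification embedding and the embedding of the agent's experience,
    # which can supplement a sparse environmental reward at every step.
    return cosine(embed(task_description), embed(experience_summary))

task = "pick up the red block and place it on the shelf"
early = "the arm moves toward the table"
late = "the red block is placed on the shelf"
# An experience closer to task completion scores a higher implicit reward.
assert implicit_reward(task, late) > implicit_reward(task, early)
```

With a real sentence encoder in place of `embed`, the same comparison yields a continuous progress signal that requires no task-specific reward engineering, matching the task-agnostic design the abstract describes.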