Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations

📅 2024-12-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Sparse-reward continuous control tasks suffer from inefficient exploration over long horizons and brittle, labor-intensive hand-crafted reward functions with poor generalization. Method: We propose a systematic reward shaping framework that integrates task-agnostic prior data with a small set of expert demonstrations to enable cross-task knowledge distillation—yielding dynamics-aware dense rewards for the target task. Our approach jointly leverages behavior cloning, dynamics-consistency regularization, inverse reinforcement learning, and density ratio estimation to automatically model state-action value relationships without manual reward engineering. Contribution/Results: The framework eliminates the need for reward design, accelerates online RL training by 3.2× on average, significantly improves stability in achieving long-horizon goals, and demonstrates strong zero-shot generalization to unseen environments.

Technology Category

Application Category

📝 Abstract
Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.
Problem

Research questions and friction points this paper is trying to address.

Sparse-reward RL tasks hinder action sequence discovery
Manual reward shaping is tedious and environment-specific
Dense dynamics-aware rewards integrate prior data and demonstrations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates prior experience with expert demonstrations
Synthesizes dense dynamics-aware rewards
Accelerates learning for sparse-reward tasks
🔎 Similar Papers
No similar papers found.
C
Cevahir Köprülü
University of Texas at Austin
P
Po-han Li
University of Texas at Austin
T
Tianyu Qiu
University of Texas at Austin
Ruihan Zhao
Ruihan Zhao
PhD Student, ECE, UT Austin
RoboticsAIComputer Vision
T
T. Westenbroek
University of Washington
David Fridovich-Keil
David Fridovich-Keil
Assistant Professor, The University of Texas at Austin
optimal controldynamic gamesmotion planningrobotic safety
S
Sandeep P. Chinchali
University of Texas at Austin
U
U. Topcu
University of Texas at Austin