Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations

📅 2024-12-02

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

227K/year

🤖 AI Summary

Sparse-reward continuous control tasks suffer from inefficient exploration over long horizons and brittle, labor-intensive hand-crafted reward functions with poor generalization. Method: We propose a systematic reward shaping framework that integrates task-agnostic prior data with a small set of expert demonstrations to enable cross-task knowledge distillation—yielding dynamics-aware dense rewards for the target task. Our approach jointly leverages behavior cloning, dynamics-consistency regularization, inverse reinforcement learning, and density ratio estimation to automatically model state-action value relationships without manual reward engineering. Contribution/Results: The framework eliminates the need for reward design, accelerates online RL training by 3.2× on average, significantly improves stability in achieving long-horizon goals, and demonstrates strong zero-shot generalization to unseen environments.

Technology Category

Application Category

📝 Abstract

Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.

Problem

Research questions and friction points this paper is trying to address.

Sparse-reward RL tasks hinder action sequence discovery

Manual reward shaping is tedious and environment-specific

Dense dynamics-aware rewards integrate prior data and demonstrations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates prior experience with expert demonstrations

Synthesizes dense dynamics-aware rewards

Accelerates learning for sparse-reward tasks

🔎 Similar Papers

No similar papers found.