🤖 AI Summary
Long-horizon robotic manipulation tasks challenge reinforcement learning (RL) with poor exploration and low sample efficiency, caused by sparse rewards and high-dimensional visual observations. This paper proposes a demonstration-augmented visual RL framework grounded in multi-stage task structure. The method decomposes the global goal into sequential sub-goals and automatically generates a dense reward for each stage, enabling staged dense-reward learning. It further introduces a two-phase training paradigm that jointly optimizes demonstration-guided reward shaping, policy distillation, and a latent-space world model, yielding an efficient end-to-end mapping from raw visual observations to robot actions. Evaluated on 16 sparse-reward tasks, including humanoid visual control, the approach improves data efficiency by 40% on average and by up to 70% on challenging tasks, converging reliably with as few as five expert demonstrations.
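The core idea of per-stage dense rewards can be illustrated with a toy sketch. This is not DEMO3's actual (learned) reward model; it is a minimal hand-written stand-in, assuming sub-goals are points in the observation space and progress is measured by Euclidean distance. The names `staged_dense_reward` and `advance_stage` are hypothetical.

```python
import numpy as np

def staged_dense_reward(obs, stage, subgoals, k=1.0):
    """Toy multi-stage shaped reward: the count of completed stages plus
    bounded progress toward the current sub-goal, so the signal grows
    monotonically across stages instead of being sparse at the end."""
    if stage >= len(subgoals):
        return float(len(subgoals))           # task fully solved
    dist = np.linalg.norm(np.asarray(obs) - subgoals[stage])
    return stage + 1.0 / (1.0 + k * dist)     # lies in [stage, stage + 1)

def advance_stage(obs, stage, subgoals, tol=0.05):
    """Move to the next stage whenever the current sub-goal is reached."""
    while stage < len(subgoals) and \
            np.linalg.norm(np.asarray(obs) - subgoals[stage]) < tol:
        stage += 1
    return stage
```

In the paper's setting these per-stage rewards are learned from a handful of demonstrations rather than hand-specified, but the shaping principle is the same: each completed sub-goal lifts the reward floor, so exploration is guided stage by stage.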
📝 Abstract
Long-horizon tasks in robotic manipulation present significant challenges in reinforcement learning (RL) due to the difficulty of designing dense reward functions and effectively exploring the expansive state-action space. However, despite a lack of dense rewards, these tasks often have a multi-stage structure, which can be leveraged to decompose the overall objective into manageable subgoals. In this work, we propose DEMO3, a framework that exploits this structure for efficient learning from visual inputs. Specifically, our approach incorporates multi-stage dense reward learning, a bi-phasic training scheme, and world model learning into a carefully designed demonstration-augmented RL framework that strongly mitigates the challenge of exploration in long-horizon tasks. Our evaluations demonstrate that our method improves data-efficiency by an average of 40% and by 70% on particularly difficult tasks compared to state-of-the-art approaches. We validate this across 16 sparse-reward tasks spanning four domains, including challenging humanoid visual control tasks using as few as five demonstrations.