Solving Robotics Tasks with Prior Demonstration via Exploration-Efficient Deep Reinforcement Learning

📅 2025-09-04

🤖 AI Summary

To address inefficient exploration, bootstrapping error, and premature convergence to sub-optimal policies when learning robotics tasks from demonstrations, this paper proposes DRLR, an exploration-efficient Deep Reinforcement Learning with Reference policy framework built on Imitation Bootstrapped Reinforcement Learning (IBRL). The method makes two main changes: (1) it modifies the IBRL action selection module so that it provides a calibrated Q-value, which mitigates bootstrapping error; and (2) it uses Soft Actor-Critic (SAC) instead of TD3 as the RL policy, to keep the policy from converging to a sub-optimal solution. Together, these changes combine imitation learning with SAC-based reinforcement learning to make exploration more efficient. In simulation, DRLR learns bucket loading and drawer opening tasks and is reported to be robust across low- and high-dimensional state-action spaces and across demonstrations of varying quality. The bucket loading task is also deployed on a real wheel loader, and the sim2real results validate the deployment.

📝 Abstract

This paper proposes an exploration-efficient Deep Reinforcement Learning with Reference policy (DRLR) framework for learning robotics tasks that incorporates demonstrations. The DRLR framework is developed based on an algorithm called Imitation Bootstrapped Reinforcement Learning (IBRL). We propose to improve IBRL by modifying the action selection module. The proposed action selection module provides a calibrated Q-value, which mitigates the bootstrapping error that otherwise leads to inefficient exploration. Furthermore, to prevent the RL policy from converging to a sub-optimal policy, Soft Actor-Critic (SAC) is used as the RL policy instead of Twin Delayed Deep Deterministic Policy Gradient (TD3). The effectiveness of our method in mitigating bootstrapping error and preventing overfitting is empirically validated by learning two robotics tasks, bucket loading and drawer opening, both of which require extensive interactions with the environment. Simulation results also demonstrate the robustness of the DRLR framework across tasks with both low and high state-action dimensions, and across varying demonstration qualities. To evaluate the developed framework on a real-world industrial robotics task, the bucket loading task is deployed on a real wheel loader. The sim2real results validate the successful deployment of the DRLR framework.
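The action selection step that the abstract refers to can be illustrated with a minimal sketch of the baseline IBRL-style rule: query both the demonstration-derived reference policy and the RL policy, then execute whichever proposed action the critic scores higher. This is an illustration only, not the paper's implementation. The names `select_action`, `reference_policy`, `rl_policy`, and `q_value` are hypothetical, and the calibration that DRLR applies to the Q-value is not described on this page, so it is not modeled here.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]


def select_action(
    state: State,
    reference_policy: Callable[[State], Action],
    rl_policy: Callable[[State], Action],
    q_value: Callable[[State, Action], float],
) -> Tuple[Action, str]:
    """Pick between two candidate actions using the critic's estimate.

    Assumption: this mirrors the uncalibrated, IBRL-style selection rule.
    DRLR's calibrated Q-value would replace `q_value`; its exact form is
    not given in the text above.
    """
    a_ref = reference_policy(state)  # action proposed from demonstrations
    a_rl = rl_policy(state)          # action proposed by the RL actor
    # Ties go to the RL action so the learner keeps control by default.
    if q_value(state, a_ref) > q_value(state, a_rl):
        return a_ref, "reference"
    return a_rl, "rl"


# Toy usage: a 1-D action where the critic prefers actions near 1.0.
def toy_q(state: State, action: Action) -> float:
    return -((action[0] - 1.0) ** 2)


chosen, source = select_action(
    state=[0.0],
    reference_policy=lambda s: [0.8],
    rl_policy=lambda s: [0.0],
    q_value=toy_q,
)
```

In this toy case the reference action is chosen because the critic scores it higher. If the critic overestimates the value of poorly explored actions, this comparison can be biased, which is the kind of bootstrapping error the calibrated Q-value is meant to mitigate.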

Problem

Research questions and friction points this paper is trying to address.

Improving exploration efficiency in robotics reinforcement learning

Mitigating bootstrapping errors in imitation-based RL algorithms

Preventing policy convergence to sub-optimal solutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Exploration-efficient deep reinforcement learning with reference policy

Modified action selection module for calibrated Q-values

SAC policy replacing TD3 to prevent sub-optimal convergence
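To make the SAC versus TD3 distinction concrete, the sketch below compares the standard one-step critic targets of the two algorithms for a single transition. These are the textbook forms of the targets, not code from the paper. The function names and the scalar interface are assumptions made for illustration, and the next action and its log-probability are taken as given rather than sampled from an actual policy.

```python
def td3_target(reward: float, gamma: float, q1_next: float, q2_next: float,
               done: bool) -> float:
    """Standard TD3 target: clipped double-Q on the next state-action pair.

    q1_next and q2_next are the two target critics evaluated at the
    (smoothed) target-policy action; that evaluation is assumed done upstream.
    """
    bootstrap = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * bootstrap)


def sac_target(reward: float, gamma: float, q1_next: float, q2_next: float,
               log_prob_next: float, alpha: float, done: bool) -> float:
    """Standard SAC target: clipped double-Q plus an entropy bonus.

    log_prob_next is log pi(a'|s') for the sampled next action and alpha is
    the temperature. The -alpha * log_prob term rewards stochastic policies.
    """
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + (0.0 if done else gamma * soft_value)


# Same transition under both targets (all numbers are made up).
y_td3 = td3_target(reward=1.0, gamma=0.5, q1_next=2.0, q2_next=3.0, done=False)
y_sac = sac_target(reward=1.0, gamma=0.5, q1_next=2.0, q2_next=3.0,
                   log_prob_next=-1.0, alpha=0.2, done=False)
```

The only difference between the two targets is the entropy term. It gives extra value to states where the policy remains stochastic, which is one common explanation for why SAC is less prone than a deterministic-policy method to settling early on a sub-optimal behavior.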

🔎 Similar Papers

Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance