Dexterous Hand Manipulation via Efficient Imitation-Bootstrapped Online Reinforcement Learning

📅 2025-03-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the key challenges of dexterous robotic hand manipulation in real-world settings (heavy reliance on extensive expert demonstrations, poor generalization, and a limited performance ceiling), this paper proposes an Imitation-Bootstrapped Online Reinforcement Learning (IBORL) framework. IBORL first pretrains a policy on a small set of expert demonstrations, then performs closed-loop online finetuning directly on the physical robot via Proximal Policy Optimization (PPO). Crucially, a distribution-matching regularization term mitigates the catastrophic forgetting caused by the distribution shift between expert demonstrations and real-world execution, overcoming a key performance bottleneck of imitation learning. Experiments on real dexterous hand manipulation tasks demonstrate a near-perfect success rate (≈100%), a 23% reduction in cycle time, and substantial gains over the original expert demonstrations.

📝 Abstract
Dexterous hand manipulation in real-world scenarios presents considerable challenges due to its demands for both dexterity and precision. While imitation learning approaches have thoroughly examined these challenges, they still require a significant number of expert demonstrations and are limited by a constrained performance upper bound. In this paper, we propose a novel and efficient Imitation-Bootstrapped Online Reinforcement Learning (IBORL) method tailored for robotic dexterous hand manipulation in real-world environments. Specifically, we pretrain the policy using a limited set of expert demonstrations and subsequently finetune this policy through direct reinforcement learning in the real world. To address the catastrophic forgetting issues that arise from the distribution shift between expert demonstrations and real-world environments, we design a regularization term that balances the exploration of novel behaviors with the preservation of the pretrained policy. Our experiments with real-world tasks demonstrate that our method significantly outperforms existing approaches, achieving an almost 100% success rate and a 23% improvement in cycle time. Furthermore, by finetuning with online reinforcement learning, our method surpasses expert demonstrations and uncovers superior policies. Our code and empirical results are available at https://hggforget.github.io/iborl.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Addresses dexterous hand manipulation challenges in real-world scenarios.
Reduces reliance on extensive expert demonstrations for imitation learning.
Overcomes catastrophic forgetting during policy finetuning in real environments.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines imitation learning with online reinforcement learning.
Uses regularization to prevent catastrophic forgetting.
Achieves high success rates and faster cycle times.
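The core idea above (finetune the pretrained policy with RL while a regularizer keeps it close to the imitation-pretrained behavior) can be sketched numerically. This is a minimal illustrative example, not the paper's implementation: the KL-style penalty and the trade-off coefficient `lam` are assumptions standing in for the paper's regularization term.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete action distributions (assumes strictly positive probs)."""
    return float(np.sum(p * np.log(p / q)))

def regularized_loss(rl_loss, pi_pretrained, pi_current, lam=0.1):
    """Illustrative total objective: the RL loss plus a penalty that discourages
    the finetuned policy from drifting away from the pretrained one.
    `lam` is a hypothetical trade-off coefficient, not a value from the paper."""
    return rl_loss + lam * kl_divergence(pi_pretrained, pi_current)

# Toy example: the current policy has drifted from the pretrained action distribution,
# so the regularizer adds a positive penalty on top of the RL loss.
pretrained = np.array([0.7, 0.2, 0.1])
current = np.array([0.5, 0.3, 0.2])
loss = regularized_loss(rl_loss=1.0, pi_pretrained=pretrained,
                        pi_current=current, lam=0.5)
```

Tuning `lam` trades off exploration of novel behaviors (small `lam`) against preservation of the pretrained policy (large `lam`), which is the balance the abstract describes.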
Dongchi Huang
Beihang University
Reinforcement Learning · Embodied AI · World Models
Tianle Zhang
JD Explore Academy, Beijing, China
Yihang Li
JD Explore Academy, Beijing, China
Ling Zhao
JD Explore Academy, Beijing, China
Jiayi Li
Beijing Jiaotong University, Beijing, China
Zhirui Fang
Master of Artificial Intelligence, Tsinghua University
Embodied AI · Reinforcement Learning
Chunhe Xia
Beihang University, Beijing, China
Lusong Li
JD Explore Academy, Beijing, China
Xiaodong He
JD Explore Academy, Beijing, China