🤖 AI Summary
To address the challenges of dexterous robotic hand manipulation in real-world settings, namely heavy reliance on extensive expert demonstrations, poor generalization, and a limited performance ceiling, this paper proposes an Imitation-Bootstrapped Online Reinforcement Learning (IBORL) framework. IBORL first pretrains a policy on a small set of expert demonstrations, then fine-tunes it through closed-loop online reinforcement learning directly on the physical robot. Crucially, the authors introduce a regularization term that mitigates the catastrophic forgetting caused by the distribution shift between expert demonstrations and real-world execution, thereby overcoming a key performance bottleneck of imitation learning. Experiments on real dexterous hand manipulation tasks demonstrate a near-100% success rate, a 23% improvement in cycle time, and learned policies that surpass the original expert demonstrations.
📝 Abstract
Dexterous hand manipulation in real-world scenarios presents considerable challenges due to its demands for both dexterity and precision. While imitation learning approaches have thoroughly examined these challenges, they still require a significant number of expert demonstrations and are limited by a constrained performance upper bound. In this paper, we propose a novel and efficient Imitation-Bootstrapped Online Reinforcement Learning (IBORL) method tailored for robotic dexterous hand manipulation in real-world environments. Specifically, we pretrain the policy using a limited set of expert demonstrations and subsequently fine-tune this policy through direct reinforcement learning in the real world. To address the catastrophic forgetting that arises from the distribution shift between expert demonstrations and real-world environments, we design a regularization term that balances the exploration of novel behaviors with the preservation of the pretrained policy. Our experiments on real-world tasks demonstrate that our method significantly outperforms existing approaches, achieving a nearly 100% success rate and a 23% improvement in cycle time. Furthermore, by fine-tuning with online reinforcement learning, our method surpasses the expert demonstrations and uncovers superior policies. Our code and empirical results are available at https://hggforget.github.io/iborl.github.io/.
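The abstract does not give the exact form of the regularization term, but a common way to balance online exploration against preservation of a pretrained policy is to add a KL-divergence penalty toward the pretrained (imitation) policy to the RL objective. The sketch below illustrates that general idea on discrete action distributions; the function names, the coefficient `beta`, and the use of KL divergence are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q) between two discrete action distributions.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def regularized_policy_loss(rl_loss, pi_current, pi_pretrained, beta=0.1):
    """Illustrative total loss: RL objective + beta * KL(pretrained || current).

    The KL term penalizes drift away from the pretrained imitation policy,
    mitigating catastrophic forgetting during online fine-tuning. The
    coefficient `beta` (an assumed hyperparameter) trades off exploration
    of novel behaviors against preservation of the pretrained policy:
    beta = 0 recovers plain online RL, large beta pins the policy to the
    demonstrations.
    """
    return rl_loss + beta * kl_divergence(pi_pretrained, pi_current)

# Toy example: the current policy has drifted from the pretrained one,
# so the regularizer adds a small penalty on top of the RL loss.
pretrained = [0.7, 0.2, 0.1]   # action distribution after imitation pretraining
current    = [0.4, 0.4, 0.2]   # action distribution after some online updates
loss = regularized_policy_loss(rl_loss=1.5, pi_current=current,
                               pi_pretrained=pretrained, beta=0.1)
```

In practice such a penalty is computed per state over a minibatch and added to the policy-gradient loss before backpropagation; the scalar version above only demonstrates the balancing behavior of the term.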