🤖 AI Summary
To address the challenges of dexterous robotic hand manipulation in real-world settings, namely heavy reliance on extensive expert demonstrations, poor generalization, and a limited performance ceiling, this paper proposes an Imitation-Bootstrapped Online Reinforcement Learning (IBORL) framework. IBORL first pretrains a policy on a small set of expert demonstrations, then fine-tunes it through closed-loop online reinforcement learning directly on the physical robot. Crucially, the authors introduce a regularization term that mitigates the catastrophic forgetting caused by the distribution shift between expert demonstrations and real-world execution, thereby overcoming a key performance bottleneck of imitation learning. Experiments on real dexterous hand manipulation tasks demonstrate a near-100% success rate, a 23% improvement in cycle time, and learned policies that surpass the original expert demonstrations.
📝 Abstract
Dexterous hand manipulation in real-world scenarios presents considerable challenges due to its demands for both dexterity and precision. While imitation learning approaches have thoroughly examined these challenges, they still require a significant number of expert demonstrations and are limited by a constrained performance upper bound. In this paper, we propose a novel and efficient Imitation-Bootstrapped Online Reinforcement Learning (IBORL) method tailored for robotic dexterous hand manipulation in real-world environments. Specifically, we pretrain the policy using a limited set of expert demonstrations and subsequently fine-tune this policy through direct reinforcement learning in the real world. To address the catastrophic forgetting that arises from the distribution shift between expert demonstrations and real-world environments, we design a regularization term that balances the exploration of novel behaviors with the preservation of the pretrained policy. Our experiments on real-world tasks demonstrate that our method significantly outperforms existing approaches, achieving a nearly 100% success rate and a 23% improvement in cycle time. Furthermore, by fine-tuning with online reinforcement learning, our method surpasses the expert demonstrations and uncovers superior policies. Our code and empirical results are available at https://hggforget.github.io/iborl.github.io/.
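The abstract does not give the exact form of the regularization term, but a common way to balance online exploration against preservation of a pretrained policy is to add a KL-divergence penalty toward the pretrained (imitation) policy to the RL objective. The sketch below illustrates that general idea on discrete action distributions; the function names, the coefficient `beta`, and the use of KL divergence are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    # KL(p || q) between two discrete action distributions.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def regularized_policy_loss(rl_loss, pi_current, pi_pretrained, beta=0.1):
    """Illustrative total loss: RL objective + beta * KL(pretrained || current).

    The KL term penalizes drift away from the pretrained imitation policy,
    mitigating catastrophic forgetting during online fine-tuning. The
    coefficient `beta` (an assumed hyperparameter) trades off exploration
    of novel behaviors against preservation of the pretrained policy:
    beta = 0 recovers plain online RL, large beta pins the policy to the
    demonstrations.
    """
    return rl_loss + beta * kl_divergence(pi_pretrained, pi_current)

# Toy example: the current policy has drifted from the pretrained one,
# so the regularizer adds a small penalty on top of the RL loss.
pretrained = [0.7, 0.2, 0.1]   # action distribution after imitation pretraining
current    = [0.4, 0.4, 0.2]   # action distribution after some online updates
loss = regularized_policy_loss(rl_loss=1.5, pi_current=current,
                               pi_pretrained=pretrained, beta=0.1)
```

In practice such a penalty is computed per state over a minibatch and added to the policy-gradient loss before backpropagation; the scalar version above only demonstrates the balancing behavior of the term.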