🤖 AI Summary
In multi-goal robotic manipulation, imitation learning suffers from limited human demonstration coverage, which biases learning toward easier sub-tasks and weakens generalization to harder ones. To address this, we propose a goal-conditioned, adaptively weighted Generative Adversarial Imitation Learning (GAIL) framework. Our method tightly integrates Hindsight Experience Replay (HER) with goal-conditioned GAIL, introducing a self-adaptive reweighting mechanism that robustly leverages limited, suboptimal demonstrations and mitigates demonstration bias. Evaluated across diverse multi-goal manipulation tasks, using suboptimal demonstrations from both simulation and human experts, our approach improves training efficiency by over 40% and succeeds on challenging in-hand manipulation tasks, while demonstrating markedly stronger generalization across task configurations.
📝 Abstract
Reinforcement learning for multi-goal robot manipulation tasks poses significant challenges due to the diversity and complexity of the goal space. Techniques such as Hindsight Experience Replay (HER) have been introduced to improve learning efficiency for such tasks. More recently, researchers have combined HER with advanced imitation learning methods such as Generative Adversarial Imitation Learning (GAIL) to integrate demonstration data and accelerate training. However, demonstration data often fails to provide enough coverage of the goal space, especially when acquired from human teleoperation. This biases the learning-from-demonstration process toward mastering easier sub-tasks instead of tackling the more challenging ones. In this work, we present Goal-based Self-Adaptive Generative Adversarial Imitation Learning (Goal-SAGAIL), a novel framework specifically designed for multi-goal robot manipulation tasks. By integrating self-adaptive learning principles with goal-conditioned GAIL, our approach enhances imitation learning efficiency, even when only limited, suboptimal demonstrations are available. Experimental results validate that our method significantly improves learning efficiency across various multi-goal manipulation scenarios -- including complex in-hand manipulation tasks -- using suboptimal demonstrations provided by both simulation and human experts.
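To make the two ingredients concrete, here is a minimal sketch (not the authors' code) of the ideas the abstract combines: HER-style "future" goal relabeling, and an adaptive weight that up-weights demonstrations whose goals the current policy still fails on. The exact reweighting rule is an assumption for illustration; `her_relabel`, `adaptive_demo_weights`, and the softmax-over-difficulty form are hypothetical names and choices, and Goal-SAGAIL's actual mechanism may differ.

```python
import math
import random

def her_relabel(achieved_goals, k, rng):
    """HER 'future' strategy sketch: for each timestep t, sample up to k
    substitute goals from the goals actually achieved later in the episode.
    Returns (timestep, relabeled_goal) pairs."""
    T = len(achieved_goals)
    relabeled = []
    for t in range(T - 1):
        future = rng.sample(range(t + 1, T), k=min(k, T - 1 - t))
        relabeled.extend((t, achieved_goals[f]) for f in future)
    return relabeled

def adaptive_demo_weights(success_rates, temperature=0.5):
    """Hypothetical adaptive reweighting: softmax over per-goal difficulty
    (1 - success rate), so demonstrations for goals the policy already
    masters get low weight and hard goals get high weight."""
    logits = [(1.0 - s) / temperature for s in success_rates]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

rng = random.Random(0)
episode_goals = [(0.2 * i, 0.0) for i in range(5)]   # toy achieved goals
pairs = her_relabel(episode_goals, k=2, rng=rng)      # hindsight transitions
weights = adaptive_demo_weights([0.9, 0.5, 0.1])      # easy / medium / hard demos
```

In a full pipeline, `weights` would scale each demonstration's contribution to the discriminator (and hence the imitation reward), while the relabeled pairs augment the replay buffer exactly as in standard HER.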