🤖 AI Summary
This work addresses the limitations of traditional imitation learning on complex, long-horizon tasks, namely its reliance on large amounts of high-quality demonstration data and its poor adaptability. The authors propose a hierarchical imitation learning framework that generates actions efficiently through proportional blending of motion primitives. In this architecture, a high-level policy performs long-term planning, while a low-level policy learns multiple motion primitives and combines them in adjustable proportions to synthesize novel actions. Three variants of the proportion model (learnable, sampling-based, and playback-based) are introduced to balance flexibility, computational cost, and adaptability. Experiments on a real-robot pick-and-place task demonstrate that the approach generates complex actions not present in the primitive set, with the sampling-based and playback-based models outperforming a standard hierarchical baseline in both stability and generalization.
📝 Abstract
Imitation learning (IL) enables robots to acquire human-like motion skills from demonstrations, but it still requires extensive high-quality data and retraining to handle complex or long-horizon tasks. To improve data efficiency and adaptability, this study proposes a hierarchical IL framework that integrates motion primitives with proportion-based motion synthesis. The proposed method employs a two-layer architecture in which the upper layer performs long-term planning, while a set of lower-layer models learns individual motion primitives that are combined according to specified proportions. Three model variants are introduced to explore different trade-offs between learning flexibility, computational cost, and adaptability: a learning-based proportion model, a sampling-based proportion model, and a playback-based proportion model, which differ in how the proportions are determined and in whether the upper layer is trainable. In real-robot pick-and-place experiments, the proposed models successfully generated complex motions not included in the primitive set. The sampling-based and playback-based proportion models achieved more stable and adaptable motion generation than the standard hierarchical model, demonstrating the effectiveness of proportion-based motion integration for practical robot learning.
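The abstract does not spell out how proportion-based synthesis works; the following is a minimal sketch under assumptions not stated in the paper: each primitive is a fixed-length joint trajectory, a "proportion" is a convex weight vector over primitives, and the sampling-based variant draws candidate proportions and keeps the best under some task score. The names `blend_primitives`, `sample_proportions`, and `score_fn` are illustrative, not the authors' API.

```python
import numpy as np

def blend_primitives(primitives, proportions):
    """Convex blend of motion primitives (sketch).

    primitives: (K, T, D) array of K primitive trajectories,
                each T timesteps of D joint values.
    proportions: (K,) nonnegative weights; normalized to sum to 1.
    Returns a single (T, D) blended trajectory.
    """
    prims = np.asarray(primitives, dtype=float)
    w = np.asarray(proportions, dtype=float)
    w = w / w.sum()  # enforce convex combination
    # Weighted sum over the primitive axis K.
    return np.einsum("k,ktd->td", w, prims)

def sample_proportions(k, n_candidates, score_fn, rng=None):
    """Sampling-based proportion model (sketch): draw candidate
    proportion vectors from a uniform Dirichlet prior and keep
    the one that a hypothetical task score rates highest."""
    rng = np.random.default_rng(rng)
    candidates = rng.dirichlet(np.ones(k), size=n_candidates)
    scores = [score_fn(c) for c in candidates]
    return candidates[int(np.argmax(scores))]
```

For example, blending a "reach" and a "lift" primitive with proportions (0.5, 0.5) yields a trajectory lying between the two, which is how motions outside the primitive set could arise; the playback-based variant would instead reuse a previously recorded proportion sequence rather than sampling or learning one.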