🤖 AI Summary
Problem: Teaching long-horizon, multi-skill manipulation tasks by demonstration is difficult because deviations accumulate, distributional shift increases, and human teachers become fatigued, all of which raise the chance of failure. Method: We introduce $(ST)^2$, a sequential method for learning long-horizon manipulation tasks in which users control the teaching flow by defining key points, so the task is segmented and demonstrated step by step. We compare it in a user study against a traditional monolithic approach, where users demonstrate the entire trajectory at once. Results: On a restocking task with 16 participants in a realistic retail environment, both approaches achieve similar trajectory quality and success rates. User preference was split: some participants preferred the sequential approach for its iterative control, while others favored the monolithic approach for its simplicity.
📝 Abstract
Learning from demonstration is effective for teaching robots complex skills with high sample efficiency. However, teaching long-horizon tasks with multiple skills is difficult, as deviations accumulate, distributional shift increases, and human teachers become fatigued, raising the chance of failure. In this work, we study user responses to two teaching frameworks: (i) a traditional monolithic approach, where users demonstrate the entire trajectory of a long-horizon task; and (ii) a sequential approach, where the task is segmented by the user and demonstrations are provided step by step. To support this study, we introduce $(ST)^2$, a sequential method for learning long-horizon manipulation tasks that allows users to control the teaching flow by defining key points, enabling incremental and structured demonstrations. We conducted a user study on a restocking task with 16 participants in a realistic retail environment to evaluate both user preference and method effectiveness. Our objective and subjective results show that both methods achieve similar trajectory quality and success rates. Some participants preferred the sequential approach for its iterative control, while others favored the monolithic approach for its simplicity.
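To make the idea of user-defined key points concrete, the following is a minimal sketch of how a recorded demonstration could be cut into consecutive segments at indices chosen by the user. It is an illustration only and not the $(ST)^2$ implementation: the function name `split_at_key_points`, the list-of-samples trajectory representation, and the example labels are all assumptions introduced here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_at_key_points(trajectory: Sequence[T], key_points: Sequence[int]) -> List[List[T]]:
    """Split a trajectory into consecutive segments at user-chosen sample indices.

    Hypothetical helper: each key point k starts a new segment at trajectory[k].
    Indices outside the open interval (0, len(trajectory)) are ignored.
    """
    cuts = sorted({k for k in key_points if 0 < k < len(trajectory)})
    bounds = [0, *cuts, len(trajectory)]
    # Pair up neighbouring boundaries to slice out each segment.
    return [list(trajectory[a:b]) for a, b in zip(bounds, bounds[1:])]


# Toy demonstration: each sample is just a label standing in for a robot state.
demo = ["reach", "reach", "grasp", "grasp", "move", "place"]
segments = split_at_key_points(demo, key_points=[2, 4])

for i, segment in enumerate(segments):
    print(f"step {i}: {segment}")
```

In a monolithic setting the whole `demo` list would be treated as one demonstration. In a sequential setting each entry of `segments` could be taught, and re-demonstrated if needed, on its own.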