🤖 AI Summary
This paper addresses three key challenges in long-horizon, dexterous robotic manipulation: poor physical robustness, the difficulty of composing and sequencing skills, and the high cost of real-world data acquisition. It proposes LodeStar, a multi-skill synthesis and orchestration framework grounded in few-shot human demonstrations. The core contributions are: (1) automatic decomposition of demonstrations into semantically meaningful skill units using foundation models; (2) a Skill Routing Transformer (SRT) that dynamically selects skills and chains them into a long-horizon policy; and (3) a unified learning pipeline integrating imitation learning, reinforcement learning, synthetic data augmentation, and sim-to-real transfer, which needs only a few real-world demonstrations to generate diverse, high-quality training data. Evaluated on three real-world long-horizon manipulation tasks, the method improves task success rate (+28.6%) and robustness under environmental disturbances, consistently outperforming state-of-the-art baselines.
📝 Abstract
Developing robotic systems that execute long-horizon manipulation tasks with human-level dexterity is challenging: such tasks require both physical dexterity and seamless sequencing of manipulation skills, all while remaining robust to environment variations. Imitation learning offers a promising approach, but acquiring comprehensive datasets is resource-intensive. In this work, we propose LodeStar, a learning framework and system that automatically decomposes task demonstrations into semantically meaningful skills using off-the-shelf foundation models and generates diverse synthetic demonstration datasets from a few human demonstrations through reinforcement learning. These sim-augmented datasets enable robust skill training, and a Skill Routing Transformer (SRT) policy chains the learned skills together to execute complex long-horizon manipulation tasks. Experimental evaluations on three challenging real-world long-horizon dexterous manipulation tasks demonstrate that our approach significantly improves task performance and robustness over prior baselines. Videos are available at lodestar-robot.github.io.
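The idea of chaining skills through a router can be illustrated with a toy sketch. This is not the paper's SRT: in LodeStar the routing is done by a learned Transformer policy, whereas here a hand-written scoring function stands in for it. All names below (`Skill`, `route`, `run_episode`, `toy_scores`) and the one-dimensional "distance" observation are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Observation = Sequence[float]
Action = List[float]


@dataclass
class Skill:
    """One skill unit: a name plus a low-level policy (hypothetical structure)."""
    name: str
    policy: Callable[[Observation], Action]


def route(scores: Dict[str, float]) -> str:
    """Pick the skill with the highest routing score."""
    return max(scores, key=scores.get)


def run_episode(
    skills: Dict[str, Skill],
    score_fn: Callable[[Observation], Dict[str, float]],
    observations: Sequence[Observation],
) -> List[Tuple[str, Action]]:
    """At each step, route to a skill and let that skill produce the action."""
    trace = []
    for obs in observations:
        name = route(score_fn(obs))
        trace.append((name, skills[name].policy(obs)))
    return trace


# Toy stand-ins: obs[0] is a made-up gripper-to-object distance.
skills = {
    "reach": Skill("reach", lambda obs: [-0.5 * obs[0]]),  # move toward the object
    "grasp": Skill("grasp", lambda obs: [1.0]),            # close the gripper
}


def toy_scores(obs: Observation) -> Dict[str, float]:
    """Assumed rule replacing the learned router: grasp once close enough."""
    near = 1.0 if obs[0] < 0.1 else 0.0
    return {"reach": 1.0 - near, "grasp": near}


trace = run_episode(skills, toy_scores, [[0.4], [0.2], [0.05]])
print([name for name, _ in trace])  # ['reach', 'reach', 'grasp']
```

The point of the sketch is the separation of concerns: each skill can be trained on its own data, while the router only decides which skill is active at a given step.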