An Atomic Skill Library Construction Method for Data-Efficient Embodied Manipulation

📅 2025-01-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: End-to-end learning for embodied manipulation suffers from a “data explosion” issue, requiring prohibitively large amounts of task-specific demonstration data. Method: This paper proposes a dynamically growing, atomic-skill-oriented construction paradigm. It uses vision-language planning (VLP) to decompose high-level tasks into subtasks, abstracts those subtasks into reusable atomic skills, and accumulates the skills data-efficiently through vision-language-action (VLA) model fine-tuning. A three-stage incremental update mechanism lets the skill library keep expanding, so the set of tasks it covers grows with it. Contribution/Results: The work shifts the focus of embodied manipulation learning from monolithic end-to-end task policies to composable, reusable atomic skills. According to the authors, real-world experiments show substantially lower data requirements for new tasks while maintaining high performance and enabling efficient adaptation to new tasks.

📝 Abstract

Embodied manipulation is a fundamental ability in the realm of embodied artificial intelligence. Although current embodied manipulation models show some generalization in specific settings, they struggle in new environments and tasks due to the complexity and diversity of real-world scenarios. The traditional end-to-end approach to data collection and training leads to significant data demands, which we call "data explosion". To address this issue, we introduce a three-wheel data-driven method to build an atomic skill library. We divide tasks into subtasks using Vision-Language Planning (VLP). Then, atomic skill definitions are formed by abstracting the subtasks. Finally, an atomic skill library is constructed via data collection and Vision-Language-Action (VLA) fine-tuning. As the atomic skill library expands dynamically with the three-wheel update strategy, the range of tasks it can cover grows naturally. In this way, our method shifts focus from end-to-end tasks to atomic skills, significantly reducing data costs while maintaining high performance and enabling efficient adaptation to new tasks. Extensive experiments in real-world settings demonstrate the effectiveness and efficiency of our approach.
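The update loop described in the abstract (plan, abstract, then extend the library) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_planner` is a hypothetical lookup table standing in for a VLP model, `toy_abstract` is a hypothetical rule that names a skill after the subtask's leading verb, and adding a name to `SkillLibrary.skills` stands in for data collection plus VLA fine-tuning. The only point it shows is that a new task triggers data collection just for the skills the library does not already hold.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical stand-in for a VLP model: task instruction -> ordered subtasks.
TOY_PLANS: Dict[str, List[str]] = {
    "put the cup on the plate": ["pick up the cup", "place the cup on the plate"],
    "put the apple in the bowl": ["pick up the apple", "place the apple in the bowl"],
    "pour water into the cup": ["pick up the bottle", "pour the bottle into the cup"],
}


def toy_planner(task: str) -> List[str]:
    """Decompose a task into subtasks (assumption: a fixed lookup table)."""
    return TOY_PLANS[task]


def toy_abstract(subtask: str) -> str:
    """Abstract a subtask into an atomic skill name (assumption: its first word)."""
    return subtask.split()[0]


@dataclass
class SkillLibrary:
    skills: Set[str] = field(default_factory=set)

    def update(
        self,
        task: str,
        planner: Callable[[str], List[str]],
        abstract: Callable[[str], str],
    ) -> List[str]:
        """Run one plan/abstract/extend pass for a task.

        Returns the skills that were missing, i.e. the ones that would need
        new demonstrations and fine-tuning in a real system.
        """
        new_skills: List[str] = []
        for subtask in planner(task):
            skill = abstract(subtask)
            if skill not in self.skills:
                self.skills.add(skill)  # placeholder for collect data + fine-tune
                new_skills.append(skill)
        return new_skills


if __name__ == "__main__":
    library = SkillLibrary()
    for task in TOY_PLANS:
        added = library.update(task, toy_planner, toy_abstract)
        print(f"{task!r}: new skills needed = {added}")
```

In this toy run the first task adds two skills, the second task is covered entirely by the existing library, and the third adds only one, which mirrors the claim that task coverage grows faster than data cost as the library expands.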

Problem

Research questions and friction points this paper is trying to address.

Skill Library

Minimal Data

Data Explosion

Innovation

Methods, ideas, or system contributions that make the work stand out.

data-driven method

atomic skill library

Vision-Language-Action (VLA) fine-tuning

🔎 Similar Papers

Long-horizon Embodied Planning with Implicit Logical Inference and Hallucination Mitigation

2024-09-24 · Citations: 1
