Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation

📅 2026-05-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited cross-task generalization of existing methods for open-world robotic manipulation, which supply only low-level action sequences as context and therefore struggle to extract composable skill knowledge. The authors propose a skill-reasoning framework that decomposes seen task demonstrations into interpretable atomic skill–action pairs and builds a hybrid demonstration library with a task-adaptive dynamic component and a coverage-aware static component. Combining vision–language retrieval, the static library, and in-context learning, the framework elicits compositional reasoning about which skills to compose and in what order to execute them. Experiments on the AGNOSTOS benchmark and in real-world environments are reported to validate its zero-shot cross-task manipulation capability.
📝 Abstract
Cross-task generalization is a core challenge in open-world robotic manipulation, and the key lies in extracting transferable manipulation knowledge from seen tasks. Recent in-context learning approaches leverage seen task demonstrations to generate actions for unseen tasks without parameter updates. However, existing methods provide only low-level continuous action sequences as context, failing to capture composable skill knowledge and causing models to degenerate into superficial trajectory imitation. We propose Decompose and Recompose, a skill reasoning framework using atomic skill-action pairs as intermediate representations. Our approach decomposes seen demonstrations into interpretable skill-action alignments, enabling the model to recompose these skills for unseen tasks through compositional reasoning. Specifically, we construct a task-adaptive dynamic demonstration library via visual-semantic retrieval combined with skill sequences from a planning agent, complemented by a coverage-aware static library to fill missing skill patterns. Together, these yield skill-comprehensive demonstrations that explicitly elicit compositional reasoning for skill composition and execution ordering. Experiments on the AGNOSTOS benchmark and real-world environments validate our method's zero-shot cross-task generalization capability.
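The library construction described in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the `Demo` class, the `build_context` function, the toy embeddings, cosine similarity as the retrieval score, and the greedy fill rule are all hypothetical stand-ins. The sketch retrieves the most similar seen demonstrations (the dynamic part), then adds static demonstrations for any planned skill that retrieval left uncovered.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Demo:
    """A seen-task demonstration decomposed into (skill, action) pairs (hypothetical schema)."""
    task: str
    embedding: tuple          # stand-in for a visual-semantic feature vector
    pairs: tuple              # ((atomic skill, action summary), ...)

    @property
    def skills(self):
        return {skill for skill, _ in self.pairs}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_context(query_emb, planned_skills, dynamic_pool, static_pool, k=2):
    """Assemble in-context demonstrations for an unseen task.

    Assumed procedure: take the top-k most similar seen demos, then greedily
    add static demos for planned skills that the retrieved demos do not cover.
    Returns the chosen demos and the planned skills that remain uncovered.
    """
    ranked = sorted(dynamic_pool,
                    key=lambda d: cosine(query_emb, d.embedding),
                    reverse=True)
    context = ranked[:k]
    covered = set().union(*(d.skills for d in context))

    for skill in planned_skills:
        if skill in covered:
            continue
        for demo in static_pool:
            if skill in demo.skills:
                context.append(demo)
                covered |= demo.skills
                break

    uncovered = [s for s in planned_skills if s not in covered]
    return context, uncovered
```

In this sketch `planned_skills` plays the role of the skill sequence from the planning agent, and the returned demonstrations would be formatted as skill-action pairs in the prompt. How the paper scores retrieval or measures coverage is not stated in the abstract, so both choices here are placeholders.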
Problem

Research questions and friction points this paper is trying to address.

cross-task generalization
robotic manipulation
composable skills
in-context learning
skill transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional reasoning
skill decomposition
cross-task generalization
in-context learning
zero-shot manipulation
Authors

Xitie Zhang
School of Artificial Intelligence, College of Intelligence and Computing, Tianjin University, China
Aming Wu
Ph.D.
Deep learning, Data mining
Yahong Han
Professor of Computer Science, Tianjin University
Multimedia