SymSkill: Symbol and Skill Co-Invention for Data-Efficient and Real-Time Long-Horizon Manipulation

📅 2025-10-02

🤖 AI Summary

Dynamic environments pose two challenges for long-horizon, multi-step manipulation: weak compositional generalization and difficult real-time failure recovery. Imitation learning reacts quickly but lacks structured reasoning, while task-and-motion planning (TAMP) offers compositionality at the cost of high planning latency. This paper introduces SymSkill, a unified framework that jointly learns symbolic predicates, operators, and reusable skills, enabling data-efficient, real-time replanning and failure recovery at both the motion and symbolic levels. The method learns offline from unlabeled, unsegmented demonstrations; at execution time it combines symbolic planning, skill composition, and compliant control to generate and refine action sequences online. In RoboCasa simulation, it achieves an 85% success rate on 12 single-step tasks and, without additional data, composes them into multi-step plans requiring up to 6 skill recompositions. On a real Franka robot, it performs multiple tasks after learning from only 5 minutes of play data.

📝 Abstract

Multi-step manipulation in dynamic environments remains challenging. Two major families of methods fail in distinct ways: (i) imitation learning (IL) is reactive but lacks compositional generalization, as monolithic policies do not decide which skill to reuse when scenes change; (ii) classical task-and-motion planning (TAMP) offers compositionality but has prohibitive planning latency, preventing real-time failure recovery. We introduce SymSkill, a unified learning framework that combines the benefits of IL and TAMP, allowing compositional generalization and failure recovery in real time. Offline, SymSkill jointly learns predicates, operators, and skills directly from unlabeled and unsegmented demonstrations. At execution time, given a goal specified as a conjunction of one or more learned predicates, SymSkill uses a symbolic planner to compose and reorder learned skills to achieve the symbolic goals, while performing recovery at both the motion and symbolic levels in real time. Coupled with a compliant controller, SymSkill enables safe and uninterrupted execution under human and environmental disturbances. In RoboCasa simulation, SymSkill can execute 12 single-step tasks with an 85% success rate. Without additional data, it composes these skills into multi-step plans requiring up to 6 skill recompositions, recovering robustly from execution failures. On a real Franka robot, we demonstrate that SymSkill, learning from 5 minutes of unsegmented and unlabeled play data, is capable of performing multiple tasks simply from goal specifications. The source code and additional analysis can be found at https://sites.google.com/view/symskill.
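To make the planning step in the abstract concrete, the following is a minimal sketch of how a symbolic planner might compose skills to satisfy a goal given as a conjunction of predicates. It is not the SymSkill implementation: the STRIPS-style operator format, the breadth-first search, and every predicate and operator name (`drawer_open`, `pick_mug`, and so on) are assumptions chosen for illustration. In SymSkill the predicates and operators are learned from demonstrations rather than written by hand.

```python
from collections import deque
from typing import NamedTuple


class Operator(NamedTuple):
    """Hypothetical STRIPS-style operator attached to one skill."""
    name: str
    pre: frozenset      # predicates that must hold before the skill runs
    add: frozenset      # predicates the skill makes true
    delete: frozenset   # predicates the skill makes false


def plan(state, goal, operators):
    """Breadth-first search for a shortest skill sequence that reaches the goal."""
    start = frozenset(state)
    goal = frozenset(goal)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        current, steps = frontier.popleft()
        if goal <= current:          # every goal predicate holds
            return steps
        for op in operators:
            if op.pre <= current:    # operator is applicable
                nxt = (current - op.delete) | op.add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None                      # goal unreachable with these operators


# Illustrative, hand-written operators (assumed, not from the paper).
OPERATORS = [
    Operator("open_drawer",
             frozenset({"hand_empty", "drawer_closed"}),
             frozenset({"drawer_open"}),
             frozenset({"drawer_closed"})),
    Operator("pick_mug",
             frozenset({"hand_empty", "mug_on_counter"}),
             frozenset({"holding_mug"}),
             frozenset({"hand_empty", "mug_on_counter"})),
    Operator("place_mug_in_drawer",
             frozenset({"holding_mug", "drawer_open"}),
             frozenset({"mug_in_drawer", "hand_empty"}),
             frozenset({"holding_mug"})),
    Operator("close_drawer",
             frozenset({"hand_empty", "drawer_open"}),
             frozenset({"drawer_closed"}),
             frozenset({"drawer_open"})),
]

initial = {"hand_empty", "drawer_closed", "mug_on_counter"}
goal = {"mug_in_drawer", "drawer_closed"}   # conjunction of goal predicates
skill_sequence = plan(initial, goal, OPERATORS)
print(skill_sequence)
```

In this toy domain the planner has to reorder skills on its own: picking the mug first leads to a dead end because the drawer can no longer be opened with a full hand, so the search opens the drawer before picking.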

Problem

Research questions and friction points this paper is trying to address.

Enables compositional generalization and real-time failure recovery in manipulation

Learns predicates, operators, and skills from unlabeled demonstrations offline

Uses symbolic planning to compose skills for multi-step task execution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns predicates, operators, and skills from unlabeled demonstrations

Uses a symbolic planner to compose and reorder learned skills

Performs real-time recovery at both motion and symbolic levels
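One way to picture recovery at both levels is a loop that re-observes the symbolic state after every skill attempt and replans from whatever it sees. The sketch below is an assumption-laden toy, not the paper's mechanism: `ToyWorld`, the scripted outcomes `"slip"` and `"knocked_away"`, and the two hand-written operators are all hypothetical. A failed motion that leaves the symbolic state unchanged results in the same skill being retried, while a disturbance that changes the symbolic state results in a new plan.

```python
from collections import deque

# Hypothetical operators: name -> (preconditions, add effects, delete effects).
OPS = {
    "pick_mug": ({"hand_empty", "mug_on_counter"},
                 {"holding_mug"},
                 {"hand_empty", "mug_on_counter"}),
    "place_mug_on_shelf": ({"holding_mug"},
                           {"mug_on_shelf", "hand_empty"},
                           {"holding_mug"}),
}


def plan(state, goal):
    """Breadth-first search over symbolic states; returns a list of skill names."""
    start = frozenset(state)
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        cur, steps = frontier.popleft()
        if goal <= cur:
            return steps
        for name, (pre, add, delete) in OPS.items():
            if pre <= cur:
                nxt = frozenset((cur - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None


class ToyWorld:
    """Stand-in for robot plus perception; outcomes are scripted, not learned."""

    def __init__(self, state, events):
        self.state = set(state)
        self.events = list(events)   # one scripted outcome per skill attempt
        self.log = []

    def observe(self):
        return frozenset(self.state)

    def execute(self, skill):
        outcome = self.events.pop(0) if self.events else "ok"
        self.log.append((skill, outcome))
        _, add, delete = OPS[skill]
        if outcome == "ok":
            self.state = (self.state - delete) | add
        elif outcome == "knocked_away":
            # A person takes the mug and puts it back on the counter.
            self.state = (self.state - {"holding_mug"}) | {"hand_empty", "mug_on_counter"}
        # outcome == "slip": the motion fails and the symbolic state is unchanged.


def run(world, goal, max_steps=20):
    """Replan from the observed state after every skill attempt."""
    for _ in range(max_steps):
        state = world.observe()
        if goal <= state:
            return True
        steps = plan(state, goal)
        if not steps:
            return False
        world.execute(steps[0])      # execute only the first skill, then re-observe
    return False


world = ToyWorld({"hand_empty", "mug_on_counter"},
                 events=["slip", "ok", "knocked_away", "ok", "ok"])
success = run(world, {"mug_on_shelf"})
print(success, [skill for skill, _ in world.log])
```

Executing only the first planned skill before re-observing keeps the loop simple at the cost of replanning every step; that trade-off is only reasonable when planning is fast, which is the property the paper emphasizes.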