🤖 AI Summary
This work addresses long-horizon robotic manipulation tasks involving articulated objects, partial observability, and geometric constraints, with a focus on learning composable, generalizable high-level behavioral representations from language-annotated demonstrations. We propose a unified framework that integrates semantic understanding based on large language models, vision-language grounding, imitation learning, model-based planning, and neural policy control. Our approach is the first to automatically extract a structured action library, including visually grounded preconditions and effects, directly from multimodal demonstrations, without requiring manually defined symbolic states or prior annotations. The learned representations enable generalization across varying initial states, goals, and environmental perturbations. We validate the framework's effectiveness on diverse, complex object manipulation tasks in both simulation and real-robot settings.
📝 Abstract
We introduce Behavior from Language and Demonstration (BLADE), a framework for long-horizon robotic manipulation that integrates imitation learning and model-based planning. BLADE leverages language-annotated demonstrations, extracts abstract action knowledge from large language models (LLMs), and constructs a library of structured, high-level action representations. For each high-level action, these representations include preconditions and effects grounded in visual perception, along with a corresponding controller implemented as a neural network policy. BLADE recovers such structured representations automatically, without manually labeled states or symbolic definitions, and generalizes to novel situations, including novel initial states, external state perturbations, and novel goals. We validate the effectiveness of our approach both in simulation and on real robots, on tasks involving a diverse set of objects with articulated parts, partial observability, and geometric constraints.
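To make the structure of such an action library concrete, the following minimal Python sketch illustrates one way these pieces could fit together. It is not the authors' implementation: `GroundedPredicate`, `HighLevelAction`, `execute`, and the type aliases are hypothetical names, and plain callables stand in for the learned visual classifiers and neural policies described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Placeholder types for illustration: an observation could bundle RGB-D images
# and proprioception; an action is a low-level robot command.
Observation = Dict[str, object]
Action = List[float]


@dataclass
class GroundedPredicate:
    """A symbolic predicate (e.g. is_open(drawer)) whose truth value is
    estimated from raw observations by a learned visual classifier."""
    name: str
    classifier: Callable[[Observation], float]  # probability the predicate holds

    def holds(self, obs: Observation, threshold: float = 0.5) -> bool:
        return self.classifier(obs) >= threshold


@dataclass
class HighLevelAction:
    """One entry in the action library: visually grounded preconditions and
    effects, plus a policy that produces low-level commands."""
    name: str
    preconditions: List[GroundedPredicate]
    effects: List[GroundedPredicate]
    policy: Callable[[Observation], Action]  # stands in for a neural controller

    def applicable(self, obs: Observation) -> bool:
        return all(p.holds(obs) for p in self.preconditions)

    def achieved(self, obs: Observation) -> bool:
        return all(e.holds(obs) for e in self.effects)


def execute(action: HighLevelAction,
            obs: Observation,
            step_env: Callable[[Action], Observation],
            max_steps: int = 100) -> Tuple[Observation, bool]:
    """Closed-loop execution of one high-level action: check the grounded
    preconditions, then run the policy until the grounded effects are observed
    or the step budget runs out, so a planner can detect failure and replan."""
    if not action.applicable(obs):
        return obs, False
    for _ in range(max_steps):
        obs = step_env(action.policy(obs))
        if action.achieved(obs):
            return obs, True
    return obs, False
```

In this reading, planning operates over the symbolic preconditions and effects, while the grounded classifiers tie those symbols to perception and the per-action policies handle low-level control.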