Instant Policy: In-Context Imitation Learning via Graph Diffusion

📅 2024-11-19

🏛️ arXiv.org

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

This paper targets instant robot policy learning: mastering a new task from only one or two demonstrations, with no further training. It introduces Instant Policy, which casts In-Context Imitation Learning (ICIL) as a graph generation problem solved with a learned diffusion process. The graph representation supplies inductive biases and lets the model reason jointly over demonstrations, observations, and actions. Training relies on pseudo-demonstrations, arbitrary trajectories generated in simulation, which provide a virtually unlimited pool of training data. Experiments in simulation and in the real world show rapid learning of various everyday robot tasks. The authors also show how the approach can serve as a foundation for cross-embodiment transfer and for zero-shot transfer to language-defined tasks.

📝 Abstract

Following the impressive capabilities of in-context learning with large transformers, In-Context Imitation Learning (ICIL) is a promising opportunity for robotics. We introduce Instant Policy, which learns new tasks instantly (without further training) from just one or two demonstrations, achieving ICIL through two key components. First, we introduce inductive biases through a graph representation and model ICIL as a graph generation problem with a learned diffusion process, enabling structured reasoning over demonstrations, observations, and actions. Second, we show that such a model can be trained using pseudo-demonstrations - arbitrary trajectories generated in simulation - as a virtually infinite pool of training data. Simulated and real experiments show that Instant Policy enables rapid learning of various everyday robot tasks. We also show how it can serve as a foundation for cross-embodiment and zero-shot transfer to language-defined tasks. Code and videos are available at https://www.robot-learning.uk/instant-policy.
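The abstract describes pseudo-demonstrations only as "arbitrary trajectories generated in simulation". The sketch below is one hypothetical reading of that idea, not the authors' generator. A pseudo-task is a set of random waypoints, and each demonstration replays those waypoints under a random scene offset. The function names, the offset scheme, and the (x, y, z, gripper) state layout are all assumptions made for illustration.

```python
import random

def sample_pseudo_task(rng, num_waypoints=4):
    """Hypothetical pseudo-task: random waypoints plus a gripper state per segment."""
    waypoints = [[rng.uniform(-0.2, 0.2) for _ in range(3)]
                 for _ in range(num_waypoints)]
    grips = [rng.random() < 0.5 for _ in range(num_waypoints - 1)]
    return waypoints, grips

def rollout(task, rng, steps_per_segment=10):
    """One pseudo-demonstration: the task's waypoints under a random scene offset."""
    waypoints, grips = task
    # Assumption: a random translation stands in for a different object placement.
    offset = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    shifted = [[w + o for w, o in zip(wp, offset)] for wp in waypoints]
    states = []
    for start, end, grip in zip(shifted, shifted[1:], grips):
        for step in range(steps_per_segment):
            t = step / steps_per_segment
            # Linear interpolation between consecutive waypoints.
            pos = tuple(a + t * (b - a) for a, b in zip(start, end))
            states.append(pos + (grip,))
    # Finish exactly on the last waypoint.
    states.append(tuple(shifted[-1]) + (grips[-1],))
    return states

def make_training_sample(rng, num_context=2):
    """Context demonstrations and a target demonstration of the same pseudo-task."""
    task = sample_pseudo_task(rng)
    context = [rollout(task, rng) for _ in range(num_context)]
    target = rollout(task, rng)
    return context, target

rng = random.Random(0)
context, target = make_training_sample(rng)
```

Because every sample comes from a fresh random task, a generator of this kind never runs out of training pairs, which is the property the abstract highlights.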

Problem

Research questions and friction points this paper is trying to address.

Learning new robot tasks instantly from one or two demonstrations, without further training

Bringing structured reasoning over demonstrations, observations, and actions to in-context imitation learning

Obtaining enough training data, here through pseudo-demonstrations generated in simulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph diffusion for structured reasoning

Pseudo-demonstrations as a virtually infinite pool of training data

In-context learning from few demonstrations
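To make the graph diffusion idea concrete, here is a minimal sketch that assumes a standard DDPM forward process with a linear noise schedule. The toy graph layout (demonstration and observation nodes each connected to every action node), the node fields, and all function names are illustrative assumptions rather than the paper's actual representation. Only the action nodes are noised, so the demonstrations and the current observation stay clean as context.

```python
import math
import random

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction after t steps of a linear beta schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def build_graph(demo_points, obs_point, action_points):
    """Toy typed graph: every demo node and the observation feed every action node."""
    nodes = [{"kind": "demo", "pos": list(p)} for p in demo_points]
    nodes.append({"kind": "obs", "pos": list(obs_point)})
    first_action = len(nodes)
    nodes += [{"kind": "action", "pos": list(p)} for p in action_points]
    edges = [(src, dst)
             for src in range(first_action)
             for dst in range(first_action, len(nodes))]
    return {"nodes": nodes, "edges": edges}

def noise_actions(graph, t, num_steps, rng):
    """Forward-diffuse only the action nodes; context nodes are copied unchanged."""
    a_bar = alpha_bar(t, num_steps)
    noisy_nodes = []
    for node in graph["nodes"]:
        pos = node["pos"]
        if node["kind"] == "action":
            # x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps, with eps ~ N(0, 1)
            pos = [math.sqrt(a_bar) * x
                   + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
                   for x in pos]
        noisy_nodes.append({"kind": node["kind"], "pos": list(pos)})
    return {"nodes": noisy_nodes, "edges": list(graph["edges"])}

rng = random.Random(0)
graph = build_graph(
    demo_points=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.2)],
    obs_point=(0.0, 0.1, 0.0),
    action_points=[(0.05, 0.1, 0.1), (0.1, 0.1, 0.2)],
)
noisy = noise_actions(graph, t=50, num_steps=100, rng=rng)
```

In a full system, a learned network would take the noisy graph and predict how to denoise the action nodes, repeating that step until actions emerge. That learned part is omitted here.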
