Affordance-Centric Policy Learning: Sample Efficient and Generalisable Robot Policy Learning using Affordance-Centric Task Frames

📅 2024-10-15

🏛️ arXiv.org

📈 Citations: 6

✨ Influential: 0

🤖 AI Summary

To address the low sample efficiency and poor generalization of imitation learning in long-horizon, multi-object robotic manipulation tasks, this paper proposes an affordance-centric task frame. The frame is centered and oriented directly on an object's affordance region, which yields both intra-category invariance (policies transfer across instances of the same object category) and spatial invariance (performance does not depend on where the object is placed). The method uses existing generalist large vision models to extract and track the affordance frames, and learns policies in those frames with behavior cloning. With as few as 10 demonstrations, it matches the generalization of an image-based policy trained on 305 demonstrations. The code and demonstration videos are publicly available.

📝 Abstract

Affordances are central to robotic manipulation, where most tasks can be simplified to interactions with task-specific regions on objects. By focusing on these key regions, we can abstract away task-irrelevant information, simplifying the learning process, and enhancing generalisation. In this paper, we propose an affordance-centric policy-learning approach that centres and appropriately orients a task frame on these affordance regions allowing us to achieve both intra-category invariance -- where policies can generalise across different instances within the same object category -- and spatial invariance -- which enables consistent performance regardless of object placement in the environment. We propose a method to leverage existing generalist large vision models to extract and track these affordance frames, and demonstrate that our approach can learn manipulation tasks using behaviour cloning from as little as 10 demonstrations, with equivalent generalisation to an image-based policy trained on 305 demonstrations. We provide video demonstrations on our project site: https://affordance-policy.github.io.
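The spatial-invariance claim in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a planar (x, y, heading) simplification of the task frame, and the names `Pose2D` and `to_affordance_frame` are hypothetical. The idea shown is that an end-effector pose expressed relative to the affordance frame stays the same when the object and gripper are moved together through the workspace.

```python
import math
from dataclasses import dataclass


def wrap_angle(a: float) -> float:
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


@dataclass(frozen=True)
class Pose2D:
    """Planar rigid pose (assumed simplification of a full 6-DoF frame)."""
    x: float
    y: float
    theta: float  # heading in radians

    def compose(self, other: "Pose2D") -> "Pose2D":
        """Return self * other, i.e. `other` re-expressed in self's parent frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(
            self.x + c * other.x - s * other.y,
            self.y + s * other.x + c * other.y,
            wrap_angle(self.theta + other.theta),
        )

    def inverse(self) -> "Pose2D":
        """Return the pose T^-1 such that T.compose(T^-1) is the identity."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        return Pose2D(
            -(c * self.x + s * self.y),
            s * self.x - c * self.y,
            wrap_angle(-self.theta),
        )


def to_affordance_frame(world_T_aff: Pose2D, world_T_ee: Pose2D) -> Pose2D:
    """Express the end-effector pose relative to the affordance frame."""
    return world_T_aff.inverse().compose(world_T_ee)


# Hypothetical scene: an affordance frame (e.g. a handle) and a gripper pose.
world_T_aff = Pose2D(0.5, 0.2, math.radians(30))
aff_T_ee = Pose2D(0.0, 0.1, math.radians(90))
world_T_ee = world_T_aff.compose(aff_T_ee)

# Move object and gripper together somewhere else in the workspace.
shift = Pose2D(-0.3, 0.4, math.radians(75))
rel_before = to_affordance_frame(world_T_aff, world_T_ee)
rel_after = to_affordance_frame(shift.compose(world_T_aff), shift.compose(world_T_ee))

print(rel_before)
print(rel_after)
```

Both printed poses agree up to floating-point error, so a policy that consumes and emits poses in the affordance frame sees the same input regardless of object placement. Intra-category invariance additionally depends on the perception stage placing the frame consistently across object instances, which this sketch does not model.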

Problem

Research questions and friction points this paper is trying to address.

Improving sample efficiency in imitation learning with few demonstrations

Enhancing generalization for long-horizon multi-object manipulation tasks

Addressing spatial and intra-category generalization in robot policies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Oriented affordance frames for structured representation

Self-progress prediction for seamless policy transitions

Compositional generalization from minimal demonstrations
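One way to picture self-progress prediction driving policy transitions is sketched below. This page does not describe the mechanism, so everything here is an assumption for illustration: each sub-policy is taken to return an action together with a progress estimate in [0, 1], and a hypothetical `ProgressSequencer` hands control to the next sub-policy once that estimate crosses an assumed threshold.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: observation -> (action label, progress in [0, 1]).
SubPolicy = Callable[[Sequence[float]], Tuple[str, float]]


class ProgressSequencer:
    """Chains sub-policies, switching on each policy's own progress estimate."""

    def __init__(self, policies: List[SubPolicy], threshold: float = 0.95):
        if not policies:
            raise ValueError("need at least one sub-policy")
        self.policies = policies
        self.threshold = threshold  # assumed switching rule, not from the paper
        self.index = 0

    @property
    def done(self) -> bool:
        """True once every sub-policy has reported completion."""
        return self.index >= len(self.policies)

    def step(self, obs: Sequence[float]) -> str:
        """Query the active sub-policy; advance if it reports it has finished."""
        if self.done:
            raise RuntimeError("all sub-tasks finished")
        action, progress = self.policies[self.index](obs)
        if progress >= self.threshold:
            self.index += 1
        return action


def make_toy_policy(name: str) -> SubPolicy:
    """Toy stand-in for a learned policy: progress is read from obs[0]."""
    def policy(obs: Sequence[float]) -> Tuple[str, float]:
        return name, obs[0]
    return policy


seq = ProgressSequencer([make_toy_policy("reach"), make_toy_policy("pour")])
trace = [seq.step([p]) for p in (0.5, 1.0, 0.2, 0.99)]
print(trace, seq.done)
```

In a real system the progress value would come from a learned output head and not from the observation, and the threshold would need tuning per task.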

🔎 Similar Papers

Affordance-based Robot Manipulation with Flow Matching