AffordDP: Generalizable Diffusion Policy with Transferable Affordance

📅 2024-12-04
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 0
📄 PDF
🤖 AI Summary
Diffusion-based policies exhibit poor generalization in robotic manipulation, particularly for cross-category and unseen object instances. To address this, we propose a diffusion policy framework integrating transferable affordance priors. Specifically, we model 3D contact points and post-contact trajectories as category-agnostic, transferable affordances, a formulation first introduced in this work. We estimate 6D object poses via joint inference using foundational vision models (CLIP/SAM) and point-cloud registration, enabling affordance transfer across domains. During diffusion sampling, we incorporate affordance-guided conditioning to ensure action feasibility and goal consistency. Experiments demonstrate that our method significantly outperforms existing diffusion policies in both simulation and real-robot settings. It achieves robust success on zero-shot cross-category and unseen-instance manipulation tasks, whereas baseline methods consistently fail.

๐Ÿ“ Abstract
Diffusion-based policies have shown impressive performance in robotic manipulation tasks while struggling with out-of-domain distributions. Recent efforts attempted to enhance generalization by improving the visual feature encoding for diffusion policy. However, their generalization is typically limited to the same category with similar appearances. Our key insight is that leveraging affordances, manipulation priors that define "where" and "how" an agent interacts with an object, can substantially enhance generalization to entirely unseen object instances and categories. We introduce the Diffusion Policy with transferable Affordance (AffordDP), designed for generalizable manipulation across novel categories. AffordDP models affordances through 3D contact points and post-contact trajectories, capturing the essential static and dynamic information for complex tasks. The transferable affordance from in-domain data to unseen objects is achieved by estimating a 6D transformation matrix using foundational vision models and point cloud registration techniques. More importantly, we incorporate affordance guidance during diffusion sampling that can refine action sequence generation. This guidance directs the generated action to gradually move towards the desired manipulation for unseen objects while keeping the generated action within the manifold of action space. Experimental results from both simulated and real-world environments demonstrate that AffordDP consistently outperforms previous diffusion-based methods, successfully generalizing to unseen instances and categories where others fail.
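The affordance-transfer step above hinges on estimating a rigid 6D transformation between an in-domain object and an unseen target. A minimal sketch of that core, assuming corresponded 3D points are already available (in the paper, correspondence comes from foundation vision models plus point-cloud registration; here a Kabsch/SVD fit stands in for the full pipeline, and all names are illustrative):

```python
import numpy as np

def fit_rigid_transform(src, tgt):
    """Estimate rotation R and translation t so that R @ src_i + t ~= tgt_i,
    using the Kabsch/SVD method on corresponded 3D point sets."""
    src_c = src.mean(axis=0)
    tgt_c = tgt.mean(axis=0)
    H = (src - src_c).T @ (tgt - tgt_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t

def transfer_affordance(contact_points, R, t):
    """Map in-domain contact points / trajectory waypoints into the
    unseen object's frame with the estimated 6D transformation."""
    return contact_points @ R.T + t

# toy check: recover a known rotation + translation from clean points
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.5])
tgt = src @ R_true.T + t_true
R, t = fit_rigid_transform(src, tgt)
```

In practice the point sets would come from segmented object point clouds (e.g., via SAM masks) and a registration step, with the same transform then applied to contact points and post-contact trajectories.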
Problem

Research questions and friction points this paper is trying to address.

Diffusion policies generalize poorly to out-of-domain objects in manipulation tasks
Prior visual-encoding improvements only generalize within the same category and similar appearances
Generated action sequences need guidance to stay feasible for unseen instances and categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages transferable affordances for generalization to unseen instances and categories.
Models affordances as 3D contact points and post-contact trajectories.
Incorporates affordance guidance during diffusion sampling to refine action sequences.
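The affordance-guidance idea during sampling can be sketched as a classifier-guidance-style correction: at each reverse-diffusion step the sample is nudged down the gradient of a cost pulling the action sequence toward the transferred affordance trajectory. A minimal NumPy toy, assuming a DDPM-style schedule, a quadratic guidance cost, and a placeholder noise model (step count, scale, and all names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def affordance_guided_sampling(eps_model, target, n_steps=50, scale=0.3, seed=0):
    """Toy guided reverse diffusion: each step denoises with eps_model,
    then shifts the posterior mean down the gradient of
    0.5 * ||x - target||^2, steering the action sequence toward the
    transferred affordance while following the denoising path."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.normal(size=target.shape)                 # start from pure noise
    for t in reversed(range(n_steps)):
        eps = eps_model(x, t)
        # standard DDPM posterior mean from the predicted noise
        mean = (x - (1 - alphas[t]) / np.sqrt(1 - alpha_bars[t]) * eps) \
               / np.sqrt(alphas[t])
        mean = mean - scale * (x - target)            # affordance guidance term
        noise = rng.normal(size=x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# with a trivial "denoiser" that predicts zero noise, the guidance term
# alone drives the sample toward the affordance target
target = np.full(8, 0.5)
x = affordance_guided_sampling(lambda x, t: np.zeros_like(x), target)
```

In AffordDP the denoiser would be the learned diffusion policy and the target would be the affordance trajectory transferred to the unseen object; the guidance scale trades off goal consistency against staying on the action manifold.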
🔎 Similar Papers