🤖 AI Summary
To address the weak generalization of 3D affordance grounding under limited labeled data, this paper proposes the first one-shot 3D affordance grounding framework tailored for object-to-object scenarios. Methodologically, it introduces the first end-to-end integration of one-stage 3D affordance detection with large language models (LLMs), synergistically fusing semantic features from 2D vision foundation models, geometric representations from point clouds, and the structured reasoning capabilities of LLMs. This enables cross-object and cross-category affordance understanding, along with automatic generation of task-specific constraint functions. Experiments on multiple 3D affordance benchmarks demonstrate that the proposed approach significantly outperforms existing few-shot baselines, achieving state-of-the-art performance in both accuracy and cross-category generalization.
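The pipeline can be pictured as lifting dense 2D foundation-model features onto the object point cloud and matching them against a single annotated support example. Below is a minimal illustrative sketch of that idea; the feature extractor, camera projection step, and all function names are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): fusing 2D foundation-model features
# with point-cloud geometry for one-shot affordance grounding.
# Assumptions: dense per-pixel features are already extracted (e.g. from a
# DINO-style backbone), camera intrinsics are known, and names are hypothetical.
import torch
import torch.nn.functional as F

def lift_features_to_points(points_cam, feat_map, intrinsics):
    """Project 3D points into the image and sample a 2D feature per point.

    points_cam: (N, 3) points in the camera frame
    feat_map:   (C, H, W) dense 2D features from a vision foundation model
    intrinsics: (3, 3) pinhole camera matrix
    returns:    (N, C) per-point semantic features
    """
    C, H, W = feat_map.shape
    uvw = points_cam @ intrinsics.T                          # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)            # pixel coordinates
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)
    sampled = F.grid_sample(feat_map.unsqueeze(0), grid, align_corners=True)
    return sampled.squeeze(0).squeeze(1).T                   # (N, C)

def one_shot_affordance(query_feats, support_feats, support_mask):
    """Score query points by cosine similarity to the support affordance prototype."""
    proto = support_feats[support_mask.bool()].mean(dim=0, keepdim=True)   # (1, C)
    return F.cosine_similarity(query_feats, proto, dim=-1)                 # (N,)

if __name__ == "__main__":
    torch.manual_seed(0)
    intrinsics = torch.tensor([[500., 0., 64.], [0., 500., 64.], [0., 0., 1.]])
    feat_map = torch.randn(64, 128, 128)                     # dense 2D features
    points = torch.rand(1024, 3) * 0.2 + torch.tensor([0., 0., 0.5])
    q_feats = lift_features_to_points(points, feat_map, intrinsics)
    # A single annotated support example provides the affordance prototype.
    s_feats = torch.randn(512, 64)
    s_mask = (torch.rand(512) > 0.8).float()
    scores = one_shot_affordance(q_feats, s_feats, s_mask)
    print(scores.shape)                                      # torch.Size([1024])
```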
📝 Abstract
Grounding object affordance is fundamental to robotic manipulation, as it establishes the critical link between perception and action among interacting objects. However, prior works predominantly focus on predicting single-object affordance, overlooking the fact that most real-world interactions involve relationships between pairs of objects. In this work, we address the challenge of object-to-object affordance grounding under limited data constraints. Inspired by recent advances in few-shot learning with 2D vision foundation models, we propose a novel one-shot 3D object-to-object affordance learning approach for robotic manipulation. Semantic features from vision foundation models, combined with point cloud representations for geometric understanding, enable our one-shot learning pipeline to generalize effectively to novel objects and categories. We further integrate our 3D affordance representation with large language models (LLMs) for robotic manipulation, significantly enhancing LLMs' ability to comprehend and reason about object interactions when generating task-specific constraint functions. Our experiments on 3D object-to-object affordance grounding and robotic manipulation demonstrate that O$^3$Afford significantly outperforms existing baselines in both accuracy and generalization capability.
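To make the LLM integration concrete, the following is a hypothetical example of the kind of task-specific constraint function an LLM might generate from grounded object-to-object affordance regions (here, a pouring task). The cost formulation and all names are illustrative assumptions, not code from the paper.

```python
# Hypothetical example (not from the paper) of an LLM-generated constraint
# function conditioned on grounded affordance regions of two objects.
import numpy as np

def pouring_constraint(src_points, src_afford, tgt_points, tgt_afford):
    """Cost that is low when the source object's pouring region sits above
    and laterally aligned with the target object's containing region.

    src_points / tgt_points: (N, 3) / (M, 3) object point clouds (world frame)
    src_afford / tgt_afford: (N,) / (M,) per-point affordance scores in [0, 1]
    """
    # Affordance-weighted centroids of the interacting regions.
    src_center = (src_afford[:, None] * src_points).sum(0) / (src_afford.sum() + 1e-6)
    tgt_center = (tgt_afford[:, None] * tgt_points).sum(0) / (tgt_afford.sum() + 1e-6)

    horizontal_offset = np.linalg.norm(src_center[:2] - tgt_center[:2])
    height_gap = src_center[2] - tgt_center[2]       # should stay positive

    # Penalize lateral misalignment and dropping below the target opening.
    return horizontal_offset + max(0.0, 0.05 - height_gap)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.uniform(-0.05, 0.05, size=(256, 3)) + np.array([0.0, 0.0, 0.30])
    tgt = rng.uniform(-0.05, 0.05, size=(256, 3)) + np.array([0.0, 0.0, 0.20])
    src_a = rng.uniform(0, 1, size=256)
    tgt_a = rng.uniform(0, 1, size=256)
    print(pouring_constraint(src, src_a, tgt, tgt_a))  # small when well aligned
```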