UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

📅 2024-09-30
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current robotic systems face limitations in tool use and articulated object manipulation due to inadequate modeling of 3D motion constraints and functional affordances. To address this, we propose the first unified embodied perception paradigm that jointly models object-centric manipulation and task-level semantic understanding. Our contributions are threefold: (1) a novel unified representation framework for the affordances of both tools and articulated objects; (2) the first large-scale, multi-category, fine-grained affordance dataset—comprising 900 articulated objects and 600 tools—with comprehensive operational attribute annotations; and (3) a multimodal large language model (MLLM)–based vision-language reasoning pipeline that accurately extracts manipulation representations for affordance recognition and 3D constraint modeling. Evaluated in simulation and on real robotic platforms, our approach achieves an average 27.5% improvement in cross-category generalization, establishing a new state-of-the-art baseline for robotic manipulation.

📝 Abstract
Previous studies on robotic manipulation are based on a limited understanding of the underlying 3D motion constraints and affordances. To address these challenges, we propose a comprehensive paradigm, termed UniAff, that integrates 3D object-centric manipulation and task understanding in a unified formulation. Specifically, we constructed a dataset labeled with manipulation-related key attributes, comprising 900 articulated objects from 19 categories and 600 tools from 12 categories. Furthermore, we leverage MLLMs to infer object-centric representations for manipulation tasks, including affordance recognition and reasoning about 3D motion constraints. Comprehensive experiments in both simulation and real-world settings indicate that UniAff significantly improves the generalization of robotic manipulation for tools and articulated objects. We hope that UniAff will serve as a general baseline for unified robotic manipulation tasks in the future. Images, videos, dataset, and code are published on the project website at: https://sites.google.com/view/uni-aff/home
Problem

Research questions and friction points this paper is trying to address.

Unified representation of affordances
3D motion constraints understanding
Robotic manipulation generalization improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified 3D object-centric manipulation
Leverages MLLMs for affordance recognition
Dataset with 1500 labeled objects
Qiaojun Yu
Shanghai Jiao Tong University, Shanghai AI Lab
robotic learning, 3D vision, VLA
Siyuan Huang
Shanghai Jiao Tong University, China and Shanghai AI Lab, China
Xibin Yuan
Shanghai Jiao Tong University, China
Zhengkai Jiang
Tencent Hunyuan
RLHF, Diffusion Models
Ce Hao
National University of Singapore
Xin Li
Shanghai Jiao Tong University, China
Haonan Chang
Rutgers University, Robotics Ph.D.
LLM, VLM, 3D understanding, Manipulation
Junbo Wang
Shanghai Jiao Tong University, China
Liu Liu
Hefei University of Technology, China
Hongsheng Li
CUHK-MMLab, China
Peng Gao
Shanghai AI Lab, China
Cewu Lu
Shanghai Jiao Tong University, China