AffordDexGrasp: Open-set Language-guided Dexterous Grasp with Generalizable-Instructive Affordance

📅 2025-03-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses open-set, language-guided dexterous grasping: understanding task semantics and generating grasp actions for object categories unseen during training. The authors propose a generalizable-instructive affordance, a functional representation grounded in local object geometry and category-agnostic semantic attributes, which bridges high-level language semantics and low-level robot actions and generalizes to novel objects and tasks. Built on this representation, the AffordDexGrasp framework runs two conditional generators: Affordance Flow Matching (AFM), which generates the affordance from a language instruction, and Grasp Flow Matching (GFM), which generates a dexterous grasp conditioned on the affordance. Evaluated on a newly constructed open-set table-top language-guided grasp dataset and deployed on a physical dexterous robot, the method surpasses all previous approaches in open-set generalization.
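Conditional flow matching, the generative core named in both AFM and GFM, trains a network to regress the velocity field of a probability path from noise to data. A minimal sketch of the training targets under a linear interpolation path follows; the batch data, noise, and the idea of a downstream velocity predictor are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(x0, x1, t):
    """Linear interpolation path x_t and its constant velocity target x1 - x0."""
    x_t = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    v_target = x1 - x0
    return x_t, v_target

# Toy batch: x1 stands in for data samples (e.g. flattened affordance maps),
# x0 for Gaussian noise, t for interpolation times drawn uniformly in [0, 1].
batch, dim = 8, 4
x1 = rng.normal(loc=2.0, size=(batch, dim))
x0 = rng.normal(size=(batch, dim))
t = rng.uniform(size=batch)

x_t, v_target = flow_matching_targets(x0, x1, t)

# A velocity predictor would be trained to regress v_target from (x_t, t,
# condition); here we only verify the path hits its endpoints.
x_start, _ = flow_matching_targets(x0, x1, np.zeros(batch))
x_end, _ = flow_matching_targets(x0, x1, np.ones(batch))
assert np.allclose(x_start, x0) and np.allclose(x_end, x1)
```

Sampling then integrates the learned velocity field from noise to data; conditioning (language for AFM, affordance for GFM) enters as an extra input to the predictor.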

📝 Abstract
Language-guided dexterous grasp generation enables robots to grasp and manipulate objects based on human commands. However, previous data-driven methods struggle to understand intention and execute grasps on unseen categories in the open set. In this work, we explore a new task, Open-set Language-guided Dexterous Grasp, and find that the main challenge is the huge gap between high-level human language semantics and low-level robot actions. To solve this problem, we propose an Affordance Dexterous Grasp (AffordDexGrasp) framework, with the insight of bridging the gap with a new generalizable-instructive affordance representation. This affordance can generalize to unseen categories by leveraging the object's local structure and category-agnostic semantic attributes, thereby effectively guiding dexterous grasp generation. Built upon the affordance, our framework introduces Affordance Flow Matching (AFM) for affordance generation with language as input, and Grasp Flow Matching (GFM) for generating dexterous grasps with affordance as input. To evaluate our framework, we build an open-set table-top language-guided dexterous grasp dataset. Extensive experiments in simulation and the real world show that our framework surpasses all previous methods in open-set generalization.
Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between human language and robot actions
Generalizing grasps to unseen object categories
Enhancing dexterous grasping with an affordance representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalizable-instructive affordance representation bridges language-robot gap
Affordance Flow Matching generates affordance from language input
Grasp Flow Matching creates dexterous grasps using affordance input
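The two-stage pipeline implied by these contributions can be sketched as a composition: language conditions affordance generation, and the affordance in turn conditions grasp generation. Every function name, shape, and sampler below is a hypothetical placeholder for illustration, not the paper's interface; the real AFM and GFM are learned flow-matching generators:

```python
import numpy as np

rng = np.random.default_rng(1)

def afm_sample(point_cloud, instruction):
    """Stage 1 stand-in: per-point affordance scores from a language command.
    (The instruction is unused here; a real model would encode it.)"""
    scores = rng.uniform(size=point_cloud.shape[0])
    return scores / scores.sum()  # normalized contact likelihood per point

def gfm_sample(point_cloud, affordance):
    """Stage 2 stand-in: a hand pose conditioned on the affordance map."""
    # Affordance-weighted centroid as a placeholder wrist target.
    wrist = (affordance[:, None] * point_cloud).sum(axis=0)
    joint_angles = rng.uniform(size=16)  # placeholder dexterous-hand joints
    return wrist, joint_angles

points = rng.normal(size=(128, 3))                       # object point cloud
aff = afm_sample(points, "pick up the mug by the handle")
wrist, joints = gfm_sample(points, aff)
```

The design point is that the affordance acts as the sole intermediate interface: the grasp generator never sees raw language, which is what lets the second stage stay category-agnostic.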
👥 Authors
Yi-Lin Wei · Sun Yat-sen University
Mu Lin · School of Computer Science and Engineering, Sun Yat-sen University, China
Yuhao Lin · School of Computer Science and Engineering, Sun Yat-sen University, China
Jian-Jian Jiang · Sun Yat-sen University
Xiao-Ming Wu · School of Computer Science and Engineering, Sun Yat-sen University, China
Ling-An Zeng · Sun Yat-sen University
Wei-Shi Zheng · Professor, Sun Yat-sen University