ZeroDexGrasp: Zero-Shot Task-Oriented Dexterous Grasp Synthesis with Prompt-Based Multi-Stage Semantic Reasoning

📅 2025-11-17
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing dexterous grasping methods rely heavily on large-scale annotated datasets, limiting generalization to unseen objects and diverse task instructions. Method: We propose the first zero-shot, task-oriented grasping framework requiring no training data. It employs a multimodal large language model (MLLM) with prompt engineering for multi-stage semantic reasoning to align task intent with object affordances; it then predicts semantically meaningful contact regions and optimizes dexterous grasp poses under physical constraints. Contribution/Results: This work establishes the first end-to-end integration of MLLMs with contact-aware grasp optimization, significantly improving zero-shot generalization to novel objects and complex instructions (e.g., “pinch the edge using thumb and index finger”). Experiments demonstrate high success rates and strong task compliance on diverse unseen objects and intricate manipulation directives, introducing a new paradigm for general-purpose intelligent grasping.

πŸ“ Abstract
Task-oriented dexterous grasping holds broad application prospects in robotic manipulation and human-object interaction. However, most existing methods still struggle to generalize across diverse objects and task instructions, as they heavily rely on costly labeled data to ensure task-specific semantic alignment. In this study, we propose ZeroDexGrasp, a zero-shot task-oriented dexterous grasp synthesis framework integrating Multimodal Large Language Models with grasp refinement to generate human-like grasp poses that are well aligned with specific task objectives and object affordances. Specifically, ZeroDexGrasp employs prompt-based multi-stage semantic reasoning to infer initial grasp configurations and object contact information from task and object semantics, then exploits contact-guided grasp optimization to refine these poses for physical feasibility and task alignment. Experimental results demonstrate that ZeroDexGrasp enables high-quality zero-shot dexterous grasping on diverse unseen object categories and complex task requirements, advancing toward more generalizable and intelligent robotic grasping.
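The two-stage pipeline described above can be sketched in miniature. This is a hypothetical illustration, not the authors' implementation: the `reason_contacts` function stands in for the MLLM's prompt-based semantic reasoning (here a hardcoded mapping from instruction to contact points), and `refine_grasp` stands in for contact-guided optimization (here a toy iterative step pulling fingertip positions toward their target contacts). All coordinates and function names are invented for the sketch.

```python
# Hypothetical sketch of a two-stage zero-shot grasp pipeline.
# Stage 1 is mocked: the real system queries an MLLM to map a task
# instruction to contact regions; we return fixed points instead.

def reason_contacts(instruction):
    """Mock of prompt-based multi-stage semantic reasoning (assumption:
    an MLLM would produce these contact points from the instruction)."""
    if "pinch the edge" in instruction:
        # thumb/index contact points on an object edge (made-up coordinates)
        return {"thumb": (0.10, 0.00, 0.05), "index": (0.10, 0.02, 0.05)}
    return {}

def refine_grasp(fingertips, contacts, lr=0.5, steps=50):
    """Toy contact-guided refinement: each step moves every fingertip a
    fraction `lr` of the remaining distance toward its contact point."""
    tips = {f: list(p) for f, p in fingertips.items()}
    for _ in range(steps):
        for finger, target in contacts.items():
            p = tips[finger]
            for i in range(3):
                p[i] += lr * (target[i] - p[i])
    return {f: tuple(p) for f, p in tips.items()}

contacts = reason_contacts("pinch the edge using thumb and index finger")
init = {"thumb": (0.0, 0.0, 0.0), "index": (0.0, 0.1, 0.0)}
refined = refine_grasp(init, contacts)
for finger, pos in refined.items():
    print(finger, [round(c, 3) for c in pos])
```

In the actual framework the refinement additionally enforces physical constraints (e.g., penetration avoidance and force closure) on a full dexterous hand model, which this sketch omits.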
Problem

Research questions and friction points this paper is trying to address.

Generalizing dexterous grasping across diverse objects and task instructions
Reducing reliance on costly labeled data for task-specific semantic alignment
Generating physically feasible grasp poses aligned with task objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot framework with multimodal large language models
Prompt-based multi-stage semantic reasoning for grasp synthesis
Contact-guided grasp optimization for physical feasibility
Juntao Jian
Shenzhen University
Yi-Lin Wei
Sun Yat-sen University
Chengjie Mou
Shenzhen University
Yuhao Lin
Sun Yat-sen University
Xing Zhu
Ant Group
Yujun Shen
Ant Group
Wei-Shi Zheng
Sun Yat-sen University
Ruizhen Hu
Shenzhen University