AI Summary
Existing dexterous grasping methods rely heavily on large-scale annotated datasets, limiting generalization to unseen objects and diverse task instructions.
Method: We propose the first zero-shot, task-oriented grasping framework requiring no training data. It employs a multimodal large language model (MLLM) with prompt engineering for multi-stage semantic reasoning, precisely aligning task intent with object affordances; it then predicts semantically meaningful contact regions and optimizes dexterous grasp poses under physical constraints.
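The two-stage reasoning described above can be sketched as follows. This is an illustrative, hypothetical sketch, not the paper's implementation: `query_mllm` is a stub standing in for a real multimodal LLM call (which would also receive an object image), and the prompt wording and JSON schema are assumptions.

```python
import json

def query_mllm(prompt: str) -> str:
    # Stub: a real system would send the prompt (plus an object image)
    # to a multimodal LLM and return its free-form text answer.
    return '{"grasp_type": "pinch", "fingers": ["thumb", "index"], "contact_part": "edge"}'

def multi_stage_reasoning(task: str, object_name: str) -> dict:
    # Stage 1: align the task intent with the object's affordances.
    affordance_prompt = (
        f"Task: {task}. Object: {object_name}. "
        "Which part of the object affords this task?"
    )
    # Stage 2: turn the inferred affordance into an initial grasp
    # configuration and contact information, requested as structured JSON.
    grasp_prompt = (
        affordance_prompt
        + " Answer as JSON with keys grasp_type, fingers, contact_part."
    )
    return json.loads(query_mllm(grasp_prompt))

config = multi_stage_reasoning("pinch the edge", "plate")
print(config["grasp_type"], config["contact_part"])
```

In a real system the stub would be replaced by an actual MLLM API call, and the structured answer would seed the contact-guided optimization stage.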
Contribution/Results: This work establishes the first end-to-end integration of MLLMs with contact-aware grasp optimization, significantly enhancing zero-shot generalization across novel objects and complex instructions (e.g., "pinch the edge using thumb and index finger"). Experiments demonstrate high success rates and strong task compliance on diverse unseen objects and intricate manipulation directives, introducing a new paradigm for general-purpose intelligent grasping.
Abstract
Task-oriented dexterous grasping holds broad application prospects in robotic manipulation and human-object interaction. However, most existing methods still struggle to generalize across diverse objects and task instructions, as they rely heavily on costly labeled data to ensure task-specific semantic alignment. In this study, we propose **ZeroDexGrasp**, a zero-shot task-oriented dexterous grasp synthesis framework integrating Multimodal Large Language Models with grasp refinement to generate human-like grasp poses that are well aligned with specific task objectives and object affordances. Specifically, ZeroDexGrasp employs prompt-based multi-stage semantic reasoning to infer initial grasp configurations and object contact information from task and object semantics, then exploits contact-guided grasp optimization to refine these poses for physical feasibility and task alignment. Experimental results demonstrate that ZeroDexGrasp enables high-quality zero-shot dexterous grasping on diverse unseen object categories and complex task requirements, advancing toward more generalizable and intelligent robotic grasping.
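A toy sketch of the contact-guided refinement idea, under stated assumptions: real dexterous grasp optimization operates on hand joint angles with collision and force-closure constraints, whereas this minimal version only pulls fingertip positions toward MLLM-predicted contact points by gradient descent on a squared-distance objective. All names and values here are illustrative.

```python
import numpy as np

def refine_fingertips(fingertips, contacts, lr=0.1, steps=200):
    """Move fingertip positions toward target contact points.

    fingertips, contacts: (F, 3) arrays of 3D positions.
    Minimizes sum ||x_f - c_f||^2 by plain gradient descent; a real
    optimizer would work in joint space under physical constraints.
    """
    x = np.asarray(fingertips, dtype=float).copy()
    c = np.asarray(contacts, dtype=float)
    for _ in range(steps):
        grad = 2.0 * (x - c)   # gradient of the squared-distance objective
        x -= lr * grad
    return x

# Assumed example targets: thumb and index contacts on an object edge.
contacts = np.array([[0.0, 0.1, 0.0], [0.0, -0.1, 0.0]])
init = np.array([[0.05, 0.2, 0.1], [0.02, -0.2, 0.05]])
refined = refine_fingertips(init, contacts)
print(np.abs(refined - contacts).max())
```

The residual shrinks geometrically (each step scales the error by 1 − 2·lr), so the refined fingertips converge onto the predicted contact region.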