OmniDexGrasp: Generalizable Dexterous Grasping via Foundation Model and Force Feedback

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in semantic dexterous grasping: poor generalization across objects and tasks, and the misalignment between foundation model knowledge and robotic execution. Methodologically, we propose a unified framework integrating vision-language foundation models with force-feedback closed-loop control. First, a multimodal large model generates human grasping images, enabling cross-morphology action mapping via human-to-robot imitation learning. Second, we introduce a force-aware adaptive grasping policy, enhanced by sim-to-real transfer, to support diverse user instructions, dexterous hand configurations, and task types. Evaluated in both simulation and on real robotic platforms, our approach demonstrates significant improvements in cross-object, cross-task, and cross-hardware generalization. Moreover, it exhibits strong scalability to complex manipulation tasks, bridging the gap between high-level semantic understanding and low-level robotic control.
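The summary above describes a three-stage pipeline. As a rough illustration of how those stages compose, here is a minimal Python sketch; every function below is a hypothetical stub for exposition, not the authors' released implementation.

```python
# Hypothetical sketch of how the three OmniDexGrasp stages compose.
# All functions are illustrative stubs, not the authors' implementation.

import numpy as np


def generate_human_grasp_image(prompt: str, scene: np.ndarray) -> np.ndarray:
    """Stage (i) stub: a multimodal foundation model would render a human
    grasp image consistent with the user's instruction and the scene."""
    return np.zeros_like(scene)  # placeholder image


def transfer_human_to_robot(grasp_image: np.ndarray, n_joints: int) -> np.ndarray:
    """Stage (ii) stub: estimate the human hand pose from the image and
    retarget it across morphologies to the dexterous hand's joint space."""
    return np.zeros(n_joints)  # placeholder joint targets (radians)


def execute_with_force_feedback(joint_targets: np.ndarray) -> bool:
    """Stage (iii) stub: a force-aware adaptive policy closes the loop
    during execution and reports whether the grasp is stable."""
    return True  # placeholder result


def omnidex_grasp(prompt: str, scene: np.ndarray, n_joints: int = 16) -> bool:
    image = generate_human_grasp_image(prompt, scene)
    targets = transfer_human_to_robot(image, n_joints)
    return execute_with_force_feedback(targets)


if __name__ == "__main__":
    scene = np.zeros((480, 640, 3), dtype=np.uint8)
    print(omnidex_grasp("pick up the mug by its handle", scene))
```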

📝 Abstract
Enabling robots to dexterously grasp and manipulate objects based on human commands is a promising direction in robotics. However, existing approaches struggle to generalize across diverse objects and tasks due to the limited scale of semantic dexterous grasp datasets. Foundation models offer a new way to enhance generalization, yet directly leveraging them to generate feasible robotic actions remains challenging due to the gap between abstract model knowledge and physical robot execution. To address these challenges, we propose OmniDexGrasp, a generalizable framework that achieves omni-capabilities in user prompting, dexterous embodiment, and grasping tasks by combining foundation models with transfer and control strategies. OmniDexGrasp integrates three key modules: (i) foundation models enhance generalization by generating human grasp images, supporting omni-capability across user prompts and tasks; (ii) a human-image-to-robot-action transfer strategy converts human demonstrations into executable robot actions, enabling omni dexterous embodiment; (iii) a force-aware adaptive grasp strategy ensures robust and stable grasp execution. Experiments in simulation and on real robots validate the effectiveness of OmniDexGrasp on diverse user prompts, grasp tasks, and dexterous hands, and further results show its extensibility to dexterous manipulation tasks.
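Module (ii) maps a human demonstration onto a different hand morphology. The abstract does not spell out that mapping, so the sketch below shows one deliberately simplified retargeting baseline (per-joint linear scaling with joint-limit clamping); all numbers are assumptions for the demo, and real transfer strategies typically optimize fingertip positions or learn the mapping instead.

```python
# Simplified cross-morphology retargeting sketch (not the paper's method):
# map human finger joint angles onto a robot hand via per-joint linear
# scaling, then clamp to the robot's joint limits.

import numpy as np


def retarget_joints(human_angles: np.ndarray,
                    scale: np.ndarray,
                    lower: np.ndarray,
                    upper: np.ndarray) -> np.ndarray:
    """Return robot joint targets (radians) for the given human angles."""
    return np.clip(human_angles * scale, lower, upper)


# Toy 4-joint finger; every value below is an assumption for illustration.
human = np.array([0.3, 0.8, 1.1, 0.9])   # estimated human joint angles (rad)
scale = np.array([1.0, 0.9, 0.9, 1.2])   # human-to-robot kinematic ratios
lower = np.zeros(4)                       # robot joint limits (rad)
upper = np.array([1.6, 1.6, 1.6, 1.0])
print(retarget_joints(human, scale, lower, upper))  # last joint clamps to 1.0
```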
Problem

Research questions and friction points this paper is trying to address.

Achieving generalizable dexterous grasping across diverse objects and tasks
Bridging the gap between foundation model knowledge and physical robot execution
Ensuring robust grasp stability using force feedback and adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foundation models generate human grasp images
Human-image-to-robot-action transfer strategy
Force-aware adaptive grasp strategy ensures stability (see the sketch after this list)
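As a rough illustration of the force-aware idea, the toy loop below tightens or relaxes the grasp until the measured fingertip force enters a target band, then holds. The paper's actual policy is learned and transferred sim-to-real; the target force, gain, and threshold here are assumptions.

```python
# Toy force-feedback grasp loop (illustrative only, not the learned policy):
# adjust closure proportionally to the force error until the measured
# fingertip force sits within a target band.

def force_aware_close(read_force, command_delta,
                      f_target: float = 2.0,  # desired grip force (N), assumed
                      f_tol: float = 0.2,     # acceptable force band (N)
                      gain: float = 0.01,     # rad of closure per N of error
                      max_steps: int = 200) -> bool:
    for _ in range(max_steps):
        err = f_target - read_force()   # force error from tactile sensing
        if abs(err) < f_tol:
            return True                 # stable: force within the band
        command_delta(gain * err)       # err > 0 tightens, err < 0 relaxes
    return False                        # never converged within the budget


if __name__ == "__main__":
    # Toy plant: contact force grows linearly with closure (spring model).
    state = {"closure": 0.0}
    stiffness = 20.0  # N per rad of closure, assumed

    read = lambda: stiffness * state["closure"]
    move = lambda d: state.update(closure=max(0.0, state["closure"] + d))

    print("stable grasp:", force_aware_close(read, move))
```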
👥 Authors
Yi-Lin Wei
Sun Yat-sen University
Zhexi Luo
School of Computer Science and Engineering, Sun Yat-sen University, China
Yuhao Lin
School of Computer Science and Engineering, Sun Yat-sen University, China
Mu Lin
School of Computer Science and Engineering, Sun Yat-sen University, China
Zhizhao Liang
School of Computer Science and Engineering, Sun Yat-sen University, China
Shuoyu Chen
School of Computer Science and Engineering, Sun Yat-sen University, China
Wei-Shi Zheng
Professor, Sun Yat-sen University (Computer Vision, Pattern Recognition, Machine Learning)