OmniDexGrasp: Generalizable Dexterous Grasping via Foundation Model and Force Feedback

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in semantic dexterous grasping: poor generalization across objects and tasks, and the misalignment between foundation model knowledge and robotic execution. Methodologically, we propose a unified framework integrating vision-language foundation models with force-feedback closed-loop control. First, a multimodal large model generates human grasping images, enabling cross-morphology action mapping via human-to-robot imitation learning. Second, we introduce a force-aware adaptive grasping policy, enhanced by sim-to-real transfer, to support diverse user instructions, dexterous hand configurations, and task types. Evaluated in both simulation and on real robotic platforms, our approach demonstrates significant improvements in cross-object, cross-task, and cross-hardware generalization. Moreover, it exhibits strong scalability to complex manipulation tasks, bridging the gap between high-level semantic understanding and low-level robotic control.
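The summary above describes a three-stage pipeline. As a rough illustration of how those stages compose, here is a minimal Python sketch; every function below is a hypothetical stub for exposition, not the authors' released implementation.

```python
# Hypothetical sketch of how the three OmniDexGrasp stages compose.
# All functions are illustrative stubs, not the authors' implementation.

import numpy as np


def generate_human_grasp_image(prompt: str, scene: np.ndarray) -> np.ndarray:
    """Stage (i) stub: a multimodal foundation model would render a human
    grasp image consistent with the user's instruction and the scene."""
    return np.zeros_like(scene)  # placeholder image


def transfer_human_to_robot(grasp_image: np.ndarray, n_joints: int) -> np.ndarray:
    """Stage (ii) stub: estimate the human hand pose from the image and
    retarget it across morphologies to the dexterous hand's joint space."""
    return np.zeros(n_joints)  # placeholder joint targets (radians)


def execute_with_force_feedback(joint_targets: np.ndarray) -> bool:
    """Stage (iii) stub: a force-aware adaptive policy closes the loop
    during execution and reports whether the grasp is stable."""
    return True  # placeholder result


def omnidex_grasp(prompt: str, scene: np.ndarray, n_joints: int = 16) -> bool:
    image = generate_human_grasp_image(prompt, scene)
    targets = transfer_human_to_robot(image, n_joints)
    return execute_with_force_feedback(targets)


if __name__ == "__main__":
    scene = np.zeros((480, 640, 3), dtype=np.uint8)
    print(omnidex_grasp("pick up the mug by its handle", scene))
```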

📝 Abstract
Enabling robots to dexterously grasp and manipulate objects based on human commands is a promising direction in robotics. However, existing approaches struggle to generalize across diverse objects and tasks due to the limited scale of semantic dexterous grasp datasets. Foundation models offer a new way to enhance generalization, yet directly leveraging them to generate feasible robotic actions remains challenging due to the gap between abstract model knowledge and physical robot execution. To address these challenges, we propose OmniDexGrasp, a generalizable framework that achieves omni-capabilities in user prompting, dexterous embodiment, and grasping tasks by combining foundation models with transfer and control strategies. OmniDexGrasp integrates three key modules: (i) foundation models enhance generalization by generating human grasp images, supporting omni-capability across user prompts and tasks; (ii) a human-image-to-robot-action transfer strategy converts human demonstrations into executable robot actions, enabling omni dexterous embodiment; (iii) a force-aware adaptive grasp strategy ensures robust and stable grasp execution. Experiments in simulation and on real robots validate the effectiveness of OmniDexGrasp on diverse user prompts, grasp tasks, and dexterous hands, and further results show its extensibility to dexterous manipulation tasks.
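Module (ii) maps a human demonstration onto a different hand morphology. The abstract does not spell out that mapping, so the sketch below shows one deliberately simplified retargeting baseline (per-joint linear scaling with joint-limit clamping); all numbers are assumptions for the demo, and real transfer strategies typically optimize fingertip positions or learn the mapping instead.

```python
# Simplified cross-morphology retargeting sketch (not the paper's method):
# map human finger joint angles onto a robot hand via per-joint linear
# scaling, then clamp to the robot's joint limits.

import numpy as np


def retarget_joints(human_angles: np.ndarray,
                    scale: np.ndarray,
                    lower: np.ndarray,
                    upper: np.ndarray) -> np.ndarray:
    """Return robot joint targets (radians) for the given human angles."""
    return np.clip(human_angles * scale, lower, upper)


# Toy 4-joint finger; every value below is an assumption for illustration.
human = np.array([0.3, 0.8, 1.1, 0.9])   # estimated human joint angles (rad)
scale = np.array([1.0, 0.9, 0.9, 1.2])   # human-to-robot kinematic ratios
lower = np.zeros(4)                       # robot joint limits (rad)
upper = np.array([1.6, 1.6, 1.6, 1.0])
print(retarget_joints(human, scale, lower, upper))  # last joint clamps to 1.0
```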
Problem

Research questions and friction points this paper is trying to address.

Achieving generalizable dexterous grasping across diverse objects and tasks
Bridging the gap between foundation model knowledge and physical robot execution
Ensuring robust grasp stability using force feedback and adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foundation models generate human grasp images
Human-image-to-robot-action transfer strategy
Force-aware adaptive grasp strategy ensures stability (see the sketch after this list)
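As a rough illustration of the force-aware idea, the toy loop below tightens or relaxes the grasp until the measured fingertip force enters a target band, then holds. The paper's actual policy is learned and transferred sim-to-real; the target force, gain, and threshold here are assumptions.

```python
# Toy force-feedback grasp loop (illustrative only, not the learned policy):
# adjust closure proportionally to the force error until the measured
# fingertip force sits within a target band.

def force_aware_close(read_force, command_delta,
                      f_target: float = 2.0,  # desired grip force (N), assumed
                      f_tol: float = 0.2,     # acceptable force band (N)
                      gain: float = 0.01,     # rad of closure per N of error
                      max_steps: int = 200) -> bool:
    for _ in range(max_steps):
        err = f_target - read_force()   # force error from tactile sensing
        if abs(err) < f_tol:
            return True                 # stable: force within the band
        command_delta(gain * err)       # err > 0 tightens, err < 0 relaxes
    return False                        # never converged within the budget


if __name__ == "__main__":
    # Toy plant: contact force grows linearly with closure (spring model).
    state = {"closure": 0.0}
    stiffness = 20.0  # N per rad of closure, assumed

    read = lambda: stiffness * state["closure"]
    move = lambda d: state.update(closure=max(0.0, state["closure"] + d))

    print("stable grasp:", force_aware_close(read, move))
```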
👥 Authors
Yi-Lin Wei
Sun Yat-sen University
Zhexi Luo
School of Computer Science and Engineering, Sun Yat-sen University, China
Yuhao Lin
School of Computer Science and Engineering, Sun Yat-sen University, China
Mu Lin
School of Computer Science and Engineering, Sun Yat-sen University, China
Zhizhao Liang
School of Computer Science and Engineering, Sun Yat-sen University, China
Shuoyu Chen
School of Computer Science and Engineering, Sun Yat-sen University, China
Wei-Shi Zheng
Professor, Sun Yat-sen University (Computer Vision, Pattern Recognition, Machine Learning)