🤖 AI Summary
This work addresses two key challenges in semantic dexterous grasping: poor generalization across objects and tasks, and the misalignment between foundation model knowledge and robotic execution. Methodologically, we propose a unified framework integrating vision-language foundation models with force-feedback closed-loop control. First, a multimodal large model generates human grasping images, enabling cross-morphology action mapping via human-to-robot imitation learning. Second, we introduce a force-aware adaptive grasping policy, enhanced by sim-to-real transfer, to support diverse user instructions, dexterous hand configurations, and task types. Evaluated both in simulation and on real robotic platforms, our approach demonstrates significant improvements in cross-object, cross-task, and cross-hardware generalization. Moreover, it scales to complex manipulation tasks, bridging the gap between high-level semantic understanding and low-level robotic control.
📝 Abstract
Enabling robots to dexterously grasp and manipulate objects based on human commands is a promising direction in robotics. However, existing approaches struggle to generalize across diverse objects and tasks due to the limited scale of semantic dexterous grasp datasets. Foundation models offer a new way to enhance generalization, yet directly leveraging them to generate feasible robotic actions remains difficult due to the gap between abstract model knowledge and physical robot execution. To address these challenges, we propose OmniDexGrasp, a generalizable framework that achieves omni-capabilities in user prompting, dexterous embodiment, and grasping tasks by combining foundation models with transfer and control strategies. OmniDexGrasp integrates three key modules: (i) foundation models enhance generalization by generating human grasp images, supporting omni-capabilities in user prompts and tasks; (ii) a human-image-to-robot-action transfer strategy converts human demonstrations into executable robot actions, enabling omni dexterous embodiment; (iii) a force-aware adaptive grasp strategy ensures robust and stable grasp execution. Experiments in simulation and on real robots validate the effectiveness of OmniDexGrasp across diverse user prompts, grasp tasks, and dexterous hands, and further results show its extensibility to dexterous manipulation tasks.
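To make module (iii) concrete, the following is a minimal sketch of a force-aware adaptive grasp loop: the hand closure is tightened until the sensed grip force reaches a target, which is the general idea behind closing the loop on force feedback. All function names, the toy contact model, and the gains/thresholds here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of force-aware adaptive grasping (module iii).
# `read_force` and `set_closure` stand in for the hand's force sensor
# and actuation interface; both are assumptions for illustration.

def adaptive_grasp(read_force, set_closure, target_force=2.0,
                   gain=0.05, tolerance=0.1, max_steps=200):
    """Tighten hand closure until sensed grip force reaches target_force (N)."""
    closure = 0.0  # normalized closure: 0 = open, 1 = fully closed
    for _ in range(max_steps):
        force = read_force(closure)
        error = target_force - force
        if abs(error) <= tolerance:
            return closure, force  # stable grasp: force within tolerance
        # Proportional update, clipped to the valid closure range
        closure = min(1.0, max(0.0, closure + gain * error))
        set_closure(closure)
    return closure, read_force(closure)

# Toy contact model: zero force before contact at 40% closure,
# then force grows linearly with penetration (stiffness in N per unit closure).
def toy_force(closure, contact=0.4, stiffness=10.0):
    return max(0.0, (closure - contact) * stiffness)

if __name__ == "__main__":
    closure, force = adaptive_grasp(toy_force, lambda c: None)
    print(f"closure={closure:.2f}, force={force:.2f} N")
```

With this toy stiffness, the proportional loop converges geometrically: each step halves the force error, so the grasp settles within tolerance after a handful of iterations. A real system would add slip detection and sensor filtering on top of such a loop.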