SECOND-Grasp: Semantic Contact-guided Dexterous Grasping

📅 2026-05-13

🤖 AI Summary

This work addresses the challenge of jointly achieving physical stability and semantic task guidance in dexterous grasping. The authors propose a unified framework that uses vision-language reasoning to propose initial contact regions, localizes them with multi-view segmentation, and introduces a Semantic-Geometric Consistency Refinement (SGCR) step that enforces cross-view semantic consistency and removes geometrically invalid regions to build reliable 3D contact maps. Each contact map is converted into a physically feasible hand pose via inverse kinematics, and these poses supervise the training of reinforcement learning policies. The approach tightly couples semantic intent with physical contact modeling. On DexGraspNet it reaches lifting success rates of 98.2% and 97.7% on seen and unseen object categories, respectively, and improves intent-aware grasping by 12.8% and 26.2%. The authors also report promising results on additional datasets and on the Shadow Hand and Allegro Hand.
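SGCR is described as turning per-view 2D contact segmentations into a reliable 3D contact map. The sketch below is a hypothetical illustration of that idea, not the authors' SGCR. It assumes an object point cloud with normals, toy pinhole cameras with no rotation, and binary 2D contact masks. A point is kept when at least `min_ratio` of the views that see its front side label it as contact, and isolated contact points are then discarded as a simple stand-in for "removing geometrically invalid regions". Occlusion is ignored. All names (`View`, `fuse_contacts`) and thresholds are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class View:
    """Hypothetical toy pinhole camera at `origin`, looking along -z (no rotation)."""
    origin: tuple   # camera centre (x, y, z)
    focal: float    # focal length in pixels
    mask: set       # {(u, v)} pixels a 2D segmenter labelled as "contact"

    def project(self, p):
        # Express the point relative to the camera; depth is measured along -z.
        x, y, z = (p[i] - self.origin[i] for i in range(3))
        depth = -z
        if depth <= 0:
            return None  # point is behind the camera
        return (round(self.focal * x / depth), round(self.focal * y / depth))

    def faces(self, p, n):
        # Front-facing if the surface normal points toward the camera centre.
        to_cam = [self.origin[i] - p[i] for i in range(3)]
        return sum(a * b for a, b in zip(n, to_cam)) > 0


def fuse_contacts(points, normals, views, min_ratio=0.5, radius=0.02, min_neighbors=1):
    """Return one bool per point: True if it survives voting and the geometric filter."""
    # Step 1: cross-view vote. Only views that see the point's front side count.
    voted = []
    for p, n in zip(points, normals):
        seen = hits = 0
        for v in views:
            uv = v.project(p) if v.faces(p, n) else None
            if uv is None:
                continue
            seen += 1
            if uv in v.mask:
                hits += 1
        voted.append(seen > 0 and hits / seen >= min_ratio)

    # Step 2: geometric clean-up. Drop contact points with too few contact
    # neighbours within `radius`; isolated islands are likely segmentation noise.
    candidates = [i for i, ok in enumerate(voted) if ok]
    keep = set()
    for i in candidates:
        neighbours = sum(
            1 for j in candidates
            if j != i and math.dist(points[i], points[j]) <= radius
        )
        if neighbours >= min_neighbors:
            keep.add(i)
    return [i in keep for i in range(len(points))]
```

Raising `min_ratio` demands stronger agreement between views at the cost of dropping more true contacts. A real system would also need camera rotations and a depth test for occlusion.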

📝 Abstract

Achieving reliable robotic manipulation, such as dexterous grasping, requires a synergy between physically stable interactions and semantic task guidance, yet these objectives are often treated as separate goals. In this paper, we investigate how to integrate two lines of dexterous grasping work, namely physically stable grasps for object lifting and language-guided grasp generation, to achieve both physical stability and semantic understanding. To this end, we propose SECOND-Grasp (SEmantic CONtact-guided Dexterous Grasping), a unified framework that enables robotic hands to dynamically adjust grasping strategies based on semantic reasoning while ensuring physical feasibility. We begin by obtaining coarse contact proposals through vision-language reasoning to infer where contacts should occur based on object properties, followed by segmentation to localize these regions across views. To further ensure consistency across multiple viewpoints, we introduce Semantic-Geometric Consistency Refinement (SGCR), which refines initial contact predictions by enforcing semantic consistency across views and removing geometrically invalid regions, yielding reliable 3D contact maps. Then, we derive a feasible hand pose for each contact map via inverse kinematics, generating a supervision signal for policy learning. Our approach, trained on DexGraspNet, consistently outperforms baselines in lifting success rate on both seen and unseen categories, achieving 98.2% and 97.7%, respectively, while also improving intent-aware grasping by 12.8% and 26.2%. We further show promising results on additional datasets and robotic hands, including Shadow Hand and Allegro Hand.
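The abstract states that a feasible hand pose is derived from each contact map via inverse kinematics. As a minimal sketch of that step, the code below solves IK for a single hypothetical planar two-link finger so that its fingertip reaches one contact point. It uses damped least squares, a standard IK technique swapped in here for illustration (the paper's solver and hand model may differ), and clamps joint angles to assumed limits so the result stays kinematically feasible. A full hand would stack one such target per finger. The link lengths, joint limits, and function names (`fk`, `jacobian`, `ik_dls`) are illustrative assumptions.

```python
import math


def fk(q, l1, l2):
    """Fingertip (x, y) of a planar two-link finger with joint angles q = (q1, q2)."""
    q1, q2 = q
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))


def jacobian(q, l1, l2):
    """2x2 Jacobian of the fingertip position with respect to (q1, q2)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def ik_dls(target, l1=0.05, l2=0.04, q0=(0.3, 0.3),
           limits=((-0.5, 1.6), (0.0, 1.6)), damping=1e-3, iters=200, tol=1e-6):
    """Damped least-squares IK; returns (joint angles, final fingertip error)."""
    q = list(q0)
    for _ in range(iters):
        x, y = fk(q, l1, l2)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            break
        J = jacobian(q, l1, l2)
        # A = J J^T + damping^2 I (symmetric 2x2), inverted in closed form.
        a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
        b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
        d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
        det = a * d - b * b
        wx = (d * ex - b * ey) / det
        wy = (-b * ex + a * ey) / det
        # dq = J^T A^{-1} e
        dq = (J[0][0] * wx + J[1][0] * wy, J[0][1] * wx + J[1][1] * wy)
        # Take the step, then clamp each joint to its limits (kinematic feasibility).
        q = [min(max(qi + dqi, lo), hi) for qi, dqi, (lo, hi) in zip(q, dq, limits)]
    fx, fy = fk(q, l1, l2)
    return q, math.hypot(target[0] - fx, target[1] - fy)
```

If a contact point lies outside the finger's reach, the returned residual stays large, which gives one simple signal for rejecting an infeasible contact map before it is used as supervision.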

Problem

Research questions and friction points this paper is trying to address.

dexterous grasping

semantic guidance

physical stability

contact prediction

robotic manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

dexterous grasping

semantic reasoning

contact map

multi-view consistency

inverse kinematics