SECOND-Grasp: Semantic Contact-guided Dexterous Grasping

📅 2026-05-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of jointly achieving physical stability and semantic task guidance in dexterous grasping. The authors propose a unified framework that uses vision-language reasoning to generate coarse contact proposals, localizes those regions across views with segmentation, and introduces Semantic-Geometric Consistency Refinement (SGCR) to construct reliable 3D contact maps. Inverse kinematics then turns each contact map into a physically feasible hand pose, which supervises the training of reinforcement learning policies. The approach couples semantic intent with physical contact modeling in a single pipeline. Trained on DexGraspNet, it reaches lifting success rates of 98.2% and 97.7% on seen and unseen object categories, respectively, and improves intent-aware grasping by 12.8% and 26.2%. The authors also report promising results on additional datasets and on the Shadow Hand and Allegro Hand.
📝 Abstract
Achieving reliable robotic manipulation, such as dexterous grasping, requires a synergy between physically stable interactions and semantic task guidance, yet these objectives are often treated as separate, disjoint goals. In this paper, we investigate how to integrate dexterous grasping techniques, i.e., physically stable grasps for object lifting and language-guided grasp generation, to achieve both physical stability and semantic understanding. To this end, we propose SECOND-Grasp (SEmantic CONtact-guided Dexterous Grasping), a unified framework that enables robotic hands to dynamically adjust grasping strategies based on semantic reasoning while ensuring physical feasibility. We begin by obtaining coarse contact proposals through vision-language reasoning to infer where contacts should occur based on object properties, followed by segmentation to localize these regions across views. To further ensure consistency across multiple viewpoints, we introduce Semantic-Geometric Consistency Refinement (SGCR), which refines initial contact predictions by enforcing semantic consistency across views and removing geometrically invalid regions, yielding reliable 3D contact maps. Then, we derive a feasible hand pose for each contact map via inverse kinematics, generating a supervision signal for policy learning. Our approach, trained on DexGraspNet, consistently outperforms baselines in lifting success rate on both seen and unseen categories, achieving 98.2% and 97.7%, respectively, while also improving intent-aware grasping by 12.8% and 26.2%. We further show promising results on additional datasets and robotic hands, including Shadow Hand and Allegro Hand.
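The abstract describes SGCR only at a high level: enforce semantic consistency across views, then remove geometrically invalid regions. The sketch below is one possible reading of that two-step idea, not the authors' implementation. The function name `refine_contact_map`, the per-view vote format, all thresholds, and the "isolated point" test used as the geometric filter are assumptions made for illustration.

```python
from math import dist


def refine_contact_map(points, view_votes, min_views=2, min_agreement=0.6,
                       radius=0.02, min_neighbors=1):
    """Hypothetical multi-view contact refinement (not the paper's SGCR).

    points: list of (x, y, z) tuples sampled on the object surface.
    view_votes: one list per view; entry i is True (contact), False
        (no contact), or None (point i is not visible in that view).
    Returns the indices of points kept as contact, in ascending order.
    """
    # Step 1: cross-view agreement. Keep a point only if enough views see
    # it and a sufficient fraction of those views label it as contact.
    candidates = []
    for i in range(len(points)):
        seen = [votes[i] for votes in view_votes if votes[i] is not None]
        if len(seen) >= min_views and sum(seen) / len(seen) >= min_agreement:
            candidates.append(i)

    # Step 2: stand-in geometric filter (assumed criterion). Drop isolated
    # candidates with too few other candidates within `radius`.
    kept = []
    for i in candidates:
        neighbors = sum(
            1 for j in candidates
            if j != i and dist(points[i], points[j]) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(i)
    return kept


# Toy example: four surface points observed from three views.
points = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.5, 0.0, 0.0), (0.02, 0.0, 0.0)]
view_votes = [
    [True, True, True, None],
    [True, False, True, True],
    [True, True, True, None],
]
kept = refine_contact_map(points, view_votes)
```

In this toy input, point 3 is visible in only one view and fails the agreement step, while point 2 passes the vote but has no nearby candidates and is dropped by the stand-in geometric filter. A real pipeline would need camera projection and visibility testing to produce the per-view votes, which the sketch takes as given.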
Problem

Research questions and friction points this paper is trying to address.

dexterous grasping
semantic guidance
physical stability
contact prediction
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dexterous grasping
semantic reasoning
contact map
multi-view consistency
inverse kinematics