🤖 AI Summary
This work addresses the challenges of scarce large-scale data and insufficient joint semantic-geometric reasoning in functional dexterous grasping by proposing a framework capable of generalizing to novel objects from a single human demonstration. A correspondence-driven data engine generates diverse, high-quality grasping data in simulation, while a multimodal prediction network integrates local and global features with importance-aware sampling to effectively fuse geometric and visual information. The method significantly outperforms existing approaches across multiple object categories, demonstrating strong generalization capability and high grasp success rates.
📝 Abstract
Functional grasping with dexterous robotic hands is a key capability for enabling tool use and complex manipulation, yet progress has been constrained by two persistent bottlenecks: the scarcity of large-scale datasets and the absence of integrated semantic and geometric reasoning in learned models. In this work, we present CorDex, a framework that robustly learns dexterous functional grasps of novel objects from synthetic data generated from just a single human demonstration. At the core of our approach is a correspondence-based data engine that generates diverse, high-quality training data in simulation. Starting from the human demonstration, the data engine synthesizes diverse object instances of the same category, transfers the expert grasp to each instance via correspondence estimation, and refines the transferred grasp through optimization. Building on the generated data, we introduce a multimodal prediction network that integrates visual and geometric information. By devising a local-global fusion module and an importance-aware sampling mechanism, we enable robust and computationally efficient prediction of functional dexterous grasps. Through extensive experiments across various object categories, we demonstrate that CorDex generalizes well to unseen object instances and significantly outperforms state-of-the-art baselines.
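The abstract does not detail the importance-aware sampling mechanism, but the general idea it names can be sketched as weighted point-cloud subsampling: points predicted to matter for the grasp (e.g. near a tool's functional parts) are kept with higher probability than a uniform subsample would keep them. The function below is a hypothetical illustration of that idea, not the paper's actual implementation; the function name, weights, and toy data are all assumptions.

```python
import numpy as np

def importance_aware_sample(points, importance, k, seed=None):
    """Subsample k points with probability proportional to importance.

    Hypothetical sketch: `importance` stands in for a learned per-point
    score; higher-scoring points (e.g. near functional regions) are
    retained more often than under uniform sampling.
    """
    rng = np.random.default_rng(seed)
    probs = importance / importance.sum()
    idx = rng.choice(len(points), size=k, replace=False, p=probs)
    return points[idx]

# Toy cloud: 1000 points, the first 100 marked as a "functional" region.
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 3))
importance = np.ones(1000)
importance[:100] = 20.0  # functional region weighted 20x
sampled = importance_aware_sample(points, importance, k=128, seed=0)
print(sampled.shape)  # (128, 3)
```

In practice the importance scores would come from the prediction network itself rather than being hand-set, so the sampler concentrates computation on grasp-relevant geometry while keeping the input size fixed.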