🤖 AI Summary
This work addresses dexterous robotic grasping in cluttered environments, where the high degrees of freedom of dexterous hands, occlusions, and collisions with surrounding objects make reliable manipulation difficult. The authors propose a two-stage approach. First, a sparse Interaction Bisector Surface (IBS) representation is predicted from a single-view point cloud; it is decoupled from the full scene geometry and explicitly encodes contact and collision constraints between the hand and the scene. Second, a stable, collision-free grasp pose is obtained by optimizing energy functions defined on the predicted sparse IBS. To improve prediction of this high-dimensional representation, the method combines an occupancy diffusion model, voxel-level conditional guidance, and force-closure score filtering. Experiments in both simulated and real-world scenes show that the approach reduces collisions while maintaining a high grasp success rate across diverse objects and complex layouts.
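To make the idea of a sparse, contact-aware bisector representation concrete, here is a toy sketch. It assumes IBS refers to the interaction bisector surface, the set of points equidistant from the hand and the scene, and approximates it on a regular grid. The function names, the `band` tolerance, the `max_dist` sparsity cutoff, and the `contact_eps` contact label are hypothetical illustration choices, not CADGrasp's actual definition or parameters; a real pipeline would also use a KD-tree instead of brute-force nearest neighbors.

```python
import math
from itertools import product


def nearest_distance(p, points):
    # Brute-force nearest-neighbor distance; adequate only for toy inputs.
    return min(math.dist(p, q) for q in points)


def sparse_ibs(hand_pts, scene_pts, voxel_size=0.01, band=None,
               max_dist=0.03, contact_eps=0.005):
    """Toy sparse IBS: grid points roughly equidistant from hand and scene.

    Returns a list of (center, is_contact) pairs. All parameters are
    hypothetical choices for illustration, not values from CADGrasp.
    """
    if band is None:
        band = voxel_size / 2  # tolerance on |d_hand - d_scene|
    pts = list(hand_pts) + list(scene_pts)
    # Axis-aligned bounding box of both point sets defines the grid extent.
    lo = [min(p[i] for p in pts) for i in range(3)]
    hi = [max(p[i] for p in pts) for i in range(3)]
    counts = [int(round((hi[i] - lo[i]) / voxel_size)) + 1 for i in range(3)]
    result = []
    for idx in product(*(range(n) for n in counts)):
        c = tuple(lo[i] + idx[i] * voxel_size for i in range(3))
        d_hand = nearest_distance(c, hand_pts)
        d_scene = nearest_distance(c, scene_pts)
        # Bisector condition: the grid point is (almost) equidistant to both.
        if abs(d_hand - d_scene) > band:
            continue
        # Sparsity (assumed): drop bisector regions far from any interaction.
        if d_scene > max_dist:
            continue
        # Contact-aware label: the surface passes very close to both geometries.
        result.append((c, d_scene <= contact_eps))
    return result
```

With a single hand point 4 cm above a single scene point, `sparse_ibs([(0, 0, 0.04)], [(0, 0, 0)])` keeps only the grid point halfway between them, at a height of 2 cm, and labels it as non-contact because the gap exceeds `contact_eps`.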
📝 Abstract
Dexterous grasping in cluttered environments presents substantial challenges due to the high degrees of freedom of dexterous hands, occlusion, and potential collisions arising from diverse object geometries and complex layouts. To address these challenges, we propose CADGrasp, a two-stage algorithm for general dexterous grasping from single-view point cloud inputs. In the first stage, we predict sparse IBS, a scene-decoupled, contact- and collision-aware representation that serves as the optimization target. Sparse IBS compactly encodes the geometric and contact relationships between the dexterous hand and the scene, enabling stable and collision-free optimization of dexterous grasp poses. To improve the prediction of this high-dimensional representation, we introduce an occupancy-diffusion model with voxel-level conditional guidance and force-closure score filtering. In the second stage, we develop several energy functions and ranking strategies for sparse-IBS-based optimization to generate high-quality dexterous grasp poses. Extensive experiments in both simulated and real-world settings validate the effectiveness of our approach, demonstrating that it mitigates collisions while maintaining a high grasp success rate across diverse objects and complex scenes.
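The force-closure score filtering step can be illustrated with a simple residual. The sketch below swaps in the relaxed force-closure metric of Liu et al. (2021), also used in DexGraspNet, because the abstract does not specify CADGrasp's own scoring. Each contact with position p and unit inward normal n contributes the wrench [n, p × n], and the norm of their sum is near zero when unit normal forces at the contacts roughly balance in both force and torque. Representing a candidate as a list of (point, normal) pairs in the object frame, and the 0.1 threshold, are hypothetical choices.

```python
import math


def cross(a, b):
    # 3D cross product a x b.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def force_closure_residual(contacts):
    """Relaxed force-closure residual ||G c|| (after Liu et al., 2021).

    contacts: list of (point, normal) pairs, normals unit-length and pointing
    into the object, expressed in the object frame.
    """
    total = [0.0] * 6  # accumulated wrench: 3 force + 3 torque components
    for p, n in contacts:
        torque = cross(p, n)
        for k in range(3):
            total[k] += n[k]
            total[3 + k] += torque[k]
    return math.sqrt(sum(v * v for v in total))


def filter_and_rank(candidates, threshold=0.1):
    """Keep candidates whose residual is at most a (hypothetical) threshold.

    Returns (residual, candidate_index) pairs sorted best-first.
    """
    scored = [(force_closure_residual(c), i) for i, c in enumerate(candidates)]
    return sorted(s for s in scored if s[0] <= threshold)
```

For an antipodal pair of contacts on opposite sides of an object the residual is exactly zero, while two contacts pushing from adjacent sides leave an unbalanced net force of magnitude √2, so filtering keeps only the antipodal candidate. The relaxation ignores friction cones, so a low residual is a cheap proxy for, not a proof of, force closure.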