🤖 AI Summary
This work addresses two limitations of existing dexterous grasping methods: data-driven approaches rely on expensive, morphologically constrained datasets, while analytical synthesis often produces physically infeasible grasps because of oversimplified assumptions, so that filtering in a high-fidelity simulator severely reduces diversity. The authors propose a scalable generate-and-refine framework that repurposes high-fidelity simulation from a passive filter into an active optimization stage. An asynchronous, gradient-free evolutionary algorithm continuously refines analytically generated initial grasps, improving physical feasibility while preserving diversity and without requiring a differentiable objective. The refined grasp distribution is then distilled into a diffusion model for robust deployment. Evaluated on the newly introduced Handles dataset and a DexGraspNet subset, the approach yields over 120 distinct stable grasps per object, a 1.7–6× improvement over unrefined analytical methods, and 46–60% greater unique grasp coverage than diffusion-based alternatives.
📝 Abstract
Dexterous grasping is fundamental to robotics, yet data-driven grasp prediction heavily relies on large, diverse datasets that are costly to generate and typically limited to a narrow set of gripper morphologies. Analytical grasp synthesis can be used to scale data collection, but necessary simplifying assumptions often yield physically infeasible grasps that need to be filtered in high-fidelity simulators, significantly reducing the total number of grasps and their diversity.
We propose a scalable generate-and-refine pipeline for synthesizing large-scale, diverse, and physically feasible grasps. Instead of using high-fidelity simulators solely for verification and filtering, we leverage them as an optimization stage that continuously improves grasp quality without discarding precomputed candidates. More specifically, we initialize an evolutionary search with a seed set of analytically generated, potentially suboptimal grasps. We then refine these proposals directly in a high-fidelity simulator (Isaac Sim) using an asynchronous, gradient-free evolutionary algorithm, improving stability while maintaining diversity. In addition, this refinement stage can be guided toward human preferences and/or domain-specific quality metrics without requiring a differentiable objective. We further distill the refined grasp distribution into a diffusion model for robust real-world deployment, and highlight the role of diversity in both effective training and deployment. Experiments on a newly introduced Handles dataset and a DexGraspNet subset demonstrate that our approach achieves over 120 distinct stable grasps per object (a 1.7–6× improvement over unrefined analytical methods) while outperforming diffusion-based alternatives by 46–60% in unique grasp coverage.
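The refinement stage can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a grasp is a flat list of floats, that each seed grasp keeps its own slot so that refinement cannot collapse the set onto a single solution, and that a toy scoring function stands in for a simulator rollout. The names `toy_stability`, `mutate`, and `refine`, the Gaussian mutation, and the one-slot-per-seed replacement rule are all hypothetical choices made for illustration. Evaluations run in a worker pool, and each result is consumed as soon as it finishes, which is one simple way to realize an asynchronous, gradient-free search.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def toy_stability(grasp):
    """Hypothetical stand-in for a simulator rollout; higher is better."""
    return -sum(x * x for x in grasp)


def mutate(grasp, rng, sigma=0.1):
    """Perturb every grasp parameter with Gaussian noise (no gradients used)."""
    return [x + rng.gauss(0.0, sigma) for x in grasp]


def refine(seeds, evaluate, budget=200, workers=4, rng_seed=0):
    """Asynchronously refine seed grasps, keeping one elite per seed."""
    rng = random.Random(rng_seed)
    # Assumption: one slot per seed, so the number of grasps never shrinks.
    elites = [(list(s), evaluate(s)) for s in seeds]
    pending = {}  # maps a running evaluation to (slot index, candidate)
    submitted = 0

    with ThreadPoolExecutor(max_workers=workers) as pool:

        def submit():
            nonlocal submitted
            idx = rng.randrange(len(elites))
            child = mutate(elites[idx][0], rng)
            pending[pool.submit(evaluate, child)] = (idx, child)
            submitted += 1

        # Fill the worker pool once, then top it up as results arrive.
        for _ in range(min(workers, budget)):
            submit()

        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                idx, child = pending.pop(future)
                score = future.result()
                # A candidate only ever replaces the elite of its own slot.
                if score > elites[idx][1]:
                    elites[idx] = (child, score)
                if submitted < budget:
                    submit()
    return elites


if __name__ == "__main__":
    seed_grasps = [[1.0, 1.0], [-2.0, 0.5], [0.3, -1.5]]
    for grasp, score in refine(seed_grasps, toy_stability):
        print([round(x, 3) for x in grasp], round(score, 4))
```

Because the objective is only ever called, never differentiated, `toy_stability` could be swapped for any black-box score, such as a simulated lift test or a preference-based metric. Under the per-slot rule no candidate is discarded outright: each seed is either kept or replaced by a better-scoring descendant.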