AI Summary
In cluttered scenes, dexterous hands struggle to simultaneously achieve object singulation and stable grasping.
Method: We propose a unified vision-action policy framework that actively exploits the dexterous hand's high-degree-of-freedom motion to rearrange objects and thereby facilitate singulation. We introduce a clutter-level progressive curriculum learning scheme and employ policy distillation to obtain a lightweight model with strong generalization. Our approach integrates deep reinforcement learning, point-cloud- and RGB-D-driven sim-to-real transfer, and an end-to-end vision-action joint policy network.
Contribution/Results: Experiments demonstrate over 25% improvement in grasping success rate compared to baselines on tasks involving multi-level occlusion and dense stacking. The framework supports real-time, vision-driven deployment and significantly enhances operational robustness and efficiency in complex cluttered environments.
Abstract
Grasping objects in cluttered environments remains a fundamental yet challenging problem in robotic manipulation. While prior works have explored learning-based synergies between pushing and grasping for two-fingered grippers, few have leveraged the high degrees of freedom (DoF) in dexterous hands to perform efficient singulation for grasping in cluttered settings. In this work, we introduce DexSinGrasp, a unified policy for dexterous object singulation and grasping. DexSinGrasp enables high-dexterity object singulation to facilitate grasping, significantly improving efficiency and effectiveness in cluttered environments. We incorporate clutter arrangement curriculum learning to enhance success rates and generalization across diverse clutter conditions, while policy distillation enables a deployable vision-based grasping strategy. To evaluate our approach, we introduce a set of cluttered grasping tasks with varying object arrangements and occlusion levels. Experimental results show that our method outperforms baselines in both efficiency and grasping success rate, particularly in dense clutter. Code, appendix, and videos are available on our project website: https://nus-lins-lab.github.io/dexsingweb/.
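As a rough illustration of what a clutter curriculum can look like, the sketch below advances training to a denser clutter level once the rolling success rate at the current level passes a threshold. It is not the DexSinGrasp implementation: the class name `ClutterCurriculum`, the promotion rule, the threshold, the window size, and the use of object counts as levels are all assumptions made for this example.

```python
from collections import deque


class ClutterCurriculum:
    """Hypothetical scheduler (not from the paper): move to a denser clutter
    level once the rolling success rate at the current level is high enough."""

    def __init__(self, levels, threshold=0.8, window=20):
        self.levels = list(levels)    # assumed: e.g. number of surrounding objects
        self.threshold = threshold    # assumed promotion criterion
        self.window = window          # episodes used for the rolling success rate
        self.index = 0                # position of the current level in `levels`
        self.history = deque(maxlen=window)

    @property
    def level(self):
        return self.levels[self.index]

    def record(self, success):
        """Log one episode outcome; promote when the window is full and good."""
        self.history.append(bool(success))
        window_full = len(self.history) == self.window
        rate = sum(self.history) / len(self.history)
        has_next = self.index < len(self.levels) - 1
        if window_full and rate >= self.threshold and has_next:
            self.index += 1
            self.history.clear()      # restart statistics at the new level
        return self.level


# Usage: five consecutive successes with a window of five trigger one promotion.
curriculum = ClutterCurriculum(levels=[4, 6, 8], threshold=0.8, window=5)
for _ in range(5):
    curriculum.record(True)
```

In a training loop, `curriculum.level` would be read when resetting the simulated scene, and `record` called with each episode's grasp outcome. Clearing the history on promotion is a design choice that prevents successes at an easier level from counting toward the next one.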