ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes

📅 2025-06-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Dexterous grasping in cluttered scenes remains challenging because of diverse object geometries, severe occlusions, and collision risks. Method: The paper presents ClutterDexGrasp, which the authors describe as, to the best of their knowledge, the first zero-shot sim-to-real closed-loop system for target-oriented dexterous grasping in cluttered scenes. (1) A teacher policy is trained in simulation with a clutter-density curriculum and a safety curriculum, using a geometry- and spatially-embedded scene representation. (2) A student 3D diffusion policy (DP3) is trained by imitation learning to distill the teacher's knowledge, taking partial point-cloud observations as input. (3) At deployment, the student provides real-time closed-loop control conditioned on live point-cloud feedback. Contribution/Results: The system is deployed zero-shot, with no real-world fine-tuning. It is reported to be robust across diverse objects and layouts and to outperform state-of-the-art single-object and open-loop methods in real-world cluttered scenes, with higher grasp success rates.
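The closed-loop control described in step (3) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a receding-horizon scheme in which the policy returns a short chunk of actions, only the first few are executed, and the scene is then observed again. The names `closed_loop_rollout`, `execute_horizon`, and the 1-D toy environment are all hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence

PointCloud = Sequence[Sequence[float]]  # stand-in for a partial point cloud, N x 3
Action = float                          # toy 1-D action; a real hand has many DoF


def closed_loop_rollout(
    observe: Callable[[], PointCloud],
    policy: Callable[[PointCloud], List[Action]],
    execute: Callable[[Action], None],
    is_done: Callable[[PointCloud], bool],
    max_steps: int = 200,
    execute_horizon: int = 2,
) -> int:
    """Re-observe and re-plan every `execute_horizon` actions; return actions executed."""
    steps = 0
    while steps < max_steps:
        cloud = observe()               # fresh observation closes the loop
        if is_done(cloud):
            break
        chunk = policy(cloud)           # policy proposes a chunk of future actions
        for action in chunk[:execute_horizon]:
            if steps >= max_steps:
                break
            execute(action)
            steps += 1
    return steps


# Toy usage (hypothetical): a 1-D "hand" moves toward a target at x = 10.
state = {"x": 0.0, "observations": 0}


def observe() -> PointCloud:
    state["observations"] += 1
    return [[state["x"], 0.0, 0.0]]


def policy(cloud: PointCloud) -> List[Action]:
    return [1.0, 1.0, 1.0, 1.0]         # a fixed 4-step action chunk


def execute(action: Action) -> None:
    state["x"] += action


def is_done(cloud: PointCloud) -> bool:
    return cloud[0][0] >= 10.0


steps_taken = closed_loop_rollout(observe, policy, execute, is_done)
```

An open-loop method would instead plan once from the first observation and execute the whole plan. Re-observing between short action chunks is what lets a closed-loop policy react when surrounding objects move during the grasp.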

📝 Abstract

Dexterous grasping in cluttered scenes presents significant challenges due to diverse object geometries, occlusions, and potential collisions. Existing methods primarily focus on single-object grasping or grasp-pose prediction without interaction, which are insufficient for complex, cluttered scenes. Recent vision-language-action models offer a potential solution but require extensive real-world demonstrations, making them costly and difficult to scale. To address these limitations, we revisit the sim-to-real transfer pipeline and develop key techniques that enable zero-shot deployment in reality while maintaining robust generalization. We propose ClutterDexGrasp, a two-stage teacher-student framework for closed-loop target-oriented dexterous grasping in cluttered scenes. The framework features a teacher policy trained in simulation using clutter density curriculum learning, incorporating both a novel geometry and spatially-embedded scene representation and a comprehensive safety curriculum, enabling general, dynamic, and safe grasping behaviors. Through imitation learning, we distill the teacher's knowledge into a student 3D diffusion policy (DP3) that operates on partial point cloud observations. To the best of our knowledge, this represents the first zero-shot sim-to-real closed-loop system for target-oriented dexterous grasping in cluttered scenes, demonstrating robust performance across diverse objects and layouts. More details and videos are available at https://clutterdexgrasp.github.io/.
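As a rough illustration of clutter density curriculum learning, the sketch below raises the number of distractor objects once the recent grasp success rate is high enough. The abstract does not specify the schedule, so the stage values, the success threshold, the window size, and the class name `ClutterCurriculum` are all assumptions made for this example.

```python
from collections import deque
from typing import Deque, Sequence


class ClutterCurriculum:
    """Hypothetical scheduler: advance to denser clutter once recent success is high."""

    def __init__(
        self,
        levels: Sequence[int] = (0, 2, 4, 8),  # distractor objects per stage (assumed)
        threshold: float = 0.8,                # success rate needed to advance (assumed)
        window: int = 50,                      # episodes averaged per decision (assumed)
    ) -> None:
        self.levels = tuple(levels)
        self.threshold = threshold
        self.window = window
        self.stage = 0
        self.results: Deque[float] = deque(maxlen=window)

    @property
    def num_distractors(self) -> int:
        """Clutter level to use when resetting the next simulated episode."""
        return self.levels[self.stage]

    def record(self, success: bool) -> None:
        """Log one episode outcome and advance the stage if the window qualifies."""
        self.results.append(1.0 if success else 0.0)
        window_full = len(self.results) == self.window
        if window_full and sum(self.results) / self.window >= self.threshold:
            if self.stage < len(self.levels) - 1:
                self.stage += 1
                self.results.clear()  # start a fresh window at the new density
```

In a training loop, each simulated episode would be reset with `num_distractors` clutter objects and its outcome passed to `record`. Clearing the window after each promotion means the policy must prove itself again at the new density before advancing further.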

Problem

Research questions and friction points this paper is trying to address.

Addressing dexterous grasping in cluttered scenes with diverse objects

Overcoming limitations of single-object grasping methods in complex scenes

Enabling zero-shot sim-to-real transfer for robust grasping performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage teacher-student framework for grasping

Geometry and spatially-embedded scene representation

Zero-shot sim-to-real closed-loop system
