ClutterDexGrasp: A Sim-to-Real System for General Dexterous Grasping in Cluttered Scenes

📅 2025-06-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Dexterous grasping in cluttered scenes remains challenging because of the high geometric diversity of objects, severe occlusions, and collision risks. Method: This paper introduces what the authors describe as the first zero-shot, closed-loop sim-to-real framework for target-oriented dexterous grasping in clutter. (1) It trains a teacher policy in simulation with a curriculum over clutter density, using a scene representation that embeds geometric and spatial information. (2) It distills the teacher, via imitation learning, into a safety-aware student 3D diffusion policy (DP3) that takes partial point-cloud observations as input. (3) It runs real-time closed-loop control conditioned on live point-cloud feedback. Contribution/Results: The framework deploys zero-shot across domains, with no real-world fine-tuning required. It is reported to improve robustness across diverse objects and layouts and to achieve higher grasp success rates than state-of-the-art single-object and open-loop methods in real-world cluttered scenes.
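To illustrate why step (3) matters, here is a toy sketch of closed-loop versus open-loop control. It is not the paper's controller: the 1-D "hand position", the proportional gain, and the functions `closed_loop_reach`, `open_loop_reach`, and `shifting_target` are all hypothetical stand-ins. The only idea taken from the summary is that a closed-loop policy re-reads its observation at every step, so it can react when the scene changes mid-grasp.

```python
def closed_loop_reach(get_target, start=0.0, steps=20, gain=0.5):
    """Re-observe the target at every step (hypothetical 1-D controller)."""
    pos = start
    for t in range(steps):
        target = get_target(t)        # fresh observation each control step
        pos += gain * (target - pos)  # move a fraction of the remaining error
    return pos


def open_loop_reach(get_target, start=0.0, steps=20, gain=0.5):
    """Observe the target once, then execute the plan without feedback."""
    target = get_target(0)            # single observation before execution
    pos = start
    for _ in range(steps):
        pos += gain * (target - pos)
    return pos


def shifting_target(t):
    """Assumed disturbance: the target is pushed from 1.0 to 2.0 at step 5."""
    return 1.0 if t < 5 else 2.0


closed_final = closed_loop_reach(shifting_target)
open_final = open_loop_reach(shifting_target)
print(f"closed-loop final position: {closed_final:.4f}")
print(f"open-loop final position:   {open_final:.4f}")
```

In this toy setting the closed-loop controller ends near the shifted target, while the open-loop one converges to the stale position it observed at the start.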

📝 Abstract
Dexterous grasping in cluttered scenes presents significant challenges due to diverse object geometries, occlusions, and potential collisions. Existing methods primarily focus on single-object grasping or grasp-pose prediction without interaction, which are insufficient for complex, cluttered scenes. Recent vision-language-action models offer a potential solution but require extensive real-world demonstrations, making them costly and difficult to scale. To address these limitations, we revisit the sim-to-real transfer pipeline and develop key techniques that enable zero-shot deployment in reality while maintaining robust generalization. We propose ClutterDexGrasp, a two-stage teacher-student framework for closed-loop target-oriented dexterous grasping in cluttered scenes. The framework features a teacher policy trained in simulation using clutter density curriculum learning, incorporating both a novel geometry and spatially-embedded scene representation and a comprehensive safety curriculum, enabling general, dynamic, and safe grasping behaviors. Through imitation learning, we distill the teacher's knowledge into a student 3D diffusion policy (DP3) that operates on partial point cloud observations. To the best of our knowledge, this represents the first zero-shot sim-to-real closed-loop system for target-oriented dexterous grasping in cluttered scenes, demonstrating robust performance across diverse objects and layouts. More details and videos are available at https://clutterdexgrasp.github.io/.
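The abstract mentions clutter density curriculum learning but gives no schedule. The sketch below shows one common way such a curriculum can be driven: the number of objects per scene increases once a rolling success rate passes a threshold. The class name `ClutterCurriculum`, the stage sizes, the promotion threshold, and the window length are all assumptions for illustration, not values from the paper.

```python
from collections import deque


class ClutterCurriculum:
    """Hypothetical scheduler that raises clutter density as success improves."""

    def __init__(self, levels=(1, 4, 8, 12), promote_at=0.8, window=50):
        self.levels = levels          # objects per scene at each stage (assumed)
        self.promote_at = promote_at  # rolling success rate needed to advance (assumed)
        self.window = window          # number of recent episodes to average over
        self.stage = 0
        self.results = deque(maxlen=window)

    @property
    def num_objects(self):
        """Clutter density to use when generating the next training scene."""
        return self.levels[self.stage]

    def record(self, success):
        """Log one episode outcome and advance a stage when the window is full and good."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.window
        if window_full and sum(self.results) / self.window >= self.promote_at:
            if self.stage < len(self.levels) - 1:
                self.stage += 1
                self.results.clear()  # start a fresh window at the harder stage
        return self.num_objects


curriculum = ClutterCurriculum(levels=(1, 4, 8), promote_at=0.8, window=5)
for _ in range(5):
    curriculum.record(True)           # five successes fill the window
print("objects per scene:", curriculum.num_objects)
```

A training loop would call `record` after each simulated episode and use `num_objects` when sampling the next scene.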
Problem

Research questions and friction points this paper is trying to address.

Addressing dexterous grasping in cluttered scenes with diverse objects
Overcoming limitations of single-object grasping methods in complex scenes
Enabling zero-shot sim-to-real transfer for robust grasping performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage teacher-student framework for grasping
Geometry and spatially-embedded scene representation
Zero-shot sim-to-real closed-loop system
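The teacher-student idea listed above can be sketched in a few lines. This is a deliberately simplified stand-in: the paper's student is a 3D diffusion policy (DP3) over point clouds, whereas the sketch below swaps in a 1-D least-squares regressor so it stays runnable with the standard library. The functions `teacher_action`, `collect_demonstrations`, `fit_student`, and `student_action`, and the coefficients 0.8 and 0.1, are hypothetical. What it does show is the structure of distillation: the teacher acts on privileged full state, and the student is fit to the teacher's actions from partial observations only.

```python
import random


def teacher_action(target_offset, clutter_offset):
    """Hypothetical privileged teacher: sees the full simulator state."""
    return 0.8 * target_offset + 0.1 * clutter_offset


def collect_demonstrations(num_samples, seed=0):
    """Roll out the teacher and keep only what the student may observe."""
    rng = random.Random(seed)
    demos = []
    for _ in range(num_samples):
        target_offset = rng.gauss(0.0, 1.0)
        clutter_offset = rng.gauss(0.0, 1.0)   # hidden from the student
        action = teacher_action(target_offset, clutter_offset)
        demos.append((target_offset, action))  # (partial observation, label)
    return demos


def fit_student(demos):
    """Behavior cloning as a 1-D least-squares fit: action = w * observation."""
    numerator = sum(obs * act for obs, act in demos)
    denominator = sum(obs * obs for obs, _ in demos)
    return numerator / denominator


demos = collect_demonstrations(1000)
student_weight = fit_student(demos)


def student_action(observation):
    """Student policy: acts from the partial observation alone."""
    return student_weight * observation


print(f"learned student weight: {student_weight:.3f}")
```

Because the hidden clutter term is independent of the observation in this toy data, the student recovers roughly the teacher's dependence on the visible input and treats the rest as noise.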
Zeyuan Chen
CFCS, School of Computer Science, Peking University; PKU-AgiBot Lab
Qiyang Yan
CFCS, School of Computer Science, Peking University; PKU-AgiBot Lab
Yuanpei Chen
South China University of Technology
Robotic
Tianhao Wu
CFCS, School of Computer Science, Peking University; PKU-AgiBot Lab
Jiyao Zhang
Peking University
Embodied AI, Robotics, 3D Vision
Zihan Ding
Princeton University
Jinzhou Li
Duke University
Robotics, Deep Reinforcement Learning, Manipulation
Yaodong Yang
CFCS, School of Computer Science, Peking University; PKU-PsiBot Lab
Hao Dong
CFCS, School of Computer Science, Peking University; PKU-AgiBot Lab