GraspLDP: Towards Generalizable Grasping Policy via Latent Diffusion

๐Ÿ“… 2026-02-26
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
This work addresses the limitations of imitation learning–based grasping policies, which often suffer from low precision and poor generalization across object geometries and environments. To overcome these challenges, the authors propose a latent diffusion policy that integrates grasp priors into the generative process. Specifically, grasp pose priors are incorporated during the reverse diffusion process, while a self-supervised reconstruction objective based on wrist-mounted camera images guides action decoding and implicitly embeds graspability priors. By combining geometric priors with diffusion models in this way, the method significantly enhances dynamic grasping accuracy as well as cross-scenario and cross-object generalization. Extensive experiments in both simulation and real-world robotic settings demonstrate that the proposed approach outperforms existing baselines across multiple performance metrics.

๐Ÿ“ Abstract
This paper focuses on enhancing the grasping precision and generalization of manipulation policies learned via imitation learning. Diffusion-based policy learning methods have recently become the mainstream approach for robotic manipulation tasks. As grasping is a critical subtask in manipulation, the ability of imitation-learned policies to execute precise and generalizable grasps merits particular attention. Existing imitation learning techniques for grasping often suffer from imprecise grasp executions, limited spatial generalization, and poor object generalization. To address these challenges, we incorporate grasp prior knowledge into the diffusion policy framework. In particular, we employ a latent diffusion policy to guide action chunk decoding with grasp pose priors, ensuring that generated motion trajectories adhere closely to feasible grasp configurations. Furthermore, we introduce a self-supervised reconstruction objective during diffusion to embed the graspness prior: at each reverse diffusion step, we reconstruct wrist-camera images with back-projected graspness from the intermediate representations. Both simulation and real robot experiments demonstrate that our approach significantly outperforms baseline methods and exhibits strong dynamic grasping capabilities.
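To make the idea of "incorporating grasp pose priors during the reverse diffusion process" concrete, here is a minimal sketch of one guided denoising step. This is not the paper's exact formulation: the function name, the latent `grasp_prior`, and the DDIM-style deterministic update with a simple linear pull toward the prior are all illustrative assumptions.

```python
import numpy as np

def reverse_diffusion_step(z_t, t, eps_pred, grasp_prior, alpha_bar, guide_scale=0.1):
    """One prior-guided reverse diffusion step (illustrative sketch).

    z_t         : noisy latent action chunk at step t
    eps_pred    : noise predicted by the denoiser (assumed given)
    grasp_prior : latent encoding of a grasp pose prior (hypothetical)
    alpha_bar   : cumulative noise schedule, alpha_bar[t] in (0, 1]
    guide_scale : strength of the pull toward the grasp prior
    """
    a_t = alpha_bar[t]
    a_prev = alpha_bar[t - 1] if t > 0 else 1.0
    # Estimate the clean latent from the predicted noise (standard DDPM identity).
    z0_hat = (z_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    # Guidance: nudge the clean estimate toward the grasp pose prior.
    z0_hat = z0_hat + guide_scale * (grasp_prior - z0_hat)
    # DDIM-style deterministic update toward step t-1.
    z_prev = np.sqrt(a_prev) * z0_hat + np.sqrt(1.0 - a_prev) * eps_pred
    return z_prev, z0_hat
```

At each step the denoised estimate is blended toward the grasp prior before the trajectory is re-noised to the next timestep, so the generated action chunk drifts toward feasible grasp configurations as denoising proceeds.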
Problem

Research questions and friction points this paper is trying to address.

grasping
imitation learning
generalization
precision
robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent diffusion
grasp prior
imitation learning
self-supervised reconstruction
grasp generalization
Enda Xiang
State Key Laboratory of Complex and Critical Software Environment, Beihang University, Beijing, 100191, China; School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Haoxiang Ma
Beihang University
Grasping; Robotic Manipulation
Xinzhu Ma
Associate Professor, Beihang University
deep learning; computer vision; 3D scene understanding; AI4Science
Zicheng Liu
School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Di Huang
Computer Science and Engineering, Beihang University
Computer Vision; Representation Learning; Generative AI; Embodied AI