Variational Shape Inference for Grasp Diffusion on SE(3)

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited robustness of multimodal grasp synthesis under point cloud sparsity and shape noise, this paper proposes an SE(3) diffusion-based grasp generation framework grounded in variational shape inference. Methodologically: (i) a variational autoencoder learns implicit neural shape representations to make geometric features more robust; (ii) a diffusion model operating directly on the SE(3) manifold enables geometry-aware grasp pose generation; (iii) test-time grasp optimization enables zero-shot transfer without task-specific retraining. Evaluated on the ACRONYM dataset, the method outperforms state-of-the-art approaches by 6.3% in grasp success rate; in real-world household scenarios it produces 1.34× as many successful grasps as baseline methods. The framework is markedly more resilient to both point cloud sparsity and shape perturbations, while remaining practical to deploy for robotic manipulation tasks.
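The core of stage (ii) is sampling grasp poses by reverse diffusion directly on the SE(3) manifold. The sketch below is a toy, self-contained illustration of that idea only: a hand-written drift toward a fixed target pose stands in for the learned, shape-conditioned score network, and `exp_so3`/`log_so3` are the standard Rodrigues exponential and logarithm maps. All names here are illustrative assumptions, not the paper's API.

```python
import numpy as np

def hat(w):
    """3-vector -> skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-8:
        return np.eye(3)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Inverse map: rotation matrix -> axis-angle vector."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-8:
        return np.zeros(3)
    W = (R - R.T) * theta / (2.0 * np.sin(theta))
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def sample_grasp(target_t, steps=100, step_size=0.2, seed=0):
    """Toy reverse diffusion on SE(3). A learned score network would
    predict the tangent-space update from (pose, shape code, step);
    here a fixed drift toward (identity, target_t) stands in for it."""
    rng = np.random.default_rng(seed)
    R = exp_so3(0.5 * rng.normal(size=3))   # noisy initial rotation
    t = rng.normal(size=3)                  # noisy initial translation
    for _ in range(steps):
        w = -step_size * log_so3(R)         # rotational "denoising" step
        v = step_size * (target_t - t)      # translational "denoising" step
        R = exp_so3(w) @ R                  # update in tangent space, map back
        t = t + v
    return R, t
```

The point of working on the manifold is visible in the update rule: each rotational step is taken in the tangent space and mapped back through the exponential, so every iterate remains a valid rigid transform rather than a matrix that must be re-orthonormalized.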

📝 Abstract
Grasp synthesis is a fundamental task in robotic manipulation that usually admits multiple feasible solutions. Multimodal grasp synthesis seeks to generate diverse sets of stable grasps conditioned on object geometry, making robust learning of geometric features crucial for success. To address this challenge, we propose a framework for learning multimodal grasp distributions that leverages variational shape inference to enhance robustness against shape noise and measurement sparsity. Our approach first trains a variational autoencoder for shape inference using implicit neural representations, then uses the learned geometric features to guide a diffusion model for grasp synthesis on the SE(3) manifold. Additionally, we introduce a test-time grasp optimization technique that can be integrated as a plugin to further enhance grasping performance. Experimental results demonstrate that our shape-inference formulation outperforms state-of-the-art multimodal grasp synthesis methods on the ACRONYM dataset by 6.3%, while remaining more robust than other approaches to deterioration in point cloud density. Furthermore, the trained model achieves zero-shot transfer to real-world manipulation of household objects, generating 34% more successful grasps than baselines despite measurement noise and point cloud calibration errors.
Problem

Research questions and friction points this paper is trying to address.

Learning multimodal grasp distributions for robotic manipulation
Enhancing robustness against shape noise and measurement sparsity
Generating diverse stable grasps conditioned on object geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

Variational autoencoder for shape inference
Diffusion model for grasp synthesis
Test-time grasp optimization technique
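The third innovation, test-time grasp optimization, can be read as local refinement of sampled poses under a grasp score. In the paper that score comes from the learned model; the minimal stand-in below ascends an arbitrary scalar score over the translation component of a pose via finite differences. The function names and the toy quadratic score are assumptions for illustration only.

```python
import numpy as np

def refine_grasp(t0, score_fn, iters=200, lr=0.05, eps=1e-4):
    """Toy test-time refinement: finite-difference gradient ascent on a
    grasp score over a pose's translation. score_fn is any callable
    mapping a 3-vector to a scalar (a learned critic, in the paper)."""
    t = np.asarray(t0, dtype=float).copy()
    for _ in range(iters):
        g = np.zeros(3)
        for i in range(3):
            d = np.zeros(3)
            d[i] = eps
            # central-difference estimate of the score gradient
            g[i] = (score_fn(t + d) - score_fn(t - d)) / (2.0 * eps)
        t += lr * g  # ascend the score
    return t

# usage: a toy score peaked at a hypothetical object centre
centre = np.array([0.2, 0.0, 0.5])
score = lambda t: -np.sum((t - centre) ** 2)
t_refined = refine_grasp(np.zeros(3), score)
```

Because the refinement only queries the score function, it can be bolted on as a plugin after any sampler, which matches the summary's claim that the optimization requires no task-specific retraining.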