🤖 AI Summary
To address the challenges of modeling the grasp distribution and quantifying shape uncertainty when generating dexterous multi-fingered grasps from partial point cloud observations, this paper proposes the first deep latent-variable model based on normalizing flows. The method combines a hierarchical latent structure with exact likelihood computation to overcome the mode collapse and prior misspecification inherent in conditional variational autoencoders (cVAEs), enabling introspective quantification of shape uncertainty and detection of objects with novel geometry. This uncertainty information is further fused with a discriminative grasp evaluator to improve grasp evaluation. Evaluated in both simulation and real-world settings, the approach outperforms strong baselines, including diffusion models, while running more efficiently at inference time. Its greater grasp diversity also yields substantial benefits when grasping objects in clutter and in confined workspaces.
📝 Abstract
Synthesizing diverse dexterous grasps from uncertain partial observations is an important yet challenging task for physically intelligent embodiments. Previous works on generative grasp synthesis fall short of precisely capturing the complex grasp distribution and reasoning about shape uncertainty in the unstructured and often partially perceived reality. In this work, we introduce a novel model that can generate diverse grasps for a multi-fingered hand while introspectively handling perceptual uncertainty and recognizing unknown object geometry to avoid performance degradation. Specifically, we devise a Deep Latent Variable Model (DLVM) based on Normalizing Flows (NFs), facilitating a hierarchical and expressive latent representation for modeling versatile grasps. Our model design counteracts typical pitfalls of its popular alternative in generative grasping, i.e., conditional Variational Autoencoders (cVAEs), whose performance is limited by mode collapse and misspecified priors. Moreover, the resultant feature hierarchy and the exact flow likelihood computation endow our model with shape-aware introspective capabilities, enabling it to quantify the shape uncertainty of partial point clouds and detect objects of novel geometry. We further achieve a performance gain by fusing this information with a discriminative grasp evaluator, yielding a novel hybrid approach to grasp evaluation. Comprehensive simulated and real-world experiments show that the proposed approach achieves superior performance and higher run-time efficiency compared with strong baselines, including diffusion models. We also demonstrate substantial benefits of greater diversity for grasping objects in clutter and in a confined workspace in the real world.
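To illustrate what "exact flow likelihood computation" means in general, the following is a minimal sketch and not the architecture described in the abstract. It assumes a single elementwise affine flow with a standard normal base distribution; the function names `affine_flow_log_likelihood` and `is_novel`, the parameters `shift` and `log_scale`, and the threshold value are all hypothetical and chosen only for illustration. The log-likelihood follows from the change-of-variables formula, log p(x) = log p_Z(f(x)) + log |det ∂f/∂x|, and a low value can serve as a simple novelty signal.

```python
import math


def affine_flow_log_likelihood(x, shift, log_scale):
    """Exact log p(x) for an elementwise affine flow (illustrative assumption).

    The flow maps x to z = (x - shift) * exp(-log_scale), and z is assumed
    to follow a standard normal base distribution.
    """
    log_prob = 0.0
    for xi, mu, s in zip(x, shift, log_scale):
        z = (xi - mu) * math.exp(-s)
        # Log-density of z under the standard normal base distribution.
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
        # Log absolute Jacobian determinant of the map x -> z is -s per dimension.
        log_prob += log_base - s
    return log_prob


def is_novel(x, shift, log_scale, threshold):
    """Flag x as novel when its exact log-likelihood falls below a threshold.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    return affine_flow_log_likelihood(x, shift, log_scale) < threshold


# Hypothetical 2-D feature: one sample near the mode, one far from it.
params = ([0.0, 0.0], [0.0, 0.0])
near = affine_flow_log_likelihood([0.0, 0.0], *params)
far = affine_flow_log_likelihood([10.0, 10.0], *params)
```

Because the likelihood is computed exactly rather than bounded, as it would be with a cVAE, scores of different inputs can be compared directly; in this toy setup `far` is much lower than `near`, so `is_novel` would flag the distant sample for any threshold between the two values.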