UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation

📅 2025-08-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses zero-shot 6D pose estimation and 3D reconstruction of novel objects that lack CAD models, starting from a single RGB-D view. The method uses a pre-trained multi-view diffusion model as a geometric prior and introduces an uncertainty-guided, incremental 3D Gaussian Splatting optimization framework that is model-free and requires no additional training. Pixel-wise epistemic uncertainty estimates guide the fusion of new views, while a pose graph jointly optimizes reconstruction and pose estimation. Evaluated on multiple benchmarks, the approach is reported to improve 6D pose accuracy by 12.7% (ADD-S AUC) and to reduce Chamfer distance, the reconstruction-quality metric, by 28.4%. Real-world robotic grasping experiments further demonstrate its practical utility and generalization to unseen objects.
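The two metrics named in the summary can be illustrated with a minimal sketch. It assumes the common textbook definitions (ADD-S as the mean closest-point distance between a point set transformed by the ground-truth and the predicted pose; Chamfer distance as the symmetric mean nearest-neighbor distance). The function names are hypothetical, and the exact variants and thresholds used in the paper's evaluation are not stated here. The AUC figure aggregates ADD-S accuracy over a range of distance thresholds, which this sketch does not cover.

```python
import math

def apply_pose(R, t, p):
    """Apply a 3x3 rotation R (nested lists) and a translation t to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def nearest_dist(p, cloud):
    """Euclidean distance from p to its nearest neighbor in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)

def add_s(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean closest-point distance between the two posed point sets."""
    gt = [apply_pose(R_gt, t_gt, p) for p in points]
    pred = [apply_pose(R_pred, t_pred, p) for p in points]
    return sum(nearest_dist(g, pred) for g in gt) / len(gt)

def chamfer(a, b):
    """Symmetric Chamfer distance (one common variant, unsquared distances)."""
    a_to_b = sum(nearest_dist(p, b) for p in a) / len(a)
    b_to_a = sum(nearest_dist(q, a) for q in b) / len(b)
    return 0.5 * (a_to_b + b_to_a)

# Illustrative data: unit-cube corners, prediction offset by 10 cm along x.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cube = [(float(x), float(y), float(z)) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
err = add_s(cube, I3, (0.0, 0.0, 0.0), I3, (0.1, 0.0, 0.0))
```

The brute-force nearest-neighbor search is quadratic in the number of points; a spatial index such as a k-d tree would normally replace it for real point clouds.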

📝 Abstract

Estimating the 6D pose of novel objects is a fundamental yet challenging problem in robotics, often relying on access to object CAD models. However, acquiring such models can be costly and impractical. Recent approaches aim to bypass this requirement by leveraging strong priors from foundation models to reconstruct objects from single or multi-view images, but typically require additional training or produce hallucinated geometry. To this end, we propose UnPose, a novel framework for zero-shot, model-free 6D object pose estimation and reconstruction that exploits 3D priors and uncertainty estimates from a pre-trained diffusion model. Specifically, starting from a single-view RGB-D frame, UnPose uses a multi-view diffusion model to estimate an initial 3D model using a 3D Gaussian Splatting (3DGS) representation, along with pixel-wise epistemic uncertainty estimates. As additional observations become available, we incrementally refine the 3DGS model by fusing new views guided by the diffusion model's uncertainty, thereby continuously improving the pose estimation accuracy and 3D reconstruction quality. To ensure global consistency, the diffusion prior-generated views and subsequent observations are further integrated in a pose graph and jointly optimized into a coherent 3DGS field. Extensive experiments demonstrate that UnPose significantly outperforms existing approaches in both 6D pose estimation accuracy and 3D reconstruction quality. We further showcase its practical applicability in real-world robotic manipulation tasks.
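To make the idea of uncertainty-guided fusion concrete, here is a minimal sketch under two loudly stated assumptions: (1) per-pixel epistemic uncertainty is approximated by the variance across several samples drawn from a generative model, and (2) a prior image and a real observation are blended by inverse-variance weighting. Neither is confirmed as the estimator or fusion rule used by UnPose; the function names and the toy data are hypothetical.

```python
from statistics import fmean, pvariance

def pixelwise_uncertainty(samples):
    """Per-pixel mean and variance across K same-shaped sample images.

    samples: list of K images, each a list of rows of floats.
    Assumption: variance across samples serves as an uncertainty proxy.
    """
    h, w = len(samples[0]), len(samples[0][0])
    mean = [[fmean([s[y][x] for s in samples]) for x in range(w)] for y in range(h)]
    var = [[pvariance([s[y][x] for s in samples]) for x in range(w)] for y in range(h)]
    return mean, var

def fuse(prior, prior_var, obs, obs_var, eps=1e-6):
    """Inverse-variance weighted fusion: low-uncertainty pixels dominate."""
    fused = []
    for y in range(len(prior)):
        row = []
        for x in range(len(prior[0])):
            w_p = 1.0 / (prior_var[y][x] + eps)  # weight of the generated prior
            w_o = 1.0 / (obs_var[y][x] + eps)    # weight of the new observation
            row.append((w_p * prior[y][x] + w_o * obs[y][x]) / (w_p + w_o))
        fused.append(row)
    return fused

# Toy 1x2 "images": the samples disagree on pixel 0 and agree on pixel 1.
samples = [[[1.0, 2.0]], [[3.0, 2.0]]]
mean, var = pixelwise_uncertainty(samples)
fused = fuse(mean, var, obs=[[5.0, 4.0]], obs_var=[[0.01, 0.01]])
```

In this toy case the fused value at pixel 0 moves almost entirely to the observation, because the prior is uncertain there, while pixel 1 stays close to the prior, because the samples agree on it.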

Problem

Research questions and friction points this paper is trying to address.

Estimating 6D pose of novel objects without CAD models

Leveraging diffusion priors for zero-shot pose estimation

Refining 3D reconstruction using uncertainty-guided view fusion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion model uncertainty for refinement

Incremental 3D Gaussian Splatting with multi-view fusion

Pose graph optimization for global consistency
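The pose graph idea can be sketched with a deliberately simplified toy: poses are scalars on a line rather than SE(3) transforms, each edge carries a relative measurement and a confidence weight, and a Gauss-Seidel sweep minimizes the weighted squared residuals. This is an illustrative assumption-laden sketch, not the optimizer used in the paper; `optimize_pose_graph` and the edge format are hypothetical.

```python
def optimize_pose_graph(n_nodes, edges, iters=200):
    """Minimize sum of w * (x[j] - x[i] - z)^2 over scalar node positions.

    edges: list of (i, j, z, w), where z measures x[j] - x[i] and w is the
    confidence weight (for example, an inverse uncertainty).
    Node 0 is held at 0.0 to fix the gauge freedom.
    """
    x = [0.0] * n_nodes
    for _ in range(iters):
        for k in range(1, n_nodes):
            num = den = 0.0
            for i, j, z, w in edges:
                if j == k:    # edge predicts x[k] = x[i] + z
                    num += w * (x[i] + z)
                    den += w
                elif i == k:  # edge predicts x[k] = x[j] - z
                    num += w * (x[j] - z)
                    den += w
            if den > 0.0:
                x[k] = num / den  # weighted average of the edge predictions
    return x

# Two odometry-like edges plus one slightly inconsistent loop-closure edge.
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (0, 2, 2.3, 1.0)]
poses = optimize_pose_graph(3, edges)
```

With equal weights, the 0.3 disagreement between the chained edges and the direct edge is spread across the graph; lowering the weight of an edge makes the solution trust it less, which is how uncertainty can steer global consistency.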
