🤖 AI Summary
This work addresses zero-shot 6D object pose estimation for real-time robotic manipulation when neither CAD models nor reference images are available. We propose HIPPo, a framework built around HIPPo Dreamer, an image-to-mesh model that combines multi-view diffusion priors with a 3D reconstruction foundation model to generate a 3D mesh from a single input image within a few seconds. As online observations arrive, the mesh evolves through continuous joint optimization of geometry and appearance. Crucially, our approach requires no pre-defined object models or reference views, achieving, for the first time, model-free, zero-shot, real-time (under 3 seconds per frame) 6D pose estimation and geometric tracking in a unified framework. Evaluations on multiple benchmarks show that the method outperforms state-of-the-art approaches in pose accuracy, particularly when few reference images are available. The method has been successfully deployed in closed-loop robotic control tasks.
📝 Abstract
This work focuses on model-free zero-shot 6D object pose estimation for robotics applications. While existing methods can estimate the precise 6D pose of objects, they rely heavily on curated CAD models or reference images, the preparation of which is a time-consuming and labor-intensive process. Moreover, in real-world scenarios, 3D models or reference images may not be available in advance, and an instant robot reaction is desired. In this work, we propose a novel framework named HIPPo, which eliminates the need for curated CAD models and reference images by harnessing image-to-3D priors from Diffusion Models, enabling model-free zero-shot 6D pose estimation. Specifically, we construct HIPPo Dreamer, a rapid image-to-mesh model built on a multiview Diffusion Model and a 3D reconstruction foundation model. HIPPo Dreamer can generate a 3D mesh of any unseen object from a single glance in just a few seconds. Then, as more observations are acquired, we continuously refine the diffusion prior mesh model by jointly optimizing object geometry and appearance. This is achieved by a measurement-guided scheme that gradually replaces the plausible diffusion priors with more reliable online observations. Consequently, HIPPo can instantly estimate and track the 6D pose of a novel object and maintain a complete mesh for immediate robotic applications. Thorough experiments on various benchmarks show that HIPPo outperforms state-of-the-art methods in 6D object pose estimation when prior reference images are limited.
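The idea of gradually replacing a prior with online observations can be illustrated with a toy running weighted average over per-vertex attributes. This is only a sketch under stated assumptions, not the authors' measurement-guided scheme: the function name `fuse_observation`, the unit prior weight, and the per-vertex RGB layout are all hypothetical, and the abstract does not specify how HIPPo weights priors against measurements.

```python
import numpy as np

def fuse_observation(values, weights, observed, visible):
    """Blend one measurement into per-vertex estimates (running weighted mean).

    Hypothetical helper for illustration; not part of HIPPo.
    values:   (N, C) current per-vertex attributes (initially the prior)
    weights:  (N,)   accumulated confidence per vertex
    observed: (N, C) attributes measured in the current frame
    visible:  (N,)   boolean mask of vertices seen in the current frame
    """
    values = values.copy()
    weights = weights.copy()
    w = weights[visible][:, None]
    # Each new measurement counts with weight 1 (an assumption).
    values[visible] = (w * values[visible] + observed[visible]) / (w + 1.0)
    weights[visible] += 1.0
    return values, weights

# Toy data: three vertices with RGB attributes.
values = np.zeros((3, 3))            # stand-in for the diffusion prior
weights = np.ones(3)                 # assumed confidence in the prior
observed = np.ones((3, 3))           # stand-in for online measurements
visible = np.array([True, True, False])

for _ in range(3):
    values, weights = fuse_observation(values, weights, observed, visible)
```

In this toy setup the prior's share of a visible vertex shrinks to 1/(1 + n) after n observations, while vertices that are never observed keep their prior value, which mirrors the abstract's point that the prior keeps the mesh complete while measurements take over where they exist.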