🤖 AI Summary
This paper addresses the challenging problem of estimating the 6D pose and metric scale of an unknown object from a single RGB-D image, without relying on any prior 3D model. We propose the first model-agnostic, single-image framework for joint 2D-3D alignment and metric scale estimation, integrating differentiable rendering, iterative pose hypothesis generation, and geometry-appearance co-optimization to handle severe occlusion, cross-scene generalization, and illumination variation. Crucially, our method eliminates dependence on CAD models and category-level priors, enabling truly zero-shot joint pose and size estimation for novel objects. Evaluated on five standard benchmarks, including REAL275 and HO3D, our approach significantly outperforms state-of-the-art methods, especially in zero-shot 6D pose estimation for unseen objects. These results establish a new paradigm for open-world 6D pose estimation.
📝 Abstract
We introduce Any6D, a model-free framework for 6D object pose estimation that requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. Unlike existing methods that rely on textured 3D models or multiple viewpoints, Any6D leverages a joint object alignment process to enhance 2D-3D alignment and metric scale estimation for improved pose accuracy. Our approach integrates a render-and-compare strategy to generate and refine pose hypotheses, enabling robust performance in scenarios with occlusions, non-overlapping views, diverse lighting conditions, and large cross-environment variations. We evaluate our method on five challenging datasets: REAL275, Toyota-Light, HO3D, YCBINEOAT, and LM-O, where it significantly outperforms state-of-the-art methods for novel object pose estimation. Project page: https://taeyeop.com/any6d
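The render-and-compare strategy mentioned in the abstract can be illustrated with a minimal sketch: enumerate pose/scale hypotheses, render each one, score the rendering against the observed depth, and keep the best hypothesis. The toy 1-D "renderer", the hypothesis parameterization, and all names below are hypothetical simplifications for illustration only, not the Any6D implementation.

```python
# Illustrative render-and-compare sketch (NOT the authors' code):
# score candidate (scale, depth-offset) hypotheses of a toy 1-D "object"
# against an observed depth profile and keep the best-scoring one.
import math

def render(scale, z_offset, n=8):
    # Toy "renderer": a depth profile of a bump whose width/height
    # depends on the hypothesized metric scale.
    return [z_offset + scale * math.exp(-((i - n / 2) ** 2) / (2 * scale ** 2))
            for i in range(n)]

def compare(rendered, observed):
    # Negative mean squared depth error: higher is better.
    return -sum((r - o) ** 2 for r, o in zip(rendered, observed)) / len(rendered)

def best_hypothesis(observed, scales, offsets):
    # Enumerate hypotheses, render each, and keep the best match.
    return max(((s, z) for s in scales for z in offsets),
               key=lambda h: compare(render(*h), observed))

observed = render(1.5, 0.3)  # synthetic "observation": scale 1.5, offset 0.3
est = best_hypothesis(observed, [0.5, 1.0, 1.5, 2.0], [0.0, 0.3, 0.6])
print(est)  # → (1.5, 0.3)
```

A real system would replace the toy renderer with a full object renderer over SE(3) poses and refine the winning hypothesis iteratively rather than relying on a coarse grid.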