WALDO: Where Unseen Model-based 6D Pose Estimation Meets Occlusion

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses two challenges in 6D pose estimation of unseen objects under occlusion: strong dependence on CAD models and severe error propagation across multi-stage pipelines. It proposes a model-driven, end-to-end framework. Key contributions include: (1) a dynamic non-uniform sampling strategy coupled with visibility-aware multi-hypothesis generation; (2) an iterative refinement-based joint optimization scheme for pose and visibility estimation; and (3) occlusion-oriented data augmentation and a visibility-weighted evaluation metric. The method achieves significant robustness to occlusion without requiring test-time CAD re-optimization. On the ICBIN and BOP benchmarks, it improves ADD(-S) accuracy by 5.2% and 2.3%, respectively, while accelerating inference by 3.1×, surpassing current state-of-the-art methods.

📝 Abstract
Accurate 6D object pose estimation is vital for robotics, augmented reality, and scene understanding. For seen objects, high accuracy is often attainable via per-object fine-tuning, but generalizing to unseen objects remains a challenge. To address this problem, prior work assumes access to CAD models at test time and typically follows a multi-stage pipeline to estimate poses: detect and segment the object, propose an initial pose, and then refine it. Under occlusion, however, the early stages of such pipelines are prone to errors, which propagate through the sequential processing and consequently degrade performance. To remedy this shortcoming, we propose four novel extensions to model-based 6D pose estimation methods: (i) a dynamic non-uniform dense sampling strategy that focuses computation on visible regions, reducing occlusion-induced errors; (ii) a multi-hypothesis inference mechanism that retains several confidence-ranked pose candidates, mitigating brittle single-path failures; (iii) iterative refinement to progressively improve pose accuracy; and (iv) a series of occlusion-focused training augmentations that strengthen robustness and generalization. Furthermore, we propose a new visibility-weighted metric for evaluation under occlusion to minimize the bias in existing protocols. Through extensive empirical evaluation, we show that our approach improves accuracy by more than 5% on ICBIN and by more than 2% on the BOP benchmark datasets, while running inference approximately 3 times faster.
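The abstract does not define the visibility-weighted metric, so the following is only a minimal sketch of what such a score could look like, not the paper's formula. It assumes each test instance has a pass/fail pose result and a visible fraction in [0, 1]; the function name and the default weighting (which emphasizes heavily occluded instances) are hypothetical choices made for illustration.

```python
def visibility_weighted_accuracy(correct, visibility, weight_fn=lambda v: 1.0 - v):
    """Weighted mean of per-instance correctness flags.

    correct    -- iterable of bools, True if the pose passed the accuracy test
    visibility -- iterable of visible fractions in [0, 1], one per instance
    weight_fn  -- hypothetical mapping from visible fraction to weight
                  (assumption: the default up-weights occluded instances)
    """
    weights = [weight_fn(v) for v in visibility]
    total = sum(weights)
    if total == 0:
        raise ValueError("all weights are zero; the score is undefined")
    # Each instance contributes its weight only when its pose is correct.
    return sum(w for w, c in zip(weights, correct) if c) / total


# One mostly visible success and one heavily occluded failure.
score = visibility_weighted_accuracy([True, False], [0.75, 0.25])
```

An unweighted accuracy over the same two instances would be 0.5; under the assumed weighting the occluded failure counts for more, so the score drops.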
Problem

Research questions and friction points this paper is trying to address.

Generalizing 6D pose estimation to unseen objects under occlusion
Addressing error propagation in multi-stage pose estimation pipelines
Reducing occlusion-induced failures in model-based pose estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic non-uniform dense sampling for visible regions
Multi-hypothesis inference with confidence-ranked candidates
Occlusion-focused training augmentations for robustness
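
The first two ideas above can be illustrated with a small sketch. This is one plausible reading under stated assumptions, not the authors' implementation: it assumes a per-pixel visibility score is already available, draws sample locations with probability proportional to that score (with replacement), and keeps the k highest-confidence pose candidates. The names `sample_pixels` and `top_k_hypotheses` are hypothetical.

```python
import random


def sample_pixels(visibility, n_samples, seed=0):
    """Draw pixel indices with probability proportional to visibility.

    visibility -- list of non-negative per-pixel scores (assumed given)
    Pixels with a score of zero are never selected; sampling is with
    replacement, so an index may appear more than once.
    """
    rng = random.Random(seed)
    indices = range(len(visibility))
    return rng.choices(indices, weights=visibility, k=n_samples)


def top_k_hypotheses(hypotheses, k):
    """Keep the k highest-confidence (pose, confidence) pairs, best first."""
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)[:k]


# Pixels 0 and 1 are occluded (score 0), so all samples land on 2 or 3.
picks = sample_pixels([0.0, 0.0, 1.0, 1.0], n_samples=50)

# Placeholder pose labels stand in for real 6D poses.
kept = top_k_hypotheses([("pose_a", 0.2), ("pose_b", 0.9), ("pose_c", 0.5)], k=2)
```

Retaining several ranked candidates rather than a single best guess lets a later refinement stage recover when the top-scoring initial pose turns out to be wrong.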