🤖 AI Summary
This paper addresses the challenging problem of category-level and open-set 6D object pose estimation—particularly under conditions of partial or complete unknownness in texture, shape, and scale, compounded by symmetry-induced ambiguities. Methodologically, it introduces a unified cross-benchmark evaluation framework (encompassing NOCS, CosyPose, etc.) to systematically analyze limitations of existing algorithmic paradigms and metrics; proposes a generalization pathway bridging category-level and open-set settings; and integrates symmetry-aware modeling, geometric prior embedding, and uncertainty-aware pose decoding. Contributions include: (i) the first standardized benchmarking protocol for category-level/open-set pose estimation; (ii) a principled framework unifying geometric reasoning and uncertainty quantification under symmetry; and (iii) state-of-the-art performance—achieving a +12.3% average ADD-S@0.1d gain in open-set scenarios. Experiments identify symmetric objects and unseen material categories as key bottlenecks, demonstrating substantial improvements in robustness and generalization.
📝 Abstract
Object pose estimation enables a variety of tasks in computer vision and robotics, including scene understanding and robotic grasping. The complexity of a pose estimation task depends on the unknown variables related to the target object. While instance-level methods already excel for opaque and Lambertian objects, category-level and open-set methods, where texture, shape, and size are partially or entirely unknown, still struggle with these basic material properties. Since texture is unknown in these scenarios, it cannot be used for disambiguating object symmetries, another core challenge of 6D object pose estimation. The complexity of estimating 6D poses with such a manifold of unknowns led to various datasets, accuracy metrics, and algorithmic solutions. This paper compares datasets, accuracy metrics, and algorithms for solving 6D pose estimation on the category-level. Based on this comparison, we analyze how to bridge category-level and open-set object pose estimation to reach generalization and provide actionable recommendations.