🤖 AI Summary
Current 6D pose estimation benchmarks reduce visual ambiguities, such as those caused by symmetry and occlusion, to global object symmetries. They ignore image-level, viewpoint-dependent variations in visibility and therefore misrepresent real-world pose uncertainty. To address this, we propose a new image-level paradigm for evaluating pose distributions: (1) the first automatic method for annotating pose distributions based on the object surface visible in a single image; (2) BOP-Dist, the first pose distribution benchmark on real images; and (3) a symmetry-aware sampling strategy combined with a distribution-aware precision/recall evaluation framework. Re-annotating all BOP datasets with per-image pose distributions substantially changes the performance ranking of state-of-the-art single-solution methods, which shows that these rankings are highly sensitive to annotation granularity. This work establishes a physically interpretable, reproducible, and quantitative evaluation standard for multi-solution pose estimation.
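To make the idea of an image-level pose distribution concrete, here is a toy sketch. It is not the authors' annotation method. Assume a simplified object whose geometry is invariant under rotations by multiples of 2π/N about its z-axis, and whose surface is a ring of N samples, each carrying an appearance label such as "handle" or "body". A global symmetry k is kept for a given image only if every visible sample would look identical after applying it. The names `visible_symmetries` and `image_pose_distribution`, the ring discretization, and the label-based comparison are all hypothetical simplifications. The paper's own procedure reasons about the object surface visibility in the real image.

```python
import math
from typing import List, Sequence, Set

Matrix = List[List[float]]


def rot_z(theta: float) -> Matrix:
    """Rotation by theta radians about the z-axis (the assumed symmetry axis)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def visible_symmetries(labels: Sequence[str], visible: Set[int]) -> List[int]:
    """Hypothetical helper: cyclic shifts k that leave every visible sample unchanged.

    labels[i] is the appearance of sample i, placed at angle 2*pi*i/N on the ring.
    Under the candidate pose R_gt @ rot_z(2*pi*k/N), the image position where sample i
    was seen under the ground-truth pose shows sample (i - k) % N instead. The shift is
    indistinguishable from the ground truth if all visible labels match.
    """
    n = len(labels)
    return [k for k in range(n)
            if all(labels[(i - k) % n] == labels[i] for i in visible)]


def image_pose_distribution(r_gt: Matrix, labels: Sequence[str],
                            visible: Set[int]) -> List[Matrix]:
    """Hypothetical helper: all rotations equivalent to r_gt for this image (toy model)."""
    n = len(labels)
    return [matmul(r_gt, rot_z(2.0 * math.pi * k / n))
            for k in visible_symmetries(labels, visible)]
```

Consider a mug-like object with `labels = ["handle"] + ["body"] * 7`. A view where only samples 3 to 5 are visible keeps the shifts `[0, 1, 2, 6, 7]`, because those rotations keep the handle hidden. A view that shows sample 0, the handle, keeps only `[0]`, so the distribution collapses to a single pose. An object with identical labels everywhere keeps all eight shifts, which is the global-symmetry case.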
📝 Abstract
6D pose estimation aims at determining the object pose that best explains the camera observation. For non-ambiguous objects this solution is unique, but depending on the viewpoint it can turn into a multi-modal pose distribution, either for symmetrical objects or when symmetry-breaking elements are occluded. Currently, 6D pose estimation methods are benchmarked on datasets whose ground-truth annotations treat visual ambiguities as related only to global object symmetries, whereas these ambiguities should be defined per image to account for the camera viewpoint. We therefore first propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the object surface visibility in the image to correctly determine the visual ambiguities. Second, given this improved ground truth, we re-evaluate state-of-the-art single-pose methods and show that their ranking changes greatly. Third, since some recent works focus on estimating the complete set of solutions, we derive a precision/recall formulation to evaluate them against our image-wise distribution ground truth, which makes ours the first benchmark for pose distribution methods on real images.
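The precision/recall idea for multi-solution methods can be sketched as follows. This is one plausible formulation, not the paper's exact definition. It makes several assumptions. Poses are compared by rotation only, using the geodesic angle between rotation matrices, and translation is ignored. Matching is one-to-many with a fixed angular threshold. Precision is the fraction of predicted poses that lie close to some ground-truth pose. Recall is the fraction of ground-truth poses that lie close to some prediction. The function names and the 5° default are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def rot_z(deg: float) -> Matrix:
    """Rotation by `deg` degrees about the z-axis."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_angle_deg(r1: Matrix, r2: Matrix) -> float:
    """Geodesic distance in degrees: the rotation angle of r1^T r2."""
    # trace(r1^T r2) equals the element-wise sum of r1 * r2.
    tr = sum(r1[i][j] * r2[i][j] for i in range(3) for j in range(3))
    # Clamp to guard acos against floating-point values slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (tr - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def precision_recall(pred: Sequence[Matrix], gt: Sequence[Matrix],
                     thresh_deg: float = 5.0) -> Tuple[float, float]:
    """Hypothetical set-to-set precision/recall with threshold matching (rotation only)."""
    def close(a: Matrix, b: Matrix) -> bool:
        return rotation_angle_deg(a, b) <= thresh_deg

    # Precision: predictions that explain at least one ground-truth mode.
    correct = sum(1 for p in pred if any(close(p, g) for g in gt))
    # Recall: ground-truth modes covered by at least one prediction.
    covered = sum(1 for g in gt if any(close(p, g) for p in pred))
    precision = correct / len(pred) if pred else 0.0
    recall = covered / len(gt) if gt else 0.0
    return precision, recall
```

For example, take a ground truth with four poses at 0°, 90°, 180° and 270° about z, and predictions at 0°, 92° and 45°. A 5° threshold then gives precision 2/3 and recall 0.5. The one-to-many matching means that several predictions clustered on the same mode are not penalized in precision. A stricter variant would enforce one-to-one assignments.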