AI Summary
In 6DoF object pose estimation, inaccurate localization of occluded keypoints degrades the quality of 3D-2D correspondences. To address this, the authors propose VAPO, a visibility-aware keypoint localization framework. VAPO automatically generates binary visibility labels from object-level annotations and derives a real-valued importance for each keypoint from those labels via PageRank. A lightweight keypoint localization network combines this importance with positional encoding and differentiable pose optimization, and it works in both CAD-based and CAD-free settings. Evaluated on Linemod, Linemod-Occlusion, and YCB-V, VAPO achieves state-of-the-art keypoint correspondence accuracy and pose estimation accuracy, particularly under severe occlusion and cluttered backgrounds, while remaining computationally efficient.
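To make the idea of deriving binary visibility labels from object-level annotations more concrete, here is a minimal sketch. It is not the authors' procedure, which this summary does not describe. It assumes a hypothetical rule: a keypoint counts as visible if its projection under the annotated pose lies in front of the camera, inside the image, and on the object's visible-instance mask. The function names, the mask-based rule, and the pinhole intrinsics `K` are all illustrative assumptions, and the rule ignores self-occlusion and symmetry.

```python
import numpy as np

def project_keypoints(kpts_3d, R, t, K):
    """Project Nx3 object-frame keypoints with pose (R, t) and pinhole intrinsics K."""
    cam = kpts_3d @ R.T + t                      # object frame -> camera frame
    depth = cam[:, 2]
    safe_depth = np.where(depth > 1e-9, depth, 1.0)  # avoid dividing by zero
    uv = (cam @ K.T)[:, :2] / safe_depth[:, None]
    return uv, depth

def visibility_labels(kpts_3d, R, t, K, visible_mask):
    """Hypothetical rule: visible = in front of camera, inside image, on visible mask."""
    uv, depth = project_keypoints(kpts_3d, R, t, K)
    h, w = visible_mask.shape
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    inside = (depth > 1e-9) & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    labels = np.zeros(len(kpts_3d), dtype=np.uint8)
    labels[inside] = visible_mask[rows[inside], cols[inside]] > 0
    return labels

# Toy example: identity rotation, object one unit in front of the camera.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 1.0])
kpts = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [1.0, 0.0, 0.0]])
mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 40:60] = 1                           # visible part of the object
labels = visibility_labels(kpts, R, t, K, mask)
```

In the toy example the first keypoint projects onto the mask, the second lands on an unmasked pixel, and the third falls outside the image, so only the first is labeled visible.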
Abstract
Localizing predefined 3D keypoints in a 2D image is an effective way to establish 3D-2D correspondences for 6DoF object pose estimation. However, unreliable localization of invisible keypoints degrades the quality of these correspondences. In this paper, we address this issue by localizing the keypoints that are important in terms of visibility. Since keypoint visibility information is currently missing from the dataset collection process, we propose an efficient way to generate binary visibility labels from available object-level annotations, for keypoints of both asymmetric and symmetric objects. We further derive real-valued visibility-aware importance from the binary labels using the PageRank algorithm. Taking advantage of the flexibility of our visibility-aware importance, we construct VAPO (Visibility-Aware POse estimator) by integrating the visibility-aware importance with a state-of-the-art pose estimation algorithm, along with additional positional encoding. VAPO works in both CAD-based and CAD-free settings. Extensive experiments are conducted on popular pose estimation benchmarks, including Linemod, Linemod-Occlusion, and YCB-V. The results show that VAPO significantly improves both the keypoint correspondences and the final estimated poses, and achieves state-of-the-art performance.
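The step from binary labels to real-valued importance can be illustrated with a small PageRank sketch. The abstract does not specify how the keypoint graph is built, so the construction below is an assumption made for illustration only: edges carry a Gaussian affinity between 3D keypoints, and weight flows only into keypoints labeled visible. The function names, `sigma`, and the damping factor of 0.85 are hypothetical choices, not values from the paper.

```python
import numpy as np

def pagerank(weights, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a non-negative weighted adjacency matrix."""
    n = weights.shape[0]
    row_sums = weights.sum(axis=1, keepdims=True)
    # Row-normalize; rows with no outgoing weight jump uniformly to all nodes.
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    P = np.where(row_sums > 0, weights / safe_sums, 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (r @ P) + (1.0 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def visibility_importance(kpts_3d, visible, sigma=0.1):
    """Assumed graph: Gaussian affinity between keypoints, flowing only into visible ones."""
    d2 = ((kpts_3d[:, None, :] - kpts_3d[None, :, :]) ** 2).sum(-1)
    affinity = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)              # no self-loops
    weights = affinity * visible[None, :].astype(float)
    return pagerank(weights)

# Toy example: four keypoints on a square, two labeled visible.
kpts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.1, 0.1, 0.0]])
visible = np.array([1, 1, 0, 0])
importance = visibility_importance(kpts, visible)
```

Under this assumed construction the scores sum to one, and invisible keypoints receive only the uniform teleport share, so visible keypoints rank higher. A real-valued score of this kind could then be used to weight correspondences in a downstream pose solver.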