🤖 AI Summary
To address the dual requirements of data efficiency and localization accuracy in camera pose estimation for retail environments, this paper proposes a relative pose regression method based on Camera Pose Auto-Encoders (PAEs). Methodologically, we are the first to adapt PAEs to relative pose estimation, jointly modeling scene spatial priors and visual features within an end-to-end framework that performs both relative pose regression and absolute pose refinement. We further design a lightweight re-localization strategy that requires no additional storage of images or poses. Our key contributions are: (1) a learnable pose prior representation mechanism; and (2) a highly efficient pose refinement scheme that achieves state-of-the-art performance using only 30% of the training data. Evaluated on indoor benchmarks, our approach significantly improves localization accuracy while substantially reducing data collection costs for retail deployment.
📝 Abstract
Accurate camera localization is crucial for modern retail environments, enabling enhanced customer experiences, streamlined inventory management, and autonomous operations. While Absolute Pose Regression (APR) from a single image offers a promising solution, approaches that incorporate visual and spatial scene priors tend to achieve higher accuracy. Camera Pose Auto-Encoders (PAEs) have recently been introduced to embed such priors into APR. In this work, we extend PAEs to the task of Relative Pose Regression (RPR) and propose a novel re-localization scheme that refines APR predictions using PAE-based RPR, without requiring additional storage of images or pose data. We first introduce PAE-based RPR and establish its effectiveness by comparing it with image-based RPR models of equivalent architectures. We then demonstrate that our refinement strategy, driven by a PAE-based RPR, enhances APR localization accuracy on indoor benchmarks. Notably, our method is shown to achieve competitive performance even when trained with only 30% of the data, substantially reducing the data collection burden for retail deployment. Our code and pre-trained models are available at: https://github.com/yolish/camera-pose-auto-encoders
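The core refinement step described above (an APR prediction corrected by a PAE-driven relative pose) can be illustrated with a minimal, self-contained sketch. This is not the paper's actual implementation: the function names (`refine_pose`, `quat_mul`, `quat_to_rotmat`) are hypothetical, and it only shows the standard pose-composition arithmetic that any such refinement would apply, assuming poses are given as a translation plus a unit quaternion in (w, x, y, z) order.

```python
import numpy as np

def quat_mul(q1, q2):
    # Hamilton product of two quaternions in (w, x, y, z) order.
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_to_rotmat(q):
    # Rotation matrix from a unit quaternion (w, x, y, z).
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def refine_pose(t_ref, q_ref, t_rel, q_rel):
    """Compose a reference absolute pose with a regressed relative pose.

    t_ref, q_ref: reference position/orientation (e.g. recovered from a
                  PAE-encoded training pose; hypothetical inputs here).
    t_rel, q_rel: relative translation/rotation predicted by an RPR head.
    Returns the refined absolute position and (normalized) orientation.
    """
    t = t_ref + quat_to_rotmat(q_ref) @ t_rel
    q = quat_mul(q_ref, q_rel)
    return t, q / np.linalg.norm(q)
```

An identity relative pose (zero translation, unit quaternion) leaves the reference pose unchanged, which is a quick sanity check for any implementation of this composition.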