🤖 AI Summary
Monocular depth predictions are typically defined only up to an unknown scale and shift, which degrades relative camera pose estimation when the predicted depths are used as-is. To address this, we propose a framework that jointly estimates the depth scale, the depth shift, and the relative pose between two cameras from point correspondences with associated monocular depths. We derive efficient solvers for three calibration settings: two calibrated cameras, two uncalibrated cameras with an unknown but shared focal length, and two uncalibrated cameras with unknown and different focal lengths. We evaluate the solvers on synthetic data and on two large-scale real-world datasets, using depth maps from 11 different monocular depth predictors. Compared to prior work, the solvers achieve state-of-the-art results on both real-world datasets.
📝 Abstract
Recent advances in monocular depth prediction have led to significantly improved depth prediction accuracy. In turn, this enables various applications to use such depth predictions. In this paper, we propose a novel framework for estimating the relative pose between two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined up to an unknown scale and shift parameter, our solvers jointly estimate both scale and shift parameters together with the camera pose. We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths. Experiments on synthetic and real data, including experiments with depth maps estimated by 11 different depth predictors, show the practical viability of our solvers. Compared to prior work, our solvers achieve state-of-the-art results on two large-scale, real-world datasets. The source code is available at https://github.com/yaqding/pose_monodepth
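To make the scale-and-shift ambiguity concrete, the following is a minimal Python sketch rather than the paper's solver. It assumes a hypothetical parameterization in which the corrected depth is `scale * (depth + shift)`, uses the same scale and shift for both views for simplicity, and relies on made-up intrinsics, pose, and points. The helper names `backproject`, `project`, and `rigid_residual` are illustrative only. The sketch shows that back-projected points from the two views are related by a rigid motion only once the depths have been corrected.

```python
import numpy as np

def backproject(x, depth, K, scale=1.0, shift=0.0):
    """Lift pixels x (N, 2) to 3D points using the corrected depth scale * (depth + shift)."""
    homog = np.hstack([x, np.ones((x.shape[0], 1))])
    rays = homog @ np.linalg.inv(K).T          # viewing rays with z = 1
    return (scale * (depth + shift))[:, None] * rays

def project(X, K):
    """Pinhole projection of 3D points X (N, 3) to pixel coordinates."""
    p = X @ K.T
    return p[:, :2] / p[:, 2:3]

def rigid_residual(X1, X2, R, t):
    """Per-point distance between R @ X1 + t and X2."""
    return np.linalg.norm(X1 @ R.T + t - X2, axis=1)

# Synthetic two-view setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # rotation about the y axis
t = np.array([0.5, 0.0, 0.1])

X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))  # points in camera 1
X2 = X1 @ R.T + t                                                    # same points in camera 2
x1, x2 = project(X1, K), project(X2, K)

# Simulated "predicted" depths: the true depths distorted by an unknown scale and shift.
true_scale, true_shift = 2.0, 0.3
d1 = X1[:, 2] / true_scale - true_shift
d2 = X2[:, 2] / true_scale - true_shift

# With the correct scale and shift the rigid constraint holds; with raw depths it does not.
res_fixed = rigid_residual(backproject(x1, d1, K, true_scale, true_shift),
                           backproject(x2, d2, K, true_scale, true_shift), R, t)
res_raw = rigid_residual(backproject(x1, d1, K), backproject(x2, d2, K), R, t)
print("max residual, corrected depths:", res_fixed.max())
print("max residual, raw depths:", res_raw.max())
```

In this toy setup the pose and the depth correction are known in advance. The setting described in the abstract is the harder inverse problem, in which the scale, the shift, and the pose are all unknown and must be estimated together from the correspondences.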