🤖 AI Summary
This paper addresses camera motion estimation from two-view 2D point correspondences. Departing from conventional formulations based on the essential matrix and epipolar constraints, which are suboptimal in the maximum-likelihood sense, it formulates maximum-likelihood estimation directly on the SO(3) × S² manifold for rotation and normalized translation. The proposed algorithm is the first to achieve both statistical consistency and asymptotic efficiency: as the number of points grows, its mean squared error converges to the Cramér–Rao lower bound. It combines noise-variance estimation, a bias-eliminated consistent initial estimate, and a single-step Gauss–Newton refinement on the manifold, and runs in O(n) time in the number of correspondences n. On synthetic and real-world datasets, once the number of point correspondences reaches the order of hundreds, the method outperforms state-of-the-art approaches in both accuracy and CPU time. The experiments support both its theoretical guarantees and its practical efficacy.
📝 Abstract
Given 2D point correspondences between an image pair, inferring the camera motion is a fundamental problem in computer vision. Existing works generally start from the epipolar constraint and estimate the essential matrix, which is not optimal in the maximum likelihood (ML) sense. In this paper, we return to the original measurement model with respect to the rotation matrix and normalized translation vector and formulate the ML problem. We then propose a two-step algorithm to solve it. In the first step, we estimate the variance of the measurement noise and devise a consistent estimator based on bias elimination. In the second step, we execute a one-step Gauss-Newton iteration on the manifold to refine the consistent estimate. We prove that the proposed estimate has the same asymptotic statistical properties as the ML estimate. The first is consistency, i.e., the estimate converges to the ground truth as the number of points increases. The second is asymptotic efficiency, i.e., the mean squared error of the estimate converges to the theoretical lower bound, the Cramér-Rao bound. In addition, we show that our algorithm has linear time complexity. These characteristics give our estimator a great advantage in the case of dense point correspondences. Experiments on both synthetic data and real images demonstrate that when the number of points reaches the order of hundreds, our estimator outperforms state-of-the-art ones in terms of estimation accuracy and CPU time.
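To make the second step more concrete, here is a minimal sketch of a single Gauss-Newton iteration on SO(3) × S². It is an illustration under stated assumptions, not the authors' implementation: the names `hat`, `so3_exp`, `retract`, and `gauss_newton_step` are hypothetical, the Jacobian is approximated by finite differences, and the residual is the plain algebraic epipolar constraint, used only as a stand-in because the abstract does not spell out the measurement model. The bias-eliminated initializer of the first step is not reproduced either; a slightly perturbed ground truth on noise-free synthetic data plays its role.

```python
import numpy as np


def hat(w):
    """Skew-symmetric matrix with hat(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def so3_exp(w):
    """Rodrigues' formula: map a rotation vector in R^3 to a rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + hat(w)  # first-order approximation near identity
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def retract(R, t, delta):
    """Move (R, t) along a 5-D tangent vector while staying on SO(3) x S^2."""
    # Rows 1 and 2 of Vt span the plane orthogonal to t (tangent space of S^2).
    _, _, Vt = np.linalg.svd(t.reshape(1, 3))
    B = Vt[1:].T
    t_new = t + B @ delta[3:]
    return R @ so3_exp(delta[:3]), t_new / np.linalg.norm(t_new)


def gauss_newton_step(residual_fn, R, t, eps=1e-6):
    """One Gauss-Newton update using a central-difference Jacobian."""
    r0 = residual_fn(R, t)
    J = np.empty((r0.size, 5))
    for k in range(5):
        d = np.zeros(5)
        d[k] = eps
        r_plus = residual_fn(*retract(R, t, d))
        r_minus = residual_fn(*retract(R, t, -d))
        J[:, k] = (r_plus - r_minus) / (2.0 * eps)
    # Solve the linearized least-squares problem J @ delta = -r0.
    delta, *_ = np.linalg.lstsq(J, -r0, rcond=None)
    return retract(R, t, delta)


# Synthetic, noise-free two-view data (all values are assumptions for the demo).
rng = np.random.default_rng(0)
n = 200
X = rng.uniform([-2.0, -2.0, 2.0], [2.0, 2.0, 6.0], size=(n, 3))
R_true = so3_exp(np.array([0.05, -0.10, 0.08]))
t_true = np.array([1.0, 0.2, 0.1])
t_true /= np.linalg.norm(t_true)
Y = X @ R_true.T + t_true          # the same points in the second camera frame
x = X / X[:, 2:3]                  # normalized homogeneous image points, view 1
y = Y / Y[:, 2:3]                  # normalized homogeneous image points, view 2


def epipolar_residual(R, t):
    """Stand-in residual: y_i^T [t]_x R x_i for every correspondence."""
    E = hat(t) @ R
    return np.einsum("ij,jk,ik->i", y, E, x)


# Perturbed ground truth stands in for the consistent first-step estimate.
R0 = R_true @ so3_exp(np.array([0.01, -0.005, 0.008]))
t0 = t_true + np.array([0.01, -0.01, 0.005])
t0 /= np.linalg.norm(t0)

r0 = epipolar_residual(R0, t0)
R1, t1 = gauss_newton_step(epipolar_residual, R0, t0)
r1 = epipolar_residual(R1, t1)
print("residual norm before:", np.linalg.norm(r0))
print("residual norm after: ", np.linalg.norm(r1))
```

The update is applied through a retraction rather than by adding the step to the entries of R and t, so the refined estimate remains a valid rotation and a unit-norm translation. Each step needs a fixed number of residual evaluations and one n × 5 least-squares solve, so its cost grows linearly with the number of correspondences, in line with the linear complexity claimed in the abstract.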