🤖 AI Summary
Existing methods for estimating camera heading from monocular video under known rotation tend to lose accuracy or become computationally expensive under heavy noise, outliers, or dynamic objects. This work proposes a generalized Hough transform on the unit sphere: each feature point pair between two frames defines a great circle of compatible heading directions, and the sphere is discretized with a Fibonacci lattice whose points serve as bin centers for voting. Because features unaffected by noise or dynamic objects vote consistently for the true motion direction, the method is robust to outliers while remaining computationally efficient. Experiments on three datasets place the approach on the Pareto frontier of accuracy versus efficiency, and SLAM experiments show reduced RMSE when the heading is corrected during camera pose initialization.
📝 Abstract
Estimating camera motion from monocular video is a fundamental problem in computer vision, central to tasks such as SLAM, visual odometry, and structure-from-motion. Existing methods that recover the camera's heading under known rotation, whether from an IMU or an optimization algorithm, tend to perform well in low-noise, low-outlier conditions, but often decrease in accuracy or become computationally expensive as noise and outlier levels increase. To address these limitations, we propose a novel generalization of the Hough transform on the unit sphere (S²) to estimate the camera's heading. First, the method extracts correspondences between two frames and generates a great circle of directions compatible with each pair of correspondences. Then, by discretizing the unit sphere using a Fibonacci lattice as bin centers, each great circle casts votes for a range of directions, ensuring that features unaffected by noise or dynamic objects vote consistently for the correct motion direction. Experimental results on three datasets demonstrate that the proposed method is on the Pareto frontier of accuracy versus efficiency. Additionally, experiments on SLAM show that the proposed method reduces RMSE by correcting the heading during camera pose initialization.
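
The voting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: correspondences are given as unit bearing vectors `x1` and `x2`, the known rotation `R` maps frame-1 directions into frame 2, and a bin receives a vote from a great circle when it lies within a hypothetical tolerance `band` of that circle. Under these assumptions the epipolar constraint reduces to t · ((R x1) × x2) = 0, so each correspondence constrains the heading t to the great circle whose normal is (R x1) × x2. The function names `fibonacci_sphere` and `vote_heading` are illustrative only.

```python
import numpy as np

def fibonacci_sphere(n):
    """Return n roughly evenly spaced unit vectors on the sphere (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                      # evenly spaced heights in (-1, 1)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i     # golden-angle increments in azimuth
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def vote_heading(x1, x2, R, bins, band=0.03):
    """Vote for the heading direction given bearings x1, x2 of shape (N, 3).

    Assumption: R rotates frame-1 directions into frame 2, so the heading t
    satisfies t . ((R x1) x x2) = 0 for every inlier correspondence.
    """
    normals = np.cross(x1 @ R.T, x2)           # one great-circle normal per pair
    norms = np.linalg.norm(normals, axis=1)
    keep = norms > 1e-9                        # drop degenerate (parallel) pairs
    normals = normals[keep] / norms[keep, None]
    # A bin gets one vote from every great circle passing within `band` of it.
    # This dense O(bins x pairs) test is a simplification chosen for clarity.
    votes = (np.abs(bins @ normals.T) < band).sum(axis=1)
    return bins[np.argmax(votes)], votes
```

Because a great circle contains both t and -t, this sketch recovers the heading only up to sign; resolving the sign would need an additional check that is outside the scope of the example. The choices of `band` and the number of lattice points trade angular resolution against vote sharpness and are arbitrary here.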