🤖 AI Summary
Existing monocular visual-inertial odometry (VIO) initialization methods either incur high computational cost because they reconstruct 3D point clouds, or, in the case of structureless approaches, lose accuracy through decoupled rotation/translation estimation and linear constraints. To address these issues, this paper proposes a structureless visual-inertial initialization method based on joint optimization that does not require reconstructing a 3D point cloud. It introduces the first "structureless" visual-inertial bundle adjustment framework, which jointly optimizes camera poses and IMU-related states (biases, scale factor, and gravity direction) via nonlinear least squares, overcoming the limitations of conventional decoupled estimation. Combined with IMU preintegration and real-time feature tracking, the method supports millisecond-level online optimization. On real-world datasets, it reduces rotation error by 42% and translation error by 38% on average relative to state-of-the-art baselines, improving initialization consistency and accuracy.
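The summary relies on IMU preintegration. As a minimal sketch, assuming noise-free samples at a fixed period `dt`, constant biases, and the discrete on-manifold update commonly used in VIO (not necessarily the paper's exact implementation), the function below accumulates the rotation, velocity and position deltas between two keyframes. The names `skew`, `so3_exp` and `preintegrate` are illustrative; noise propagation and bias Jacobians are omitted.

```python
import numpy as np

def skew(v):
    """3x3 skew-symmetric matrix with skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    K = skew(phi)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near the identity
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def preintegrate(gyro, accel, dt, bg=None, ba=None):
    """Accumulate rotation, velocity and position deltas between two keyframes,
    expressed in the body frame of the first keyframe. Gravity is not removed
    here; it stays an unknown for the later optimization (assumption of this sketch)."""
    bg = np.zeros(3) if bg is None else np.asarray(bg, dtype=float)
    ba = np.zeros(3) if ba is None else np.asarray(ba, dtype=float)
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, accel):
        # Bias-corrected specific force, rotated into the first keyframe's frame.
        acc = dR @ (np.asarray(a, dtype=float) - ba)
        # Position and velocity use the rotation from before this sample's update.
        dp = dp + dv * dt + 0.5 * acc * dt**2
        dv = dv + acc * dt
        dR = dR @ so3_exp((np.asarray(w, dtype=float) - bg) * dt)
    return dR, dv, dp
```

For example, 100 samples of a constant angular rate `[0, 0, pi/2]` rad/s at `dt = 0.01` s yield a `dR` close to a 90° rotation about z. Because the deltas depend only on the IMU samples and bias estimates, they can be computed once per keyframe pair and reused while an optimizer updates poses, velocities and gravity.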
📝 Abstract
Monocular visual-inertial odometry (VIO) has enabled a wide range of real-time motion tracking applications, thanks to the small size and low power consumption of the sensor suite. The initialization module is critical for successfully bootstrapping VIO algorithms. Most initialization methods rely on reconstructing 3D visual point clouds. These methods suffer from high computational cost because the state vector contains both motion states and 3D feature points. To address this issue, a structureless initialization method was recently proposed that solves for the initial state without recovering 3D structure. However, this method can lose accuracy because it estimates rotation and translation separately and relies on linear constraints. To improve its accuracy, we propose a novel structureless visual-inertial bundle adjustment that further refines the previous structureless solution. Extensive experiments on real-world datasets show that our method significantly improves VIO initialization accuracy while maintaining real-time performance.
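The abstract does not spell out the residuals of its structureless visual-inertial bundle adjustment. As a rough illustration only, the sketch below jointly refines keyframe positions, velocities and gravity direction by nonlinear least squares, using preintegrated IMU terms together with a two-view epipolar (coplanarity) residual that never introduces a 3D point as an unknown. This is not the paper's formulation. The coplanarity residual, the Levenberg-Marquardt solver with a finite-difference Jacobian, the two-parameter gravity model, and all names (`residuals`, `levenberg_marquardt`, `make_toy_problem`, `rot`, ...) are assumptions of this sketch. Rotations are held fixed, biases and camera-IMU extrinsics are omitted, the camera frame is taken to coincide with the IMU frame, and positions are expressed directly in metric units so the scale factor is implicit. The synthetic data is noise-free and hypothetical.

```python
import numpy as np

G = 9.81  # assumed gravity magnitude in m/s^2

def rot(a, b):
    """Rotation about z by angle a composed with rotation about x by b (test data only)."""
    ca, sa, cb, sb = np.cos(a), np.sin(a), np.cos(b), np.sin(b)
    Rz = np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cb, -sb], [0.0, sb, cb]])
    return Rz @ Rx

def gravity(gxy):
    """World-frame gravity with its norm fixed to G, parametrized by two unknowns."""
    d = np.array([gxy[0], gxy[1], -1.0])
    return G * d / np.linalg.norm(d)

def unpack(x, K):
    """Split x into positions (p_0 fixed at the origin), velocities and gravity."""
    p = np.vstack([np.zeros(3), x[:3 * (K - 1)].reshape(K - 1, 3)])
    v = x[3 * (K - 1):3 * (2 * K - 1)].reshape(K, 3)
    return p, v, gravity(x[-2:])

def residuals(x, R, dp, dv, bearings, T):
    """IMU residuals plus structureless coplanarity residuals; no 3D point is a state."""
    K = len(R)
    p, v, g = unpack(x, K)
    res = []
    for k in range(K - 1):
        # Preintegrated IMU constraints between keyframes k and k+1 (world frame).
        res.append(p[k + 1] - p[k] - v[k] * T - 0.5 * g * T**2 - R[k] @ dp[k])
        res.append(v[k + 1] - v[k] - g * T - R[k] @ dv[k])
        # Epipolar coplanarity: baseline and both world-frame bearings share a plane.
        t = p[k + 1] - p[k]
        for f0, f1 in bearings[k]:
            res.append(np.atleast_1d(np.cross(R[k] @ f0, R[k + 1] @ f1) @ t))
    return np.concatenate(res)

def levenberg_marquardt(fun, x0, iters=50, lam=1e-3, eps=1e-7):
    """Minimize ||fun(x)||^2 using a forward-difference Jacobian."""
    x = np.asarray(x0, dtype=float).copy()
    r = fun(x)
    for _ in range(iters):
        J = np.empty((r.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = eps
            J[:, j] = (fun(x + dx) - r) / eps
        step = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -J.T @ r)
        r_new = fun(x + step)
        if r_new @ r_new < r @ r:  # accept: move toward Gauss-Newton
            x, r, lam = x + step, r_new, lam * 0.3
        else:  # reject: move toward gradient descent
            lam *= 10.0
    return x

def make_toy_problem(K=5, n_landmarks=8, T=0.5, seed=0):
    """Noise-free synthetic keyframes; landmarks are used only to simulate bearings."""
    rng = np.random.default_rng(seed)
    R = [rot(0.3 * k, 0.1 * k) for k in range(K)]
    p = np.vstack([np.zeros(3), rng.normal(size=(K - 1, 3))])
    v = rng.normal(size=(K, 3))
    gxy = np.array([0.1, -0.05])
    g = gravity(gxy)
    # Preintegrated deltas consistent with the true motion, in body frame k.
    dp = [R[k].T @ (p[k + 1] - p[k] - v[k] * T - 0.5 * g * T**2) for k in range(K - 1)]
    dv = [R[k].T @ (v[k + 1] - v[k] - g * T) for k in range(K - 1)]
    bearings = []
    for k in range(K - 1):
        pairs = []
        for X in rng.normal(scale=5.0, size=(n_landmarks, 3)):
            f0 = R[k].T @ (X - p[k])
            f1 = R[k + 1].T @ (X - p[k + 1])
            pairs.append((f0 / np.linalg.norm(f0), f1 / np.linalg.norm(f1)))
        bearings.append(pairs)
    x_true = np.concatenate([p[1:].ravel(), v.ravel(), gxy])
    return R, dp, dv, bearings, T, x_true
```

On this toy problem, starting from a perturbed guess such as `x_true + 0.3 * np.random.default_rng(1).normal(size=x_true.size)` and calling `levenberg_marquardt(lambda x: residuals(x, R, dp, dv, bearings, T), x0)` is expected to drive the residual toward zero. Fixing |g| = G through the two-parameter gravity model is one reason the refinement is nonlinear rather than a single linear solve.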