🤖 AI Summary
This work proposes a stereo point-line fused visual-inertial odometry (VIO) system to address the instability of conventional VIO in low-texture and abrupt-illumination scenarios, where sparse point features hinder reliable tracking. The method introduces a novel, training-free deep descriptor for line segments and leverages entropy-regularized optimal transport to achieve globally consistent and robust line matching, eliminating reliance on point-feature guidance. Furthermore, an adaptive uncertainty weighting mechanism incorporating line constraints is integrated into the factor-graph optimization to dynamically fuse IMU measurements with both point and line observations. Experimental results demonstrate that the proposed approach significantly outperforms state-of-the-art baselines on EuRoC, UMA-VI, and challenging real-world sequences, achieving substantial improvements in pose-estimation accuracy and robustness while maintaining real-time performance.
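A descriptor "computed by sampling and pooling network feature maps" along a segment might look roughly like the sketch below. The feature-map layout, nearest-neighbor sampling, and average pooling are illustrative assumptions for the sketch, not the paper's exact design:

```python
import numpy as np

def line_descriptor(feat, p0, p1, n_samples=16):
    """Sample a network feature map along a line segment and pool.

    feat: (H, W, C) feature map from a pretrained network (assumption).
    p0, p1: segment endpoints (x, y) in feature-map pixel coordinates.
    Returns an L2-normalized C-dimensional descriptor.
    """
    H, W, C = feat.shape
    ts = np.linspace(0.0, 1.0, n_samples)
    xs = p0[0] + ts * (p1[0] - p0[0])          # points along the segment
    ys = p0[1] + ts * (p1[1] - p0[1])
    # Nearest-neighbor sampling for simplicity; bilinear is common in practice.
    xi = np.clip(np.round(xs).astype(int), 0, W - 1)
    yi = np.clip(np.round(ys).astype(int), 0, H - 1)
    samples = feat[yi, xi]                     # (n_samples, C)
    desc = samples.mean(axis=0)                # average-pool along the segment
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```

Because the descriptor is built purely from forward-pass feature maps, no task-specific training is needed, which is consistent with the "training-free" claim above.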
📝 Abstract
Robust stereo visual-inertial odometry (VIO) remains challenging in low-texture scenes and under abrupt illumination changes, where point features become sparse and unstable, leading to ambiguous association and under-constrained estimation. Line structures offer complementary geometric cues, yet many efficient point-line systems still rely on point-guided line association, which can break down when point support is weak and may lead to biased constraints. We present a stereo point-line VIO system in which line segments are equipped with dedicated deep descriptors and matched using an entropy-regularized optimal transport formulation, enabling globally consistent correspondences under ambiguity, outliers, and partial observations. The proposed descriptor is training-free and is computed by sampling and pooling network feature maps. To improve estimation stability, we analyze the impact of line measurement noise and introduce reliability-adaptive weighting to regulate the influence of line constraints during optimization. Experiments on EuRoC and UMA-VI, together with real-world deployments in low-texture and illumination-challenging environments, demonstrate improved accuracy and robustness over representative baselines while maintaining real-time performance.
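Entropy-regularized optimal transport of the kind described above is typically solved with Sinkhorn iterations. The following is a minimal sketch, not the paper's implementation; the cosine cost, uniform marginals, and the regularization weight `eps` are illustrative assumptions:

```python
import numpy as np

def sinkhorn_match(desc_a, desc_b, eps=0.1, n_iters=50):
    """Soft line matching via entropy-regularized OT (Sinkhorn iterations).

    desc_a: (M, D) L2-normalized line descriptors from frame A.
    desc_b: (N, D) L2-normalized line descriptors from frame B.
    Returns a soft assignment matrix P of shape (M, N) whose rows/columns
    respect uniform marginals, giving globally consistent correspondences.
    """
    cost = 1.0 - desc_a @ desc_b.T             # cosine distance as cost
    K = np.exp(-cost / eps)                    # Gibbs kernel
    M, N = cost.shape
    a = np.full(M, 1.0 / M)                    # uniform source marginal
    b = np.full(N, 1.0 / N)                    # uniform target marginal
    u = np.ones(M)
    v = np.ones(N)
    for _ in range(n_iters):                   # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]         # P = diag(u) K diag(v)
```

Hard correspondences can then be read off, e.g. by mutual row/column argmax on `P`; because each row competes for the same column mass, the assignment is globally consistent rather than greedy per-line.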