MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing

📅 2024-12-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address large pose estimation errors in deep visual odometry caused by ambiguous inter-frame image matching, this paper proposes MambaVO, an end-to-end framework built on sequential matching refinement and training smoothing. Methodologically, it introduces a Geometric Mamba Module (GMM) to explicitly model long-range pixel-wise matching dependencies; uses a semi-dense Geometric Initialization Module (GIM) over a maintained Point-Frame Graph (PFG) for robust initialization; jointly optimizes poses and map points via differentiable deep Bundle Adjustment; and designs a Trending-Aware Penalty (TAP) to balance the matching and pose losses for smoother, more stable training. A loop-closure module extends the system to MambaVO++. Evaluated on standard benchmarks, MambaVO and MambaVO++ achieve state-of-the-art accuracy while running in real time with low GPU memory consumption. The source code will be publicly released.

📝 Abstract
Deep visual odometry has achieved great advances through learning-to-optimize technology. This approach relies heavily on visual matching across frames. However, ambiguous matching in challenging scenarios leads to significant errors in geometric modeling and bundle adjustment optimization, which undermines the accuracy and robustness of pose estimation. To address this challenge, this paper proposes MambaVO, which conducts robust initialization, Mamba-based sequential matching refinement, and smoothed training to enhance matching quality and improve pose estimation in deep visual odometry. Specifically, when a new frame is received, it is matched with the closest keyframe in the maintained Point-Frame Graph (PFG) via the semi-dense Geometric Initialization Module (GIM). The initialized PFG is then processed by the proposed Geometric Mamba Module (GMM), which exploits the matching features to refine the overall inter-frame pixel-to-pixel matching. The refined PFG is finally processed by deep BA to optimize the poses and the map. To deal with gradient variance, a Trending-Aware Penalty (TAP) is proposed to smooth training by balancing the pose loss and the matching loss, enhancing convergence and stability. A loop closure module is finally applied to enable MambaVO++. On public benchmarks, MambaVO and MambaVO++ demonstrate state-of-the-art accuracy while ensuring real-time performance with low GPU memory requirements. Code will be publicly available.
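The abstract describes the TAP as balancing the pose loss against the matching loss to cope with gradient variance, but does not give its exact formulation. As a purely illustrative sketch (not the authors' method), one way to realize such trend-aware balancing is to down-weight whichever loss term has the noisier recent trend; the function name `tap_weights` and the variance-based weighting below are assumptions:

```python
# Hedged sketch of trend-aware loss balancing, loosely inspired by the
# Trending-Aware Penalty (TAP) idea. The real TAP formulation is not
# specified in this summary; here we simply give less weight to the loss
# term whose recent history fluctuates more.

def tap_weights(pose_losses, match_losses, eps=1e-8):
    """Return normalized weights (w_pose, w_match) from recent loss histories.

    pose_losses, match_losses: lists of recent per-step loss values.
    The noisier (higher-variance) history receives the smaller weight.
    """
    def variability(history):
        # Sample variance of the history; 0.0 if too short to estimate.
        if len(history) < 2:
            return 0.0
        mean = sum(history) / len(history)
        return sum((x - mean) ** 2 for x in history) / (len(history) - 1)

    # Inverse-variance weighting: stable trends dominate the total loss.
    w_pose = 1.0 / (variability(pose_losses) + eps)
    w_match = 1.0 / (variability(match_losses) + eps)
    total = w_pose + w_match
    return w_pose / total, w_match / total


# Usage: a steady pose loss and a drifting matching loss.
w_pose, w_match = tap_weights([1.0, 1.0, 1.0], [1.0, 2.0, 3.0])
# The steadier pose loss gets the larger weight here.
```

The combined training loss would then be `w_pose * pose_loss + w_match * match_loss`, recomputed as the histories evolve.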
Problem

Research questions and friction points this paper is trying to address.

Deep Visual Odometry
Ambiguous Image Matching
Pose Estimation Error
Innovation

Methods, ideas, or system contributions that make the work stand out.

Advanced Visual Localization
Improved Image Matching
Enhanced System Performance
Shuo Wang
School of Information, Renmin University of China
Wanting Li
School of Information, Renmin University of China
Yongcai Wang
School of Information, Renmin University of China
Zhaoxin Fan
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Institute of Artificial Intelligence, Beihang University, Beijing, China
Zhe Huang
School of Information, Renmin University of China
Xudong Cai
Renmin University of China
computer vision, camera localization, SLAM
Jian Zhao
Institute of Artificial Intelligence (TeleAI), China Telecom
Deying Li
School of Information, Renmin University of China