MASt3R-Fusion: Integrating Feed-Forward Visual Model with IMU, GNSS for High-Functionality SLAM

📅 2025-09-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Visual SLAM suffers from degraded accuracy, scale ambiguity, and global inconsistency in challenging environments such as low-texture and low-illumination scenes. To address these issues, this paper proposes a tightly coupled multi-sensor framework that jointly fuses feed-forward neural-network-based pointmap regression, IMU measurements, and GNSS observations. Crucially, Sim(3) visual alignment constraints are integrated into an SE(3) factor graph, enabling hierarchical optimization over both sliding windows and global loop closures. This work is the first to co-optimize deep geometric priors with heterogeneous sensor measurements under a unified Hessian-form constraint formulation, significantly improving scale consistency and global mapping accuracy. Evaluated on public and self-collected datasets, the system demonstrates superior accuracy, robustness, and consistency compared to state-of-the-art vision-centric multi-sensor SLAM approaches. The source code will be made publicly available.
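To make the Sim(3)-versus-SE(3) distinction in the summary concrete, here is a minimal numpy sketch (an illustration under assumed conventions, not the paper's code): a Sim(3) transform carries an extra scale factor s on top of an SE(3) rigid motion. Visual-only alignment estimates (s, R, t) only up to scale, while IMU/GNSS observations make s observable, which is what allows the visual constraint to be folded into a metric-scale SE(3) factor graph.

```python
import numpy as np

def sim3_apply(s, R, t, x):
    """Sim(3) action on a 3D point: x' = s * R @ x + t."""
    return s * (R @ x) + t

def se3_apply(R, t, x):
    """SE(3) action on a 3D point: x' = R @ x + t (Sim(3) with s = 1)."""
    return R @ x + t

# Example: with scale pinned to 1, Sim(3) reduces exactly to SE(3).
R = np.eye(3)
t = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.0, 0.0])
print(np.allclose(sim3_apply(1.0, R, t, x), se3_apply(R, t, x)))  # True
```

The residual scale degree of freedom is what a vision-only pipeline cannot resolve; once inertial or GNSS measurements fix it, each visual alignment becomes a metric constraint.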

📝 Abstract
Visual SLAM is a cornerstone technique in robotics, autonomous driving and extended reality (XR), yet classical systems often struggle with low-texture environments, scale ambiguity, and degraded performance under challenging visual conditions. Recent advancements in feed-forward neural network-based pointmap regression have demonstrated the potential to recover high-fidelity 3D scene geometry directly from images, leveraging learned spatial priors to overcome limitations of traditional multi-view geometry methods. However, the widely validated advantages of probabilistic multi-sensor information fusion are often discarded in these pipelines. In this work, we propose MASt3R-Fusion, a multi-sensor-assisted visual SLAM framework that tightly integrates feed-forward pointmap regression with complementary sensor information, including inertial measurements and GNSS data. The system introduces Sim(3)-based visual alignment constraints (in the Hessian form) into a universal metric-scale SE(3) factor graph for effective information fusion. A hierarchical factor graph design is developed, which allows both real-time sliding-window optimization and global optimization with aggressive loop closures, enabling real-time pose tracking, metric-scale structure perception and globally consistent mapping. We evaluate our approach on both public benchmarks and self-collected datasets, demonstrating substantial improvements in accuracy and robustness over existing visual-centered multi-sensor SLAM systems. The code will be released open-source to support reproducibility and further research (https://github.com/GREAT-WHU/MASt3R-Fusion).
Problem

Research questions and friction points this paper is trying to address.

Overcoming visual SLAM limitations in low-texture environments and scale ambiguity
Integrating neural pointmap regression with IMU and GNSS sensor fusion
Enabling real-time metric-scale perception with globally consistent mapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates feed-forward visual model with IMU and GNSS sensors
Uses Sim(3)-based visual alignment constraints in factor graph
Hierarchical factor graph enables real-time and global optimization
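The "Hessian form" mentioned in the innovation list can be sketched as follows (a hypothetical illustration, not the paper's implementation): each constraint carries an information matrix Λ (the inverse covariance, i.e. its Hessian contribution) rather than a covariance, so heterogeneous factors from vision, IMU, and GNSS all add into one quadratic cost over the SE(3) pose graph. Below, a toy relative-pose factor between two 4x4 homogeneous poses, with a simplified 4-dimensional residual (translation error plus a scalar rotation angle).

```python
import numpy as np

def relative_residual(T_i, T_j, T_ij_meas):
    """Error of a measured relative SE(3) transform vs the one predicted
    by poses T_i, T_j (4x4 homogeneous matrices). If the measurement is
    consistent, the error transform E is the identity."""
    T_pred = np.linalg.inv(T_i) @ T_j
    E = np.linalg.inv(T_ij_meas) @ T_pred
    r_t = E[:3, 3]                                          # translation error
    ang = np.arccos(np.clip((np.trace(E[:3, :3]) - 1) / 2, -1.0, 1.0))
    return np.concatenate([r_t, [ang]])                      # toy 4-dim residual

def factor_cost(r, Lam):
    """Quadratic cost r^T Λ r contributed by one Hessian-form factor."""
    return float(r @ Lam @ r)

# Example: identical poses and an identity measurement give zero cost.
T = np.eye(4)
r = relative_residual(T, T, np.eye(4))
print(factor_cost(r, np.eye(4)))  # 0.0
```

Because each factor is just a (residual, information matrix) pair, a sliding-window solver and a global loop-closure solver can consume the same constraints, which is the essence of the hierarchical design described above.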
Yuxuan Zhou
School of Geodesy and Geomatics, Wuhan University, China
Xingxing Li
GFZ
GPS/GNSS precise positioning and orbit determination · GNSS data processing · GNSS seismology · GNSS meteorology
Shengyu Li
School of Geodesy and Geomatics, Wuhan University, China
Zhuohao Yan
School of Geodesy and Geomatics, Wuhan University, China
Chunxi Xia
School of Geodesy and Geomatics, Wuhan University, China
Shaoquan Feng
School of Geodesy and Geomatics, Wuhan University, China