VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

📅 2025-05-18

🤖 AI Summary

Problem: In uncalibrated monocular dense RGB SLAM, submaps are related by a projective ambiguity rather than a similarity transform, so similarity-based submap alignment can fail, degrading geometric consistency and map completeness. The problem matters most on long sequences, which VGGT cannot process in a single pass because of GPU memory limits. Method: We propose the first dense monocular SLAM framework formulated on the SL(4) manifold, jointly optimizing sequential submap alignment and loop closure constraints in the 15-degree-of-freedom space of projective transformations. Unlike conventional alignment with similarity transformations (Sim(3): rotation, translation, and scale; 7 degrees of freedom), this approach accounts for the projective ambiguity between submaps that arises from unknown camera intrinsics. The system combines VGGT-based feed-forward scene reconstruction, incremental submap building, projective alignment, and loop closure correction, and requires no prior knowledge of camera parameters. Contribution/Results: The method improves dense map quality on long video sequences that are infeasible for VGGT alone due to its GPU memory requirements.

📝 Abstract

We present VGGT-SLAM, a dense RGB SLAM system constructed by incrementally and globally aligning submaps created from the feed-forward scene reconstruction approach VGGT using only uncalibrated monocular cameras. While related works align submaps using similarity transforms (i.e., translation, rotation, and scale), we show that such approaches are inadequate in the case of uncalibrated cameras. In particular, we revisit the idea of reconstruction ambiguity, where given a set of uncalibrated cameras with no assumption on the camera motion or scene structure, the scene can only be reconstructed up to a 15-degrees-of-freedom projective transformation of the true geometry. This inspires us to recover a consistent scene reconstruction across submaps by optimizing over the SL(4) manifold, thus estimating 15-degrees-of-freedom homography transforms between sequential submaps while accounting for potential loop closure constraints. As verified by extensive experiments, we demonstrate that VGGT-SLAM achieves improved map quality using long video sequences that are infeasible for VGGT due to its high GPU requirements.
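To make the 15-degrees-of-freedom count concrete, the following is a minimal NumPy sketch, not the authors' implementation. A 3D homography is a 4x4 matrix defined only up to scale (16 entries minus 1), and SL(4) fixes that scale by requiring determinant 1. The function names `normalize_to_sl4`, `apply_homography`, and `similarity_matrix` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def normalize_to_sl4(H):
    """Rescale a 4x4 matrix so that its determinant is 1.

    Scaling H by s multiplies det(H) by s**4, so only matrices with a
    positive determinant have a real rescaling that lands in SL(4).
    """
    H = np.asarray(H, dtype=float)
    if H.shape != (4, 4):
        raise ValueError("expected a 4x4 matrix")
    det = np.linalg.det(H)
    if det <= 0.0:
        raise ValueError("no real rescaling gives det = 1 when det(H) <= 0")
    return H / det ** 0.25


def apply_homography(H, points):
    """Map an (N, 3) array of 3D points through a 4x4 homography."""
    points = np.asarray(points, dtype=float)
    homog = np.hstack([points, np.ones((points.shape[0], 1))])
    mapped = homog @ np.asarray(H, dtype=float).T
    # Divide by the last homogeneous coordinate to return to 3D.
    return mapped[:, :3] / mapped[:, 3:4]


def similarity_matrix(scale, R, t):
    """Embed a 7-DOF similarity transform as a 4x4 matrix.

    The last row stays [0, 0, 0, 1]; a general homography also uses that
    row, which is where the additional degrees of freedom come from.
    """
    S = np.eye(4)
    S[:3, :3] = scale * np.asarray(R, dtype=float)
    S[:3, 3] = np.asarray(t, dtype=float)
    return S
```

Because points are divided by their last homogeneous coordinate, `H` and any nonzero multiple of `H` map points identically, which is why fixing the determinant loses no generality for matrices with positive determinant.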

Problem

Research questions and friction points this paper is trying to address.

Optimizing dense RGB SLAM on the SL(4) manifold for uncalibrated cameras

Addressing reconstruction ambiguity with 15-DOF projective transformations

Improving map quality for long sequences with loop closure
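As an illustration of what a loop closure constraint means for homographies, the sketch below composes relative submap transforms around a cycle and measures how far the result is from the identity. It is an assumption-laden toy, not the paper's optimizer: the convention that `H_ab` maps homogeneous points from frame b into frame a, the Frobenius-norm residual, and the helper names `unit_det`, `chain`, and `loop_residual` are all choices made here for illustration.

```python
import numpy as np


def unit_det(H):
    """Rescale a 4x4 matrix with positive determinant to determinant 1."""
    H = np.asarray(H, dtype=float)
    det = np.linalg.det(H)
    if det <= 0.0:
        raise ValueError("determinant must be positive")
    return H / det ** 0.25


def chain(relatives):
    """Compose relative transforms H_01, H_12, ... into H_00, H_01, H_02, ...

    Assumed convention: H_ab maps homogeneous points in frame b to frame a,
    so H_0k = H_01 @ H_12 @ ... @ H_(k-1)k.
    """
    absolute = [np.eye(4)]
    for H in relatives:
        absolute.append(unit_det(absolute[-1] @ np.asarray(H, dtype=float)))
    return absolute


def loop_residual(relatives, H_loop):
    """Frobenius distance of the cycle product from the identity.

    relatives: [H_01, ..., H_(k-1)k]; H_loop: measured H_k0.
    A perfectly consistent loop gives H_0k @ H_k0 = I. Both I and -I have
    determinant 1 for 4x4 matrices, so the smaller of the two distances
    is reported.
    """
    cycle = unit_det(chain(relatives)[-1] @ unit_det(H_loop))
    eye = np.eye(4)
    return float(min(np.linalg.norm(cycle - eye), np.linalg.norm(cycle + eye)))
```

A global optimizer would adjust all submap transforms to drive residuals of this kind toward zero; the sketch only evaluates the residual for fixed inputs.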

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes submap alignment on the SL(4) manifold

Estimates 15-DOF homography transforms between submaps

Uses uncalibrated monocular cameras for dense SLAM
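One generic way to estimate a 15-DOF homography between two point sets is the direct linear transform (DLT), sketched below. This is a textbook technique swapped in for illustration; this page does not describe how VGGT-SLAM itself estimates the transform, and the function name `estimate_homography_3d` is hypothetical. Each 3D correspondence contributes three linear equations, so at least five points in general position are needed for the 15 unknowns.

```python
import numpy as np


def estimate_homography_3d(src, dst):
    """Estimate a 4x4 homography H with dst ~ H @ src using the DLT.

    src, dst: (N, 3) arrays of corresponding 3D points, N >= 5.
    For rows h_0..h_3 of H and homogeneous x = [src, 1], each point gives
    h_i . x - dst_i * (h_3 . x) = 0 for i = 0, 1, 2.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 3:
        raise ValueError("src and dst must both have shape (N, 3)")
    n = src.shape[0]
    if n < 5:
        raise ValueError("need at least 5 correspondences")

    X = np.hstack([src, np.ones((n, 1))])
    rows = []
    for x, y in zip(X, dst):
        for i in range(3):
            row = np.zeros(16)
            row[4 * i:4 * i + 4] = x      # coefficients of h_i
            row[12:16] = -y[i] * x        # coefficients of h_3
            rows.append(row)
    A = np.array(rows)

    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(4, 4)

    # Fix the free scale with det = 1 when a real rescaling exists.
    det = np.linalg.det(H)
    if det > 0.0:
        H = H / det ** 0.25
    return H
```

With noisy data, a robust wrapper such as RANSAC around this solver would usually be preferable, and the input points should be normalized for better conditioning; both are omitted here to keep the sketch short.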
