VGGT-SLAM++

📅 2026-04-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a visual SLAM framework that addresses short-horizon pose drift and the lack of high-frequency local optimization in existing transformer-based systems. The front-end combines the Visual Geometry Grounded Transformer (VGGT) with Sim(3) pose estimation. The back-end builds a dense Digital Elevation Map (DEM) for each submap, splits it into patches, and uses their DINOv2 embeddings to link the submap into a covisibility graph. Neighbors retrieved by visual place recognition (VPR) trigger frequent local bundle adjustment, which suppresses short-term drift and speeds up graph optimization convergence. The authors report state-of-the-art accuracy on standard SLAM benchmarks while preserving global consistency in large-scale environments through compact DEM tiles and sublinear retrieval.
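
To make the Sim(3) step concrete, the sketch below estimates a similarity transform (scale, rotation, translation) between two sets of corresponding 3D points using the closed-form Umeyama method. This is an illustrative assumption about how a Sim(3) alignment can be computed, not the solver used in VGGT-SLAM++; the function name `umeyama_sim3` and its inputs are hypothetical.

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Least-squares s, R, t such that dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points (hypothetical inputs,
    e.g. points shared by two overlapping submaps).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]

    # Center both point sets.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between target and source, then its SVD.
    cov = dst_c.T @ src_c / n
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Unlike a rigid SE(3) alignment, the scale `s` is estimated as well, which matters when submap geometry comes from a feed-forward network and has no fixed metric scale.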

📝 Abstract

We introduce VGGT-SLAM++, a complete visual SLAM system that leverages the geometry-rich outputs of the Visual Geometry Grounded Transformer (VGGT). The system comprises a visual odometry (front-end) fusing the VGGT feed-forward transformer and a Sim(3) solution, a Digital Elevation Map (DEM)-based graph construction module, and a back-end that jointly enable accurate large-scale mapping with bounded memory. While prior transformer-based SLAM pipelines such as VGGT-SLAM rely primarily on sparse loop closures or global Sim(3) manifold constraints - allowing short-horizon pose drift - VGGT-SLAM++ restores high-cadence local bundle adjustment (LBA) through a spatially corrective back-end. For each VGGT submap, we construct a dense planar-canonical DEM, partition it into patches, and compute their DINOv2 embeddings to integrate the submap into a covisibility graph. Spatial neighbors are retrieved using a Visual Place Recognition (VPR) module within the covisibility window, triggering frequent local optimization that stabilizes trajectories. Across standard SLAM benchmarks, VGGT-SLAM++ achieves state-of-the-art accuracy, substantially reducing short-term drift, accelerating graph convergence, and maintaining global consistency with compact DEM tiles and sublinear retrieval.
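
The DEM-and-retrieval idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the submap points are already expressed in a frame whose z axis is the height direction, it keeps the maximum height per grid cell, and it uses brute-force cosine similarity (linear time, unlike the sublinear retrieval the paper reports). The function names, `cell_size`, and `patch` are hypothetical, and the embedding vectors stand in for DINOv2 features, which are not computed here.

```python
import numpy as np

def rasterize_dem(points, cell_size):
    """Bin (N, 3) points on the XY plane and keep the max z per cell.

    Empty cells are set to NaN. Rows index y, columns index x.
    """
    points = np.asarray(points, dtype=float)
    xy_min = points[:, :2].min(axis=0)
    idx = np.floor((points[:, :2] - xy_min) / cell_size).astype(int)
    cols, rows = idx[:, 0], idx[:, 1]

    dem = np.full((rows.max() + 1, cols.max() + 1), -np.inf)
    # Unbuffered in-place maximum handles several points in the same cell.
    np.maximum.at(dem, (rows, cols), points[:, 2])
    dem[np.isinf(dem)] = np.nan
    return dem

def split_into_patches(dem, patch):
    """Split a 2D grid into non-overlapping patch x patch tiles (row-major).

    Trailing rows and columns that do not fill a whole tile are dropped.
    """
    h, w = dem.shape
    h_crop, w_crop = h - h % patch, w - w % patch
    cropped = dem[:h_crop, :w_crop]
    tiles = cropped.reshape(h_crop // patch, patch, w_crop // patch, patch)
    return tiles.swapaxes(1, 2).reshape(-1, patch, patch)

def top_k_neighbors(query, database, k):
    """Indices and cosine similarities of the k database rows closest to query."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]
```

In a full system, each tile would be passed through an image encoder to obtain its embedding, and an approximate nearest-neighbor index would replace the brute-force search to keep retrieval cost low as the map grows.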

Problem

Research questions and friction points this paper is trying to address.

visual SLAM

pose drift

large-scale mapping

transformer-based SLAM

local bundle adjustment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual SLAM

Local Bundle Adjustment

Digital Elevation Map