Keep It CALM: Toward Calibration-Free Kilometer-Level SLAM with Visual Geometry Foundation Models via an Assistant Eye

📅 2026-04-16

🤖 AI Summary

This work addresses a limitation of existing SLAM methods built on visual geometry foundation models (VGFMs): in kilometer-scale environments they suffer from trajectory drift and map divergence, because they align sub-maps with linear transforms that cannot handle nonlinear geometric distortions. To overcome this, we propose CAL2M, a framework featuring an “assistant eye” mechanism that resolves scale ambiguity, an epipolar-geometry-guided online method for refining intrinsic parameters and poses, and an anchor propagation strategy that enables globally consistent, nonlinear elastic sub-map alignment. Notably, CAL2M requires no camera calibration and supports plug-and-play integration with any VGFM. Extensive experiments demonstrate that the approach significantly suppresses drift and achieves high-precision, globally consistent SLAM reconstruction in large-scale scenes.
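The summary does not say how the nonlinear elastic alignment is computed. As an illustration only, the sketch below uses a swapped-in technique, inverse-distance-weighted (IDW) displacement interpolation, which is not claimed to be CAL2M's method: every anchor whose sub-map position and global position are both known contributes its displacement, and any other sub-map point is moved by the distance-weighted blend of those displacements. The function name `elastic_warp`, the `power` parameter, and the example anchors are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def elastic_warp(
    p: Point,
    anchors_local: Sequence[Point],
    anchors_global: Sequence[Point],
    power: float = 2.0,
) -> Point:
    """Move a sub-map point into the global frame using anchor displacements.

    Stand-in technique (inverse-distance weighting), not CAL2M's formulation:
    each anchor contributes (global - local), weighted by 1 / distance**power.
    """
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for a_loc, a_glb in zip(anchors_local, anchors_global):
        dist = math.dist(p, a_loc)
        if dist < 1e-12:
            # The point coincides with an anchor: snap to that anchor's global position.
            return (a_glb[0], a_glb[1], a_glb[2])
        w = 1.0 / dist ** power
        for k in range(3):
            num[k] += w * (a_glb[k] - a_loc[k])  # weighted anchor displacement
        den += w
    if den == 0.0:
        raise ValueError("at least one anchor is required")
    return (p[0] + num[0] / den, p[1] + num[1] / den, p[2] + num[2] / den)


# Two anchors on a straight 10-unit segment; globally, the far one sits 1 unit higher in y.
anchors_local = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
anchors_global = [(0.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
mid = elastic_warp((5.0, 0.0, 0.0), anchors_local, anchors_global)
```

Here `mid` is `(5.0, 0.5, 0.0)`: halfway between the anchors, the point receives the average of their displacements. Because the correction varies across the sub-map, a bend that grows toward one end is absorbed locally, which a single Sim(3) cannot do. A natural refinement would be to fit a similarity transform first and interpolate only the remaining anchor residuals.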

📝 Abstract

Visual Geometry Foundation Models (VGFMs) demonstrate remarkable zero-shot capabilities in local reconstruction. However, deploying them for kilometer-level Simultaneous Localization and Mapping (SLAM) remains challenging. In such scenarios, current approaches mainly rely on linear transforms (e.g., Sim(3) and SL(4)) for sub-map alignment; we argue that a single linear transform is fundamentally insufficient to model the complex, non-linear geometric distortions inherent in VGFM outputs. Forcing such rigid alignment leads to the rapid accumulation of uncorrected residuals, eventually resulting in significant trajectory drift and map divergence. To address these limitations, we present CAL2M (Calibration-free Assistant-eye based Large-scale Localization and Mapping), a plug-and-play framework compatible with arbitrary VGFMs. Unlike traditional systems, CAL2M introduces an "assistant eye" solely to exploit the prior of a constant physical spacing between the two cameras, eliminating scale ambiguity without any temporal or spatial pre-calibration. Furthermore, assuming accurate feature matching, we propose an epipolar-guided intrinsic and pose correction model. Supported by an online intrinsic search module, it rectifies rotation and translation errors caused by inaccurate intrinsics through fundamental matrix decomposition. Finally, to ensure accurate mapping, we introduce a globally consistent mapping strategy based on anchor propagation. By constructing and fusing anchors across the trajectory, we establish a direct local-to-global mapping relationship. This allows nonlinear transformations to elastically align sub-maps, eliminating geometric misalignments and yielding a globally consistent reconstruction. The source code of CAL2M will be publicly available at https://github.com/IRMVLab/CALM.
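The abstract does not detail the online intrinsic search. One plausible reading, sketched here under strong simplifying assumptions (square pixels, zero skew, a known principal point, and one focal length shared by both views), is a 1-D search that scores each candidate focal length f by how close E = K^T F K comes to a valid essential matrix, i.e. two equal singular values and one zero. The score uses the Huang-Faugeras trace constraint, so no SVD is needed. All function names, the candidate grid, and the synthetic pose are hypothetical; this is not the paper's module.

```python
import math
from typing import List, Sequence

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Mat) -> Mat:
    return [list(row) for row in zip(*a)]


def intrinsics(f: float, cx: float, cy: float) -> Mat:
    """Pinhole K with square pixels and zero skew (assumed)."""
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]


def essential_residual(F: Mat, f: float, cx: float, cy: float) -> float:
    """Distance of E = K^T F K from the set of valid essential matrices.

    E is first scaled to unit Frobenius norm. For a rank-2 E with singular values
    (s1, s2, 0), the Huang-Faugeras residual ||2 E E^T E - tr(E E^T) E||_F then
    equals |s1^2 - s2^2|, which vanishes only when s1 == s2.
    """
    K = intrinsics(f, cx, cy)
    E = matmul(matmul(transpose(K), F), K)
    norm = math.sqrt(sum(v * v for row in E for v in row))
    E = [[v / norm for v in row] for row in E]
    EEt = matmul(E, transpose(E))
    trace = EEt[0][0] + EEt[1][1] + EEt[2][2]
    EEtE = matmul(EEt, E)
    res = [[2.0 * EEtE[i][j] - trace * E[i][j] for j in range(3)] for i in range(3)]
    return math.sqrt(sum(v * v for row in res for v in row))


def search_focal(F: Mat, candidates: Sequence[float], cx: float, cy: float) -> float:
    """Hypothetical 1-D intrinsic search: the candidate focal length with the lowest residual."""
    return min(candidates, key=lambda f: essential_residual(F, f, cx, cy))


def rot_zyx(az: float, ay: float, ax: float) -> Mat:
    """Rotation R = Rz(az) Ry(ay) Rx(ax), used only to build synthetic test data."""
    ca, sa = math.cos(az), math.sin(az)
    cb, sb = math.cos(ay), math.sin(ay)
    cc, sc = math.cos(ax), math.sin(ax)
    rz = [[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]]
    ry = [[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]]
    rx = [[1.0, 0.0, 0.0], [0.0, cc, -sc], [0.0, sc, cc]]
    return matmul(matmul(rz, ry), rx)


# Synthetic two-view geometry generated with a true focal length of 500 px.
R = rot_zyx(0.1, 0.2, 0.05)
t = (1.0, 0.2, 0.3)
t_cross = [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]
E_true = matmul(t_cross, R)  # E = [t]_x R
f_true, px, py = 500.0, 320.0, 240.0
K_inv = [[1.0 / f_true, 0.0, -px / f_true], [0.0, 1.0 / f_true, -py / f_true], [0.0, 0.0, 1.0]]
F_obs = matmul(matmul(transpose(K_inv), E_true), K_inv)  # F = K^-T E K^-1
f_best = search_focal(F_obs, [float(v) for v in range(300, 801, 10)], px, py)
```

On this synthetic pair, `f_best` recovers the 500 px focal length used to generate `F_obs`. With a corrected K, E = K^T F K can then be factored by SVD into four (R, t) candidates, and the physically valid one is selected by a cheirality (points-in-front) check; this is the standard route from a fundamental matrix to a pose. Focal estimation from F is known to degenerate when the two optical axes are coplanar, so a real system would need to detect that case.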

Problem

SLAM

Visual Geometry Foundation Models

non-linear distortion

trajectory drift

map divergence
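The abstract's core argument, that one linear transform cannot absorb non-linear distortion, can be illustrated with a toy 1-D example. Assume (hypothetically) that a sub-map's positions along a corridor come back with a scale-and-offset error plus a small quadratic bend. A least-squares scale-and-offset fit of the distorted positions against the reference ones, a 1-D analogue of Sim(3) alignment, removes the linear error exactly but leaves a residual wherever the bend is present. The distortion model and all names are illustrative assumptions.

```python
from typing import Sequence, Tuple


def fit_scale_offset(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of y ~ s * x + t (a 1-D similarity: scale + offset)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    s = sxy / sxx
    return s, my - s * mx


def max_residual(x: Sequence[float], y: Sequence[float]) -> float:
    """Worst absolute error left after the best scale-and-offset alignment."""
    s, t = fit_scale_offset(x, y)
    return max(abs(s * a + t - b) for a, b in zip(x, y))


xs = [float(i) for i in range(11)]                       # reference positions (hypothetical)
ys_linear = [2.0 * a + 3.0 for a in xs]                  # scale and offset error only
ys_warped = [2.0 * a + 3.0 + 0.05 * a * a for a in xs]   # plus a small quadratic bend
res_linear = max_residual(xs, ys_linear)
res_warped = max_residual(xs, ys_warped)
```

The fit removes the pure scale-and-offset error to floating-point precision (`res_linear` is essentially 0), while the bent data keep a worst-case residual of about 0.75 units at the ends of the segment. Chained over many sub-map alignments, leftovers like this are what the abstract describes as accumulating into drift.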

Innovation

Calibration-free SLAM

Visual Geometry Foundation Models

Assistant Eye
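The "assistant eye" rests on one prior: the physical spacing between the two cameras does not change. A minimal sketch of how that prior can fix per-sub-map scale, assuming (hypothetically) that each sub-map provides already-associated camera centres for the main and assistant cameras in that sub-map's arbitrary VGFM scale: dividing a spacing value by the predicted main-to-assistant distance gives a scale factor. With an unknown spacing (default 1.0), all sub-maps still end up in one shared, non-metric unit; a measured spacing would additionally give metric scale. Function names and the median choice are assumptions, not the paper's procedure.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def submap_scale(
    main_centres: Sequence[Point],
    assist_centres: Sequence[Point],
    spacing: float = 1.0,
) -> float:
    """Factor that rescales one sub-map so the main-to-assistant distance equals `spacing`.

    The median (an assumption) keeps a few bad VGFM predictions from dominating.
    """
    dists = [math.dist(m, a) for m, a in zip(main_centres, assist_centres)]
    if not dists:
        raise ValueError("need at least one main/assistant camera pair")
    return spacing / median(dists)


def rescale(points: Sequence[Point], scale: float) -> List[Point]:
    """Apply a uniform scale to sub-map points about the sub-map origin."""
    return [(scale * x, scale * y, scale * z) for x, y, z in points]


# Two hypothetical sub-maps whose VGFM outputs came back at different arbitrary scales.
main_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
assist_a = [(0.0, 0.25, 0.0), (1.0, 0.25, 0.0), (2.0, 0.25, 0.0)]
main_b = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
assist_b = [(0.0, 2.0, 0.0), (4.0, 2.0, 0.0)]

s_a = submap_scale(main_a, assist_a)  # 1.0 / 0.25
s_b = submap_scale(main_b, assist_b)  # 1.0 / 2.0
```

After rescaling with `s_a` and `s_b`, both sub-maps measure the main-to-assistant distance as 1.0, so they share one unit before any alignment. The abstract does not state whether CAL2M aggregates the spacing with a median, a mean, or inside an optimizer.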