ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

📅 2025-09-01
🤖 AI Summary
Existing monocular SLAM systems depend on calibrated camera intrinsics, rely on complex architectures, and produce low-quality two-view geometric constraints. To address these issues, the paper proposes a lightweight, real-time monocular SLAM framework that requires no camera intrinsics. The frontend is a symmetric two-view association (STA) network that jointly estimates the relative pose and local pointmaps directly from two RGB frames; the backend performs Sim(3) pose-graph optimization with loop closures to correct accumulated drift and keep the map globally consistent. The frontend is only 35% the size of comparable state-of-the-art models, yet it improves the quality of the two-view constraints, and dropping the intrinsics requirement makes the system applicable across different cameras. Experiments on standard benchmarks show better camera tracking accuracy and dense reconstruction quality than current methods while maintaining real-time performance.

📝 Abstract
We present ViSTA-SLAM, a real-time monocular visual SLAM system that operates without requiring camera intrinsics, making it broadly applicable across diverse camera setups. At its core, the system employs a lightweight symmetric two-view association (STA) model as the frontend, which simultaneously estimates relative camera poses and regresses local pointmaps from only two RGB images. This design significantly reduces model complexity (our frontend is only 35% the size of comparable state-of-the-art methods) while enhancing the quality of the two-view constraints used in the pipeline. In the backend, we construct a specially designed Sim(3) pose graph that incorporates loop closures to address accumulated drift. Extensive experiments demonstrate that our approach achieves superior performance in both camera tracking and dense 3D reconstruction quality compared to current methods. GitHub repository: https://github.com/zhangganlin/vista-slam
Problem

Research questions and friction points this paper is trying to address.

Monocular visual SLAM without camera intrinsics requirement
Lightweight symmetric two-view association for pose estimation
Sim(3) pose graph with loop closures for drift correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Monocular SLAM without camera intrinsics
Lightweight symmetric two-view association frontend
Sim(3) pose graph with loop closure backend
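
To illustrate why a Sim(3) pose graph with loop closures is useful for a monocular system, the sketch below chains relative Sim(3) transforms around a closed loop and measures the scale drift that a loop-closure edge would expose. This is not the authors' implementation: the helper names (`sim3`, `sim3_inverse`, `chain`, `scale_of`), the square trajectory, and the 2% per-edge scale error are all assumptions chosen for illustration, and no optimization is performed.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def sim3(scale, R, t):
    """Pack scale, rotation (3x3), and translation (3,) into a 4x4 matrix [[sR, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = scale * R
    T[:3, 3] = t
    return T

def sim3_inverse(T):
    """Invert a 4x4 Sim(3) matrix: [[R^T / s, -R^T t / s], [0, 1]]."""
    sR = T[:3, :3]
    scale = np.cbrt(np.linalg.det(sR))  # det(sR) = s^3 for a proper rotation
    R = sR / scale
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T / scale
    T_inv[:3, 3] = -(R.T @ T[:3, 3]) / scale
    return T_inv

def chain(edges):
    """Compose relative transforms along a path of pose-graph edges."""
    T = np.eye(4)
    for E in edges:
        T = T @ E
    return T

def scale_of(T):
    """Recover the scale factor of a Sim(3) matrix."""
    return float(np.cbrt(np.linalg.det(T[:3, :3])))

# Hypothetical square loop: each edge turns 90 degrees and moves one unit.
step = np.array([1.0, 0.0, 0.0])
true_edges = [sim3(1.0, rot_z(np.pi / 2), step) for _ in range(4)]

# Assumed estimation error: every two-view estimate is 2% too large in scale.
drifted_edges = [sim3(1.02, rot_z(np.pi / 2), step) for _ in range(4)]

# The loop-closure edge says the last frame coincides with the first.
loop_edge = np.eye(4)

# Residual between the chained odometry and the loop-closure measurement.
residual = sim3_inverse(loop_edge) @ chain(drifted_edges)
drift_scale = scale_of(residual)

print("accumulated scale factor:", drift_scale)
print("translation error at loop closure:", residual[:3, 3])
```

With error-free edges the chained transform is the identity. With the assumed scale error, the residual carries both a scale factor and a translation offset. An SE(3) pose graph has no scale variable to absorb the first of these, which is why monocular backends commonly optimize over Sim(3).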