🤖 AI Summary
This work addresses the challenge of unifying full SLAM functionality—front-end tracking, incremental mapping, and back-end global optimization—within a single end-to-end learnable architecture. We propose the first neural SLAM framework that integrates all of these components into a unified Transformer model. By serializing monocular video streams as spatiotemporal sequences, the model jointly and iteratively optimizes camera poses and dense depth maps, enabling geometrically consistent, tightly coupled reconstruction. Key contributions include: (i) the first holistic integration of the complete SLAM pipeline within a single Transformer, eliminating the conventional modular design; and (ii) novel mechanisms for incremental feature updating and cross-frame joint pose-depth optimization. Evaluated on multiple standard benchmarks, our method matches or surpasses state-of-the-art dense SLAM approaches in accuracy and robustness, with particularly notable improvements in dynamic scenes and long-duration sequences.
📝 Abstract
We present SLAM-Former, a novel neural approach that integrates full SLAM capabilities into a single transformer. Like traditional SLAM systems, SLAM-Former comprises a frontend and a backend that operate in tandem. The frontend processes sequential monocular images in real time for incremental mapping and tracking, while the backend performs global refinement to ensure a geometrically consistent result. This alternating execution allows the frontend and backend to reinforce one another, enhancing overall system performance. Comprehensive experimental results demonstrate that SLAM-Former achieves superior or highly competitive performance compared to state-of-the-art dense SLAM methods.