GRS-SLAM3R: Real-Time Dense SLAM with Gated Recurrent State

📅 2025-09-28

🤖 AI Summary

Problem: Existing end-to-end DUSt3R-style methods estimate local point clouds from image pairs only. They have no spatial memory and do not model global consistency, so they cannot support incremental, globally consistent metric reconstruction. Method: The paper proposes an end-to-end dense SLAM framework based on gated recurrent states. A latent state serves as spatial memory, and a Transformer-based gated update module resets and updates that state as frames arrive in sequence. The scene is partitioned into submaps, each submap is aligned locally, and all submaps are registered into a common world frame using relative geometric constraints, which keeps geometry consistent across frames. The method needs no scene priors or camera calibration and produces globally consistent, metric-scale dense point clouds from RGB sequences in real time. Contribution/Results: Experiments on several datasets report superior reconstruction accuracy while maintaining real-time performance.

📝 Abstract

DUSt3R-based end-to-end scene reconstruction has recently shown promising results in dense visual SLAM. However, most existing methods only use image pairs to estimate point maps, overlooking spatial memory and global consistency. To this end, we introduce GRS-SLAM3R, an end-to-end SLAM framework for dense scene reconstruction and pose estimation from RGB images without any prior knowledge of the scene or camera parameters. Unlike existing DUSt3R-based frameworks, which operate on all image pairs and predict per-pair point maps in local coordinate frames, our method supports sequential input and incrementally estimates metric-scale point clouds in a global coordinate frame. To improve the consistency of spatial correlation, we use a latent state as spatial memory and design a Transformer-based gated update module that resets and updates this memory, which continuously aggregates and tracks relevant 3D information across frames. Furthermore, we partition the scene into submaps, apply local alignment within each submap, and register all submaps into a common world frame using relative constraints, producing a globally consistent map. Experiments on various datasets show that our framework achieves superior reconstruction accuracy while maintaining real-time performance.
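
The reset-and-update behaviour of the spatial memory can be illustrated with a GRU-style gate. The sketch below is a simplified stand-in, not the paper's module: it assumes the latent state and the features of the incoming frame are both arrays of N tokens with D channels, and it replaces the Transformer blocks with single linear maps. The names gated_state_update, init_params, W_r, W_z, and W_h are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(dim, seed=0, scale=0.1):
    """Random weights for the three linear maps (hypothetical stand-ins
    for the Transformer blocks described in the abstract)."""
    rng = np.random.default_rng(seed)
    return {
        name: rng.normal(scale=scale, size=(2 * dim, dim))
        for name in ("W_r", "W_z", "W_h")
    }


def gated_state_update(state, obs, params):
    """One GRU-style step on the latent spatial memory.

    state: (N, D) memory tokens carried over from earlier frames
    obs:   (N, D) features extracted from the current frame
    """
    joint = np.concatenate([state, obs], axis=-1)        # (N, 2D)
    reset = sigmoid(joint @ params["W_r"])               # what to forget
    update = sigmoid(joint @ params["W_z"])              # how much to overwrite
    gated = np.concatenate([reset * state, obs], axis=-1)
    candidate = np.tanh(gated @ params["W_h"])           # proposed new memory
    # Convex blend: keep old memory where update is near 0, replace near 1.
    return (1.0 - update) * state + update * candidate


# Usage: feed frames one at a time, carrying the state forward.
params = init_params(dim=8)
state = np.zeros((4, 8))
for _ in range(3):
    frame_features = np.ones((4, 8))
    state = gated_state_update(state, frame_features, params)
```

Because the new state is a convex combination of the old state and a bounded candidate, the memory stays bounded however many frames are processed, which is one reason gated recurrences suit long sequences.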

Problem

Research questions and friction points this paper is trying to address.

Achieving globally consistent dense 3D reconstruction from sequential RGB images

Improving spatial memory and consistency in end-to-end SLAM systems

Maintaining real-time performance while enhancing reconstruction accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequential input processing for global metric-scale point clouds

Transformer-based gated module for spatial memory management

Submap partitioning with local alignment for global consistency
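
Registering a submap into the world frame comes down to estimating a transform from relative constraints. A minimal sketch of one such step follows, assuming two submaps share a set of corresponding 3D points and that a closed-form least-squares fit (Kabsch, with the optional Umeyama scale term) is acceptable. The paper's actual registration procedure is not specified in this summary, and the name align_submap is hypothetical.

```python
import numpy as np


def align_submap(src, dst, with_scale=False):
    """Least-squares transform such that dst ≈ s * src @ R.T + t.

    src, dst: (N, 3) corresponding points from two submaps (assumed known).
    with_scale=False gives a rigid fit, which matches metric-scale input.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)                 # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    if with_scale:
        var_src = (src_c ** 2).sum() / len(src)
        s = float((S * np.diag(D)).sum() / var_src)
    else:
        s = 1.0
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Usage: recover a known rotation about z and a translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
angle = 0.3
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.5, -1.0, 2.0])
dst = src @ R_true.T + t_true
s, R, t = align_submap(src, dst)
aligned = s * src @ R.T + t
```

In a full system, pairwise estimates like this would typically feed a global optimization over all submap poses rather than being chained directly, since chaining accumulates drift.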

🔎 Similar Papers

How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey