LEG-SLAM: Real-Time Language-Enhanced Gaussian Splatting for SLAM

📅 2025-06-03
🤖 AI Summary
This work addresses the challenge of jointly optimizing semantic understanding and geometric reconstruction in real-time SLAM, where existing methods suffer from poor efficiency and decoupled optimization. We propose the first real-time semantic-enhanced Gaussian Splatting mapping framework for SLAM. The method brings DINOv2 visual-language features into online Gaussian Splatting SLAM and compresses them with a learnable PCA-based compressor, so that compact semantic embeddings are optimized jointly with geometry and a single model supports both photometric reconstruction and dense semantic mapping. The approach requires no pre-computed camera ego-motion or static semantic maps, and it produces high-fidelity rendered images and pixel-level semantic maps together, online. Evaluated on Replica and ScanNet, it runs at more than 10 FPS and 18 FPS, respectively, outperforming state-of-the-art methods in reconstruction speed while maintaining competitive rendering quality.

📝 Abstract
Modern Gaussian Splatting methods have proven highly effective for real-time photorealistic rendering of 3D scenes. However, integrating semantic information into this representation remains a significant challenge, especially when real-time performance must be maintained for SLAM (Simultaneous Localization and Mapping) applications. In this work, we introduce LEG-SLAM, a novel approach that fuses an optimized Gaussian Splatting implementation with visual-language feature extraction using DINOv2, followed by a learnable feature compressor based on Principal Component Analysis, while enabling online dense SLAM. Our method simultaneously generates high-quality photorealistic images and semantically labeled scene maps, achieving real-time scene reconstruction at more than 10 fps on the Replica dataset and 18 fps on ScanNet. Experimental results show that our approach significantly outperforms state-of-the-art methods in reconstruction speed while achieving competitive rendering quality. The proposed system eliminates the need for prior data preparation such as camera ego-motion or pre-computed static semantic maps. With potential applications in autonomous robotics, augmented reality, and other interactive domains, LEG-SLAM represents a significant step forward in real-time semantic 3D Gaussian-based SLAM. Project page: https://titrom025.github.io/LEG-SLAM/

Problem

Research questions and friction points this paper is trying to address.

Integrating semantic information into Gaussian Splatting for SLAM
Achieving real-time performance in 3D scene reconstruction
Eliminating the need for pre-computed data in semantic mapping

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates visual-language features with Gaussian Splatting
Uses PCA-based feature compressor for efficiency
Achieves real-time semantic SLAM without prior data
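
The PCA-based compression listed above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain NumPy, fits ordinary (non-learnable) PCA with an SVD, and runs on synthetic vectors rather than real DINOv2 outputs. The feature dimension 384, the target dimension 16, and the function names fit_pca, compress, and decompress are assumptions chosen for illustration.

```python
import numpy as np


def fit_pca(features, k):
    """Fit PCA on an (n, d) feature matrix.

    Returns the feature mean (d,) and the top-k principal directions (k, d).
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are orthonormal principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def compress(features, mean, components):
    """Project (n, d) features onto the k principal directions -> (n, k)."""
    return (features - mean) @ components.T


def decompress(codes, mean, components):
    """Map (n, k) codes back to the original d-dimensional space."""
    return codes @ components + mean


rng = np.random.default_rng(0)

# Synthetic stand-in for dense image features (assumption): 1000 vectors
# of dimension 384 that lie close to a 16-dimensional subspace.
latent = rng.normal(size=(1000, 16))
basis = rng.normal(size=(16, 384))
features = latent @ basis + 0.01 * rng.normal(size=(1000, 384))

mean, components = fit_pca(features, k=16)
codes = compress(features, mean, components)    # compact codes, shape (1000, 16)
recon = decompress(codes, mean, components)     # approximate features, shape (1000, 384)

# Relative reconstruction error of the compressed representation.
rel_err = np.linalg.norm(recon - features) / np.linalg.norm(features)
print(codes.shape, rel_err)
```

In a Gaussian Splatting setting, the point of such a compressor would be that each Gaussian stores only the short code rather than the full feature vector, and the full-dimensional features are recovered by the decompression step when needed. How LEG-SLAM makes the compressor learnable is not described in the text above, so that part is left out of the sketch.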
Roman Titkov
Center for Cognitive Modeling, Moscow Institute of Physics and Technology, Russia
Egor Zubkov
Center for Cognitive Modeling, Moscow Institute of Physics and Technology, Russia
Dmitry Yudin
Center for Cognitive Modeling, Moscow Institute of Physics and Technology, Russia; AIRI, Moscow, Russia
Jaafar Mahmoud
PhD student, ITMO University
Mobile Robotics, Machine learning, SLAM
Malik Mohrat
PhD student, ITMO University
Computer Vision, Mobile Robotics, Mapping, ML
Gennady Sidorov
ITMO University
Robotics, 3D Computer vision