Hier-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting

📅 2024-09-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Semantic SLAM becomes costly in large, complex environments because parameter usage grows with the complexity of the scene. This paper proposes Hier-SLAM, a semantic 3D Gaussian Splatting SLAM method that encodes semantic information in a compact hierarchical categorical representation built with the help of large language models (LLMs). A semantic loss with inter-level and cross-level terms optimizes the hierarchy, and the system predicts explicit semantic labels in 3D. The authors report better mapping and tracking accuracy than existing dense SLAM methods, a 2x operation speed-up, competitive semantic segmentation rendering on small synthetic scenes with significantly less storage and training time, and rendering at 2,000 FPS with semantic information (3,000 FPS without). The method is also shown to handle a complex real-world scene with more than 500 semantic classes.

📝 Abstract
We propose Hier-SLAM, a semantic 3D Gaussian Splatting SLAM method featuring a novel hierarchical categorical representation, which enables accurate global 3D semantic mapping, scaling-up capability, and explicit semantic label prediction in the 3D world. The parameter usage in semantic SLAM systems increases significantly with the growing complexity of the environment, making it particularly challenging and costly for scene understanding. To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs). We further introduce a novel semantic loss designed to optimize hierarchical semantic information through both inter-level and cross-level optimization. Furthermore, we enhance the whole SLAM system, resulting in improved tracking and mapping performance. Our Hier-SLAM outperforms existing dense SLAM methods in both mapping and tracking accuracy, while achieving a 2x operation speed-up. Additionally, it exhibits competitive performance in rendering semantic segmentation in small synthetic scenes, with significantly reduced storage and training time requirements. Rendering FPS impressively reaches 2,000 with semantic information and 3,000 without it. Most notably, it showcases the capability of handling the complex real-world scene with more than 500 semantic classes, highlighting its valuable scaling-up capability.
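
To make the compactness argument concrete, here is a minimal sketch of how a tree-structured label code can shrink the per-Gaussian semantic vector. It is not the authors' implementation: the toy class tree, the helper names `leaf_paths` and `level_widths`, and the choice of one small block of channels per tree level are all assumptions made for illustration.

```python
# Illustrative sketch only: the class tree and helper names are hypothetical,
# not taken from the Hier-SLAM paper or its code.
TREE = {
    "furniture": {
        "seating": {"chair": {}, "sofa": {}},
        "table": {"desk": {}, "dining table": {}},
    },
    "structure": {
        "surface": {"floor": {}, "wall": {}},
        "opening": {"door": {}, "window": {}},
    },
}


def leaf_paths(tree, prefix=()):
    """Yield (leaf_name, path); path[i] is the child index chosen at level i."""
    for idx, (name, children) in enumerate(tree.items()):
        path = prefix + (idx,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield name, path


def level_widths(paths):
    """Largest number of siblings seen at each level (size of that level's block)."""
    depth = max(len(p) for p in paths)
    return [max(p[lvl] for p in paths if len(p) > lvl) + 1 for lvl in range(depth)]


codes = dict(leaf_paths(TREE))
widths = level_widths(list(codes.values()))

flat_dim = len(codes)   # flat encoding: one channel per leaf class
hier_dim = sum(widths)  # hierarchical encoding: one small block per tree level

print(codes["door"])       # child indices from the root down to "door"
print(flat_dim, hier_dim)
```

On this toy tree the saving is small (8 flat channels versus 6 hierarchical ones), but it grows quickly with the number of classes: a balanced tree with 8 children per node and 3 levels covers 8^3 = 512 leaf classes with only 3 x 8 = 24 channels.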

Problem

Research questions and friction points this paper is trying to address.

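Parameter usage in semantic SLAM grows sharply with scene complexity
Accurate global 3D semantic mapping with explicit label prediction is hard to scale to many classes
Tracking accuracy, mapping accuracy, and runtime of dense SLAM systems leave room for improvement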

Innovation

Methods, ideas, or system contributions that make the work stand out.

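Compact hierarchical categorical representation embedded in 3D Gaussian Splatting, built with the help of LLMs
Semantic loss combining inter-level and cross-level optimization
Enhanced SLAM pipeline with improved tracking and mapping and a 2x operation speed-up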
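
The abstract names inter-level and cross-level optimization for the semantic loss but gives no formulas. The sketch below shows one plausible reading under stated assumptions: the inter-level term is a cross-entropy computed separately inside each level's block of the embedding, and the cross-level term is a cross-entropy over leaf classes after a linear map of the whole embedding. The function names, the linear map `leaf_weights`, and the weights `w_inter` and `w_cross` are hypothetical and are not the paper's definitions.

```python
import math

# Illustrative sketch only: names, the linear map, and the loss weights are
# hypothetical, not the definitions used in the Hier-SLAM paper.


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    return -math.log(softmax(logits)[target])


def inter_level_loss(embedding, widths, path):
    """Sum of per-level cross-entropies, one softmax per level block."""
    loss, start = 0.0, 0
    for width, target in zip(widths, path):
        loss += cross_entropy(embedding[start:start + width], target)
        start += width
    return loss


def cross_level_loss(embedding, leaf_weights, leaf_index):
    """Cross-entropy over leaf classes after a linear map of the full embedding."""
    leaf_logits = [sum(w * e for w, e in zip(row, embedding)) for row in leaf_weights]
    return cross_entropy(leaf_logits, leaf_index)


def hierarchical_loss(embedding, widths, path, leaf_weights, leaf_index,
                      w_inter=1.0, w_cross=1.0):
    return (w_inter * inter_level_loss(embedding, widths, path)
            + w_cross * cross_level_loss(embedding, leaf_weights, leaf_index))


widths = [2, 2, 2]                            # three levels, two choices each
embedding = [0.0] * sum(widths)               # uninformative embedding
leaf_weights = [[0.0] * 6 for _ in range(8)]  # 8 leaf classes, untrained map
loss = hierarchical_loss(embedding, widths, (1, 1, 0), leaf_weights, leaf_index=6)
print(loss)
```

With an all-zero embedding and an all-zero map, every softmax is uniform, so the loss is 3·ln 2 from the three level blocks plus ln 8 from the leaf classes. An embedding that favors the correct child at each level lowers the inter-level term, and training the linear map lowers the cross-level term.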