🤖 AI Summary
The semantic visual SLAM field lacks a systematic survey, particularly one covering the integration of deep learning and large language models (LLMs). To address this gap, we propose a unified problem formulation that decomposes semantic SLAM into five core modules: visual localization, semantic feature extraction, map construction, data association, and loop closure optimization. We introduce a modular analytical framework that unifies classical geometric approaches with modern semantic understanding techniques, including semantic segmentation, object detection, scene understanding, and LLM-based reasoning, and we review contemporary benchmark datasets. Our work provides a comprehensive taxonomy of the field's technical evolution, critically analyzes the limitations of existing methods, and identifies three key bottlenecks: semantic consistency, cross-modal alignment, and real-time performance. The study aims to serve as a comprehensive reference and a technical roadmap for future research in semantic SLAM.
📝 Abstract
Semantic Simultaneous Localization and Mapping (SLAM) is a critical area of research in robotics and computer vision. It focuses on localizing a robotic system while simultaneously associating semantic information with its surroundings, in order to construct an accurate and complete model of the environment. Since the first foundational work in Semantic SLAM appeared more than two decades ago, the field has received increasing attention across various scientific communities. Despite its significance, it lacks comprehensive surveys covering recent advances and persistent challenges. In response, this study provides a thorough examination of the state of the art in Semantic SLAM, with the aim of illuminating current trends and key obstacles. Beginning with an in-depth exploration of the evolution of visual SLAM, the study outlines its strengths and distinctive characteristics and critically assesses previous survey literature. It then proposes a unified problem formulation and evaluates a modular solution framework that divides the problem into discrete stages: visual localization, semantic feature extraction, mapping, data association, and loop closure optimization. The study also investigates alternative methodologies, such as deep learning and large language models, and reviews research on contemporary SLAM datasets. Concluding with a discussion of potential future research directions, it serves as a comprehensive resource for researchers navigating the complex landscape of Semantic SLAM.
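The five-stage decomposition can be pictured as a pipeline of interchangeable modules. The Python sketch below is purely illustrative and is not the survey's formulation. All names (`localize`, `extract_semantics`, `associate`, `update_map`, `is_loop_candidate`, `step`) are hypothetical. Localization is stubbed with 2-D odometry. Semantic extraction assumes detections are already given as `(label, (x, y))` pairs. Data association uses a simple same-label, nearest-neighbour distance gate. Loop closure is reduced to candidate detection, with no pose-graph optimization. Note that association runs before the map update here, since a detection must be matched before it can be merged.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Landmark:
    label: str          # semantic class, e.g. "chair"
    position: tuple     # (x, y) in the world frame
    observations: int = 1


@dataclass
class SemanticMap:
    landmarks: list = field(default_factory=list)


def localize(pose, odometry):
    """Stage 1 (hypothetical stub): dead-reckon pose (x, y, theta) from odometry
    (dx, dy, dtheta) expressed in the previous robot frame. A real system would
    estimate this from visual features."""
    x, y, th = pose
    dx, dy, dth = odometry
    return (x + dx * math.cos(th) - dy * math.sin(th),
            y + dx * math.sin(th) + dy * math.cos(th),
            th + dth)


def extract_semantics(frame):
    """Stage 2 (hypothetical stub): a real system would run a detector or
    segmenter; here the frame is already a list of (label, (x, y)) pairs
    in the robot frame."""
    return list(frame)


def to_world(pose, point):
    """Transform a robot-frame point into the world frame."""
    x, y, th = pose
    px, py = point
    return (x + px * math.cos(th) - py * math.sin(th),
            y + px * math.sin(th) + py * math.cos(th))


def associate(smap, label, position, gate=1.0):
    """Stage 4 (data association): return the nearest landmark with the same
    label within `gate` metres, or None if there is no such landmark."""
    best, best_dist = None, gate
    for lm in smap.landmarks:
        if lm.label != label:
            continue
        d = math.dist(lm.position, position)
        if d <= best_dist:
            best, best_dist = lm, d
    return best


def update_map(smap, label, position, match):
    """Stage 3 (mapping): insert a new landmark, or fold the observation into
    the matched landmark as a running average of its position."""
    if match is None:
        smap.landmarks.append(Landmark(label, position))
        return
    n = match.observations
    match.position = tuple((n * a + b) / (n + 1)
                           for a, b in zip(match.position, position))
    match.observations = n + 1


def is_loop_candidate(num_matched, num_detections, ratio=0.8):
    """Stage 5 (detection only): flag a revisit when most detections re-observe
    existing landmarks. Pose-graph optimization is out of scope here."""
    return num_detections > 0 and num_matched / num_detections >= ratio


def step(smap, pose, odometry, frame):
    """Run one frame through all five stages; return (new_pose, loop_flag)."""
    pose = localize(pose, odometry)
    detections = extract_semantics(frame)
    matched = 0
    for label, rel in detections:
        world = to_world(pose, rel)
        match = associate(smap, label, world)
        if match is not None:
            matched += 1
        update_map(smap, label, world, match)
    return pose, is_loop_candidate(matched, len(detections))
```

Feeding the same frame twice with zero motion illustrates the idea. The first call creates two landmarks and reports no loop. The second call re-observes both landmarks and flags a loop candidate. Keeping each stage behind its own function mirrors the modular framing: any stub could be swapped for a learned detector or an LLM-based reasoning component without changing `step`.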