🤖 AI Summary
This work proposes a novel approach to semantic mapping that adopts 3D Semantic Scene Graphs (3DSSGs) as the core representational layer, addressing the inconsistency and limited scalability of existing methods, which decouple perception from semantic representation in large-scale real-world environments. By incrementally constructing and updating the graph structure in real time during exploration, the method bridges the gap between raw sensor data and high-level knowledge systems. It leverages incremental scene graph prediction, spatially anchored explicit graph representations, and a unified mechanism for integrating flat and hierarchical topologies. The framework seamlessly incorporates external knowledge sources, such as knowledge graphs, ontologies, and large language models, enabling efficient, consistent, and interpretable open-set semantic maps over long-term, large-scale operation, thereby significantly enhancing agent trustworthiness and alignment with human conceptual understanding in complex environments.
📝 Abstract
While Open Set Semantic Mapping and 3D Semantic Scene Graphs (3DSSGs) are established paradigms in robotic perception, deploying them effectively to support high-level reasoning in large-scale, real-world environments remains a significant challenge. Most existing approaches decouple perception from representation, treating the scene graph as a derivative layer generated post hoc. This limits both consistency and scalability. In contrast, we propose a mapping architecture where the 3DSSG serves as the foundational backend, acting as the primary knowledge representation for the entire mapping process. Our approach leverages prior work on incremental scene graph prediction to infer and update the graph structure in real time as the environment is explored. This ensures that the map remains topologically consistent and computationally efficient, even during extended operations in large-scale settings. By maintaining an explicit, spatially grounded representation that supports both flat and hierarchical topologies, we bridge the gap between sub-symbolic raw sensor data and high-level symbolic reasoning. Consequently, this provides a stable, verifiable structure that knowledge-driven frameworks, ranging from knowledge graphs and ontologies to Large Language Models (LLMs), can directly exploit, enabling agents to operate with enhanced interpretability, trustworthiness, and alignment with human concepts.
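To make the representational idea concrete, the following is a minimal, hypothetical sketch of an incrementally updated 3D scene graph: spatially anchored object nodes, flat relational edges, and a hierarchical containment view via parent links. All class and method names (`SceneGraph`, `upsert_node`, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A spatially grounded scene graph node (hypothetical structure)."""
    node_id: int
    label: str                    # open-set semantic label, e.g. "chair"
    centroid: tuple               # 3D anchor (x, y, z) in the map frame
    parent: Optional[int] = None  # hierarchical containment, e.g. a room node

class SceneGraph:
    """Minimal incremental 3DSSG sketch: nodes + typed relational edges."""

    def __init__(self):
        self.nodes: dict = {}   # node_id -> Node
        self.edges: dict = {}   # (src_id, dst_id) -> predicate string

    def upsert_node(self, node_id, label, centroid, parent=None):
        # Incremental update: a re-observation refines an existing node
        # instead of creating a duplicate, keeping the map consistent.
        if node_id in self.nodes:
            n = self.nodes[node_id]
            n.label, n.centroid = label, centroid
            if parent is not None:
                n.parent = parent
        else:
            self.nodes[node_id] = Node(node_id, label, centroid, parent)

    def add_edge(self, src, dst, predicate):
        # Flat topology: a directed semantic relation between two nodes.
        self.edges[(src, dst)] = predicate

    def children(self, parent_id):
        # Hierarchical topology: all nodes contained in a region node.
        return [n for n in self.nodes.values() if n.parent == parent_id]

# Usage: build the graph incrementally as the environment is explored.
g = SceneGraph()
g.upsert_node(0, "kitchen", (0.0, 0.0, 0.0))            # region node
g.upsert_node(1, "table", (1.2, 0.5, 0.0), parent=0)    # object anchored in 3D
g.upsert_node(2, "chair", (1.0, 1.1, 0.0), parent=0)
g.add_edge(2, 1, "next to")                             # flat relational edge
g.upsert_node(2, "armchair", (1.0, 1.2, 0.0))           # re-observation refines label
```

An external knowledge system (an ontology query or an LLM prompt, for instance) could then consume this explicit structure directly, e.g. by serializing nodes and edges as triples.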