vS-Graphs: Integrating Visual SLAM and Situational Graphs through Multi-level Scene Understanding

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing visual SLAM systems struggle to construct 3D maps that are semantically rich, structurally coherent, and human-interpretable—particularly lacking joint modeling of architectural hierarchies (e.g., rooms, corridors) and object-level semantics. To address this, we propose vS-Graphs, the first end-to-end framework for learning semantic-geometric 3D scene graphs directly from visual features. Our method employs multi-level visual scene parsing to extract structured semantics, elevates geometric primitives (e.g., walls, floors) into functional regions, and integrates them into the SLAM backend for joint semantic-geometric bundle adjustment. Evaluated under pure visual conditions, vS-Graphs achieves semantic entity detection accuracy comparable to LiDAR-based methods. On real-world sequences, it reduces absolute trajectory error by an average of 3.38% (up to 9.58%) over baseline VSLAM. The resulting maps exhibit significantly enhanced semantic fidelity, interpretability, and localization robustness.
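The trajectory figures above refer to absolute trajectory error (ATE), the standard metric for comparing an estimated camera trajectory against ground truth. As a minimal illustrative sketch (not the authors' code), the RMSE form of ATE and the relative reduction reported in the summary can be computed as follows, assuming the two trajectories are already time-aligned:

```python
import math

def ate_rmse(est, gt):
    """Root-mean-square absolute trajectory error between time-aligned
    estimated and ground-truth positions, given as lists of (x, y, z)
    tuples. Assumes the trajectories are already spatially aligned."""
    assert len(est) == len(gt) and est
    sq_dists = [sum((e - g) ** 2 for e, g in zip(pe, pg))
                for pe, pg in zip(est, gt)]
    return math.sqrt(sum(sq_dists) / len(sq_dists))

def relative_reduction(baseline_ate, improved_ate):
    """Percent reduction of the improved ATE versus the baseline ATE,
    i.e. the kind of figure quoted as 'reduces ATE by 3.38%'."""
    return 100.0 * (baseline_ate - improved_ate) / baseline_ate
```

For example, a run whose ATE drops from 100 mm to 90.42 mm corresponds to the 9.58% best-case reduction quoted above.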

📝 Abstract
Current Visual Simultaneous Localization and Mapping (VSLAM) systems often struggle to create maps that are both semantically rich and easily interpretable. While incorporating semantic scene knowledge helps build richer maps with contextual associations among mapped objects, representing those maps in structured formats such as scene graphs has not been widely addressed, leaving maps that are hard to comprehend and that scale poorly. This paper introduces visual S-Graphs (vS-Graphs), a novel real-time VSLAM framework that integrates vision-based scene understanding with map reconstruction and a comprehensible graph-based representation. The framework infers structural elements (i.e., rooms and corridors) from detected building components (i.e., walls and ground surfaces) and incorporates them into optimizable 3D scene graphs, enhancing the reconstructed map's semantic richness, comprehensibility, and localization accuracy. Extensive experiments on standard benchmarks and real-world datasets show that vS-Graphs outperforms state-of-the-art VSLAM methods, reducing trajectory error by an average of 3.38% (up to 9.58%) on real-world data. Furthermore, using only visual features, the framework achieves environment-driven semantic entity detection accuracy comparable to precise LiDAR-based frameworks. A web page with additional media and evaluation outcomes is available at https://snt-arg.github.io/vsgraphs-results/.
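The layered map described in the abstract (building components such as walls grouped into structural elements such as rooms, all held in one scene graph) can be sketched as a simple hierarchical data structure. All class and field names below are illustrative assumptions for exposition, not the authors' API:

```python
from dataclasses import dataclass, field

@dataclass
class Wall:
    """Building component: a detected planar wall.
    `normal` is an illustrative (nx, ny, nz) plane normal."""
    id: int
    normal: tuple

@dataclass
class Room:
    """Structural element inferred from a set of enclosing walls."""
    id: int
    walls: list = field(default_factory=list)

@dataclass
class SceneGraph:
    """Top-level graph linking structural elements to their components."""
    rooms: list = field(default_factory=list)

    def add_room_from_walls(self, room_id, walls):
        # Elevate detected components into a structural-element node.
        room = Room(room_id, walls)
        self.rooms.append(room)
        return room
```

In the actual framework these nodes and their relative poses would be variables in the SLAM optimization; this sketch only shows the containment hierarchy.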
Problem

Research questions and friction points this paper is trying to address.

How can VSLAM maps be made more semantically rich and human-interpretable?
How can vision-based scene understanding be combined with a graph-based map representation?
Can structured semantics improve localization accuracy and reduce trajectory error?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates real-time visual SLAM with optimizable 3D scene graphs
Infers structural elements (rooms, corridors) from detected walls and ground surfaces
Achieves LiDAR-comparable semantic entity detection using only visual features