🤖 AI Summary
To address the challenges of dynamic environment understanding and adaptation in long-term robotic operation, this paper proposes a real-time, high-fidelity 3D semantic scene graph construction method. Our approach hierarchically integrates 3D Gaussian splatting with open-vocabulary vision foundation models (e.g., SAM, CLIP) to build a layered, dynamic scene graph. We introduce a hierarchical graph neural representation coupled with an incremental joint feature loss to jointly optimize semantic and geometric consistency. Furthermore, we establish a real-environment-driven semantic–geometric alignment update mechanism, enabling synchronized, online incremental updates of both the semantic graph and the Gaussian map. Experiments demonstrate significant improvements over baselines: +19.3% mAP@10 in language-guided object retrieval, superior semantic segmentation accuracy, and high-quality reconstruction (PSNR = 32.7 dB). The method achieves stable, minute-level dynamic updates in real laboratory environments.
📝 Abstract
In real-world scenarios, environment changes caused by agents or human activity make it extremely challenging for robots to perform long-term tasks. To understand and adapt to dynamic environments, a robot's perception system needs to extract instance-level semantic information, reconstruct the environment in a fine-grained manner, and update its stored environment representation as the environment changes. To address these challenges, we propose DynamicGSG, a dynamic, high-fidelity, open-vocabulary scene graph generation system built on Gaussian splatting. Our system comprises three key components: (1) constructing hierarchical scene graphs with advanced vision foundation models to represent the spatial and semantic relationships of objects in the environment, (2) designing a joint feature loss to optimize the Gaussian map for incremental high-fidelity reconstruction, and (3) updating the Gaussian map and scene graph according to real environment changes for long-term environment adaptation. Experiments and ablation studies demonstrate the performance and efficacy of the proposed method in terms of semantic segmentation, language-guided object retrieval, and reconstruction quality. Furthermore, we have validated the dynamic updating capabilities of our system in real laboratory environments. The source code will be released at: https://github.com/GeLuzhou/Dynamic-GSG.
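To make the ideas of a hierarchical scene graph, language-guided retrieval, and graph updates more concrete, here is a minimal toy sketch in plain Python. It is not the DynamicGSG implementation: the class names `ObjectNode` and `SceneGraph`, the `parent` field, and the use of short hand-written vectors in place of embeddings from a vision-language model are all assumptions made for illustration. Retrieval is done with ordinary cosine similarity, and the update step only handles removing an object.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectNode:
    """One object instance (hypothetical structure, not from the paper)."""
    name: str
    feature: List[float]                   # stand-in for an open-vocabulary embedding
    position: Tuple[float, float, float]   # object centre in map coordinates
    parent: Optional[str] = None           # name of the supporting object, if any


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class SceneGraph:
    nodes: Dict[str, ObjectNode] = field(default_factory=dict)

    def add(self, node: ObjectNode) -> None:
        """Insert a new object or overwrite an existing one with the same name."""
        self.nodes[node.name] = node

    def remove(self, name: str) -> None:
        """Delete an object and re-attach its children to its former parent."""
        removed = self.nodes.pop(name)
        for node in self.nodes.values():
            if node.parent == name:
                node.parent = removed.parent

    def retrieve(self, query_feature: List[float], k: int = 1) -> List[str]:
        """Return the names of the k objects most similar to the query feature."""
        ranked = sorted(
            self.nodes.values(),
            key=lambda n: cosine(n.feature, query_feature),
            reverse=True,
        )
        return [n.name for n in ranked[:k]]
```

In a real system the query feature would come from a text encoder and the node features from image crops of each object; here both are plain lists so the sketch stays self-contained.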