🤖 AI Summary
This work addresses language-guided navigation in large-scale, dynamic outdoor environments, where semantic reasoning is difficult, conditions change constantly, and long-term stability is hard to maintain. It proposes CausalNav, a navigation framework built on a multi-level semantic scene graph (the Embodied Graph) that fuses offline map data with real-time perception. The authors describe it as the first scene graph-based semantic navigation framework tailored for dynamic outdoor environments. The graph is constructed with large language models, explicitly models moving objects, and is updated within a temporal window, which supports spatial reasoning at multiple granularities and hierarchical planning under open-vocabulary queries. Used as the knowledge base for retrieval-augmented generation, the graph enables long-range semantic navigation. Experiments in both simulation and real-world dynamic outdoor settings report superior robustness and efficiency.
📝 Abstract
Autonomous language-guided navigation in large-scale outdoor environments remains a key challenge in mobile robotics, due to difficulties in semantic reasoning, dynamic conditions, and long-term stability. We propose CausalNav, the first scene graph-based semantic navigation framework tailored for dynamic outdoor environments. We construct a multi-level semantic scene graph using LLMs, referred to as the Embodied Graph, that hierarchically integrates coarse-grained map data with fine-grained object entities. The constructed graph serves as a retrievable knowledge base for Retrieval-Augmented Generation (RAG), enabling semantic navigation and long-range planning under open-vocabulary queries. By fusing real-time perception with offline map data, the Embodied Graph supports robust navigation across varying spatial granularities in dynamic outdoor environments. Dynamic objects are explicitly handled in both the scene graph construction and hierarchical planning modules. The Embodied Graph is continuously updated within a temporal window to reflect environmental changes and support real-time semantic navigation. Extensive experiments in both simulation and real-world settings demonstrate superior robustness and efficiency.
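To make the data structure in the abstract more concrete, here is a minimal Python sketch of a hierarchical graph whose dynamic nodes expire outside a temporal window and which can be queried for retrieval. It is an illustration under stated assumptions, not the CausalNav implementation: the class `EmbodiedGraphSketch`, its fields, the window length, and the keyword-overlap scoring (a stand-in for whatever retrieval the RAG pipeline actually uses) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One graph entity; every field name here is an illustrative assumption."""
    name: str
    level: str                  # e.g. "region", "building", "object"
    position: tuple             # (x, y) in a shared map frame
    dynamic: bool = False       # True for moving objects such as carts or cars
    last_seen: float = 0.0      # timestamp of the latest observation
    children: list = field(default_factory=list)


class EmbodiedGraphSketch:
    """Toy hierarchical scene graph with a temporal window for dynamic nodes."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.nodes = {}

    def add(self, node, parent=None):
        # Coarse map entities and fine-grained objects share one hierarchy.
        self.nodes[node.name] = node
        if parent is not None:
            self.nodes[parent].children.append(node.name)

    def observe(self, name, position, now):
        # Real-time perception refreshes a node's pose and timestamp.
        node = self.nodes[name]
        node.position = position
        node.last_seen = now

    def prune(self, now):
        # Drop dynamic nodes that have not been seen within the window.
        stale = {n.name for n in self.nodes.values()
                 if n.dynamic and now - n.last_seen > self.window_s}
        for name in stale:
            del self.nodes[name]
        for node in self.nodes.values():
            node.children = [c for c in node.children if c not in stale]
        return stale

    def retrieve(self, query, k=3):
        # Keyword overlap stands in for a real retriever (assumption).
        words = set(query.lower().split())
        scored = []
        for node in self.nodes.values():
            overlap = len(words & set(node.name.lower().split()))
            if overlap:
                scored.append((overlap, node.name))
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [name for _, name in scored[:k]]


graph = EmbodiedGraphSketch(window_s=5.0)
graph.add(Node("campus", "region", (0.0, 0.0)))
graph.add(Node("library building", "building", (40.0, 12.0)), parent="campus")
graph.add(Node("red bench", "object", (42.0, 10.0)), parent="library building")
graph.add(Node("delivery cart", "object", (41.0, 11.0), dynamic=True),
          parent="library building")
graph.observe("delivery cart", (45.0, 11.0), now=2.0)

removed = graph.prune(now=10.0)  # the cart was last seen 8 s ago
hits = graph.retrieve("go to the bench near the library")
```

In this toy run the delivery cart falls outside the 5-second window and is pruned, while the static building and bench remain retrievable. A real system would pass the retrieved nodes to an LLM as context for planning, a step this sketch leaves out.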