AI Summary
This work addresses zero-shot object navigation in complex, multi-room indoor environments. We propose a hierarchical navigation framework that requires no pretraining, human intervention, explicit reward engineering, or model fine-tuning. The framework combines a layout-aware global topological map with a local scene memory representation, and uses a large language model (LLM) as the core for semantic reasoning and hierarchical control. The key contribution is a "topology-memory-LLM" collaborative paradigm for zero-shot navigation that balances generalization capability and deployment efficiency. On the Matterport3D (MP3D) benchmark, the approach achieves an 85% success rate (SR) and 79% success weighted by path length (SPL), surpassing prior state-of-the-art methods by over 40 percentage points in SR and 60% in SPL. The approach is validated with both simulated agents and real-world robotic platforms.
Abstract
We introduce ELA-ZSON, an efficient layout-aware zero-shot object navigation (ZSON) approach designed for complex multi-room indoor environments. ELA-ZSON plans hierarchically: a global topological map with layout information guides long-range planning, while a local imperative approach with detailed scene representation memory handles nearby navigation, making the method both efficient and effective. The process is managed by an LLM-powered agent, which handles planning and navigation without the need for human interaction, complex rewards, or costly training. On the MP3D benchmark, ELA-ZSON achieves an 85% object navigation success rate (SR) and 79% success rate weighted by path length (SPL), an improvement of over 40 percentage points in SR and 60% in SPL compared to existing methods. Furthermore, we validate the robustness of our approach through virtual agent and real-world robotic deployment, showcasing its capability in practical scenarios. See https://anonymous.4open.science/r/ELA-ZSON-C67E/ for details.
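For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how SR and SPL are conventionally computed over a set of evaluation episodes: SR is the fraction of successful episodes, and SPL weights each success by the ratio of the shortest path length to the path the agent actually took. The `Episode` type, its field names, and the sample values are illustrative assumptions; they are not taken from the ELA-ZSON code or its results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One evaluation episode (hypothetical record layout)."""
    success: bool            # did the agent stop at the target object?
    shortest_path_m: float   # geodesic start-to-goal distance, assumed > 0
    agent_path_m: float      # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """SPL: mean over episodes of S_i * l_i / max(p_i, l_i)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            # Failed episodes contribute 0; successes are discounted by detours.
            total += e.shortest_path_m / max(e.agent_path_m, e.shortest_path_m)
    return total / len(episodes)


# Made-up episodes purely for illustration.
episodes = [
    Episode(success=True, shortest_path_m=5.0, agent_path_m=5.0),    # optimal path
    Episode(success=True, shortest_path_m=4.0, agent_path_m=8.0),    # 2x detour
    Episode(success=False, shortest_path_m=6.0, agent_path_m=10.0),  # failure
    Episode(success=True, shortest_path_m=3.0, agent_path_m=4.0),
]

print(f"SR  = {success_rate(episodes):.4f}")
print(f"SPL = {spl(episodes):.4f}")
```

Because each successful episode contributes at most 1 and each failure contributes 0, SPL can never exceed SR; a small gap between the two (as in 85% SR versus 79% SPL) indicates that successful paths are close to the shortest ones.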