🤖 AI Summary
This work addresses the challenge of language-driven embodied navigation in functional buildings, where highly homogeneous environments limit performance because prior spatial knowledge is insufficiently exploited. To overcome this, the authors propose a framework that transforms environmental maps into semantic prior maps and integrates them into a hierarchical chain-of-thought prompting template for precise path planning. Additionally, a multi-model collaborative action output mechanism is introduced to jointly handle localization decisions and execution control. This approach, the first to combine semantic prior maps with hierarchical chain-of-thought prompting, achieves substantial performance gains on a newly curated functional building dataset, yielding average improvements of 511% and 1175% over SG-Nav and InstructNav in simulation, and 650% and 400% in real-world environments, respectively.
📝 Abstract
Existing language-driven embodied navigation paradigms struggle in functional buildings (FBs) with highly similar features because they cannot effectively exploit prior spatial knowledge. To tackle this issue, we propose Priori-Map Guided Embodied Navigation (PM-Nav), in which environmental maps are transformed into navigation-friendly semantic priori-maps, a hierarchical chain-of-thought prompt template with an annotated priori-map is designed to enable precise path planning, and a multi-model collaborative action output mechanism is built to handle positioning decisions and execution control for navigation planning. Comprehensive tests on a self-built FB dataset show that PM-Nav obtains average improvements of 511% and 1175% over SG-Nav and InstructNav in simulation, and 650% and 400% in the real world, respectively. These large gains demonstrate the potential of PM-Nav as a backbone navigation framework for FBs.