🤖 AI Summary
To address the challenges of cross-floor localization and weak zero-shot semantic understanding in open-vocabulary object-goal navigation (OGN) within multi-floor environments, this paper proposes the first training-free, zero-shot, floor-aware navigation framework. Methodologically, it constructs a multi-level semantic map for spatial abstraction and integrates an LLM-driven hierarchical reasoning mechanism that proceeds from coarse-grained floor selection to fine-grained frontier-point planning, augmented by zero-shot semantic alignment for plug-and-play interpretation of unseen object descriptions. The core contribution is a fully zero-shot approach that achieves cross-floor path planning and open-vocabulary grounding without any parameter tuning or fine-tuning. Extensive evaluation on the HM3D and MP3D benchmarks demonstrates significant improvements over prior zero-shot methods. Furthermore, the framework is successfully deployed on a quadruped robot, enabling end-to-end object exploration in real-world, previously unseen multi-floor environments.
📝 Abstract
Object-Goal Navigation (OGN) remains challenging in real-world, multi-floor environments and under open-vocabulary object descriptions. We observe that most episodes in widely used benchmarks such as HM3D and MP3D involve multi-floor buildings, with many requiring explicit floor transitions. However, existing methods are often limited to single-floor settings or predefined object categories. To address these limitations, we tackle two key challenges: (1) efficient cross-level planning and (2) zero-shot object-goal navigation (ZS-OGN), where agents must interpret novel object descriptions without prior exposure. We propose ASCENT, a framework that combines a Multi-Floor Spatial Abstraction module for hierarchical semantic mapping and a Coarse-to-Fine Frontier Reasoning module leveraging Large Language Models (LLMs) for context-aware exploration, without requiring additional training on new object semantics or locomotion data. Our method outperforms state-of-the-art ZS-OGN approaches on HM3D and MP3D benchmarks while enabling efficient multi-floor navigation. We further validate its practicality through real-world deployment on a quadruped robot, achieving successful object exploration across unseen floors.
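The two-stage decision process described above (coarse-grained floor selection followed by fine-grained frontier reasoning) can be sketched roughly as follows. This is a minimal illustration only: the data structures, the `query_llm` placeholder, and the keyword-matching stand-in for actual LLM scoring are assumptions for exposition, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Frontier:
    floor: int       # floor index this frontier lies on
    position: tuple  # (x, y) in the floor's map frame
    context: str     # nearby semantic labels, e.g. "sofa, tv"

def query_llm(prompt: str, options: list[str]) -> str:
    """Placeholder for an LLM call: picks the option whose labels
    appear in the prompt, so the sketch runs without an API."""
    for opt in options:
        if any(label in prompt for label in opt.split(", ")):
            return opt
    return options[0]

def coarse_to_fine_plan(goal: str, floor_summaries: dict[int, str],
                        frontiers: list[Frontier]) -> Frontier:
    # Coarse stage: choose the floor whose semantic summary
    # best matches the goal description.
    floors = list(floor_summaries)
    summaries = [floor_summaries[f] for f in floors]
    chosen_floor = floors[summaries.index(query_llm(goal, summaries))]
    # Fine stage: among frontiers on the chosen floor, select the one
    # whose local semantic context best matches the goal.
    candidates = [f for f in frontiers if f.floor == chosen_floor]
    best_context = query_llm(goal, [f.context for f in candidates])
    return next(f for f in candidates if f.context == best_context)
```

In the real system both stages would be scored by an LLM over the hierarchical semantic map; the stub here only preserves the control flow of reasoning from floors down to frontier points.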