🤖 AI Summary
To address the challenges of cross-floor localization and weak zero-shot semantic understanding in open-vocabulary object-goal navigation (OGN) within multi-floor environments, this paper proposes the first training-free, zero-shot, floor-aware navigation framework. Methodologically, it constructs a multi-level semantic map for spatial abstraction and integrates an LLM-driven hierarchical reasoning mechanism that proceeds from coarse-grained floor selection to fine-grained frontier-point planning, augmented by zero-shot semantic alignment for plug-and-play interpretation of unseen object descriptions. The core contribution is a fully zero-shot approach that achieves cross-floor path planning and open-vocabulary grounding without any parameter tuning or fine-tuning. Extensive evaluation on the HM3D and MP3D benchmarks demonstrates significant improvements over prior zero-shot methods. Furthermore, the framework is successfully deployed on a quadruped robot, enabling end-to-end object exploration in real-world, previously unseen multi-floor environments.
📝 Abstract
Object-Goal Navigation (OGN) remains challenging in real-world, multi-floor environments and under open-vocabulary object descriptions. We observe that most episodes in widely used benchmarks such as HM3D and MP3D involve multi-floor buildings, with many requiring explicit floor transitions. However, existing methods are often limited to single-floor settings or predefined object categories. To address these limitations, we tackle two key challenges: (1) efficient cross-level planning and (2) zero-shot object-goal navigation (ZS-OGN), where agents must interpret novel object descriptions without prior exposure. We propose ASCENT, a framework that combines a Multi-Floor Spatial Abstraction module for hierarchical semantic mapping and a Coarse-to-Fine Frontier Reasoning module leveraging Large Language Models (LLMs) for context-aware exploration, without requiring additional training on new object semantics or locomotion data. Our method outperforms state-of-the-art ZS-OGN approaches on HM3D and MP3D benchmarks while enabling efficient multi-floor navigation. We further validate its practicality through real-world deployment on a quadruped robot, achieving successful object exploration across unseen floors.
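The two-stage decision process described above (coarse-grained floor selection followed by fine-grained frontier reasoning) can be sketched roughly as follows. This is a minimal illustration only: the data structures, the `query_llm` placeholder, and the keyword-matching stand-in for actual LLM scoring are assumptions for exposition, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Frontier:
    floor: int       # floor index this frontier lies on
    position: tuple  # (x, y) in the floor's map frame
    context: str     # nearby semantic labels, e.g. "sofa, tv"

def query_llm(prompt: str, options: list[str]) -> str:
    """Placeholder for an LLM call: picks the option whose labels
    appear in the prompt, so the sketch runs without an API."""
    for opt in options:
        if any(label in prompt for label in opt.split(", ")):
            return opt
    return options[0]

def coarse_to_fine_plan(goal: str, floor_summaries: dict[int, str],
                        frontiers: list[Frontier]) -> Frontier:
    # Coarse stage: choose the floor whose semantic summary
    # best matches the goal description.
    floors = list(floor_summaries)
    summaries = [floor_summaries[f] for f in floors]
    chosen_floor = floors[summaries.index(query_llm(goal, summaries))]
    # Fine stage: among frontiers on the chosen floor, select the one
    # whose local semantic context best matches the goal.
    candidates = [f for f in frontiers if f.floor == chosen_floor]
    best_context = query_llm(goal, [f.context for f in candidates])
    return next(f for f in candidates if f.context == best_context)
```

In the real system both stages would be scored by an LLM over the hierarchical semantic map; the stub here only preserves the control flow of reasoning from floors down to frontier points.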