🤖 AI Summary
This work addresses a limitation of existing zero-shot object navigation methods: they rely on planar representations and a single-floor assumption, so they struggle with the multi-level, vertically overlapping structures of real buildings. The authors propose TravExplorer, a framework for open-vocabulary, cross-floor object navigation that requires no prior map. TravExplorer maintains a unified volumetric map that distinguishes occupied structures from robot-reachable support surfaces, and builds on three key components: traversability-aware 3-D frontier extraction, field-of-view (FOV)-aware active exploration, and foothold-guided 3-D motion generation. Experiments over 4,195 simulated episodes on HM3D and MP3D show consistent advantages over representative ObjectNav baselines. In addition, 50 real-world trials on a Unitree Go2 robot validate open-vocabulary target search across single-floor and cross-floor indoor environments without prior maps or human intervention.
📝 Abstract
Zero-shot Object Navigation (ZSON) has shown promise for open-vocabulary target search in unseen environments, yet most existing systems remain tied to planar representations and single-floor assumptions. These assumptions become inadequate in real buildings, where navigation involves floors, stairs, landings, and vertically overlapping spaces. This article presents TravExplorer, a cross-floor embodied exploration framework that couples zero-shot semantic guidance with traversability-aware 3-D planning. TravExplorer maintains a unified volumetric map that distinguishes occupied structures from robot-reachable support surfaces and extracts traversable frontiers from connected support surfaces, including floors, stairs, and landings. A FOV-aware active perception strategy further resolves incomplete observations during cross-floor traversal. To reduce semantic-reasoning latency, a lightweight guidance module aligns a probabilistic instance map from online open-vocabulary segmentation with a spatial value map from fast image-to-text matching. Based on these geometric and semantic memories, a hierarchical planner performs target-aware frontier touring over object hypotheses, traversable frontiers, and stair landmarks, and generates executable cross-floor motions through foothold-guided 3-D search and vertically constrained local trajectory optimization. Experiments over 4,195 simulated episodes on HM3D and MP3D demonstrate consistent advantages over representative ObjectNav baselines. Fifty real-world trials on a Unitree Go2 further validate open-vocabulary target search across single-floor and cross-floor indoor environments without prior maps or human intervention. The code will be released at https://github.com/wuyi2121/TravExplorer.
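The idea of extracting traversable frontiers from connected support surfaces can be illustrated with a minimal sketch. This is not the authors' implementation: the set-based voxel map, the function names (`support_cells`, `traversable_frontiers`, `build_demo_scene`), and the parameters `clearance` and `max_step` are all assumptions made for illustration. A support cell is taken to be a free voxel resting on an occupied voxel with enough headroom, cells are connected when they are horizontal neighbors within a bounded step height, and a frontier is a reachable support cell next to unknown space.

```python
from collections import deque

# Hypothetical toy representation: voxels are (x, y, z) integer tuples with z up.
# Anything in neither `occupied` nor `free` is treated as unknown.
NEIGHBORS_XY = ((1, 0), (-1, 0), (0, 1), (0, -1))


def support_cells(occupied, free, clearance=2):
    """Free voxels that rest on an occupied voxel and have `clearance` free voxels of headroom."""
    cells = set()
    for (x, y, z) in free:
        on_solid = (x, y, z - 1) in occupied
        has_headroom = all((x, y, z + k) in free for k in range(clearance))
        if on_solid and has_headroom:
            cells.add((x, y, z))
    return cells


def traversable_frontiers(occupied, free, start, clearance=2, max_step=1):
    """Return (reachable support cells, frontier cells) for a robot standing at `start`."""
    support = support_cells(occupied, free, clearance)
    if start not in support:
        return set(), set()

    # Breadth-first search over support cells; a height change of up to
    # `max_step` voxels between horizontal neighbors models a stair step.
    reachable = {start}
    queue = deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy in NEIGHBORS_XY:
            for dz in range(-max_step, max_step + 1):
                nxt = (x + dx, y + dy, z + dz)
                if nxt in support and nxt not in reachable:
                    reachable.add(nxt)
                    queue.append(nxt)

    # A frontier is a reachable support cell with an unknown horizontal neighbor.
    known = occupied | free
    frontiers = {
        (x, y, z)
        for (x, y, z) in reachable
        if any((x + dx, y + dy, z) not in known for dx, dy in NEIGHBORS_XY)
    }
    return reachable, frontiers


def build_demo_scene():
    """Walled corridor along x: lower floor, one stair step, upper floor, unexplored beyond x=5."""
    floor_height = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2, 5: 2}
    occupied, free = set(), set()
    for x, h in floor_height.items():
        occupied.add((x, 0, h))                          # floor or stair voxel
        free.update({(x, 0, h + 1), (x, 0, h + 2)})      # observed free space above it
        for z in range(6):
            occupied.update({(x, -1, z), (x, 1, z)})     # side walls
    occupied.update((-1, 0, z) for z in range(6))        # closed end behind the robot
    return occupied, free


occupied, free = build_demo_scene()
reachable, frontiers = traversable_frontiers(occupied, free, start=(0, 0, 1))
print(sorted(frontiers))  # [(5, 0, 3)]
```

In this toy scene the only frontier lies on the upper floor and is reached through the stair step. Calling the same function with `max_step=0`, which mimics a single-floor planar assumption, leaves the robot confined to the three lower-floor cells and yields no frontier at all.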