🤖 AI Summary
Existing zero-shot object-goal navigation methods do not adapt how they fuse room-level and object-level contextual cues to the characteristics of the target, which leads to inefficient exploration and poor accuracy. This work proposes a large language model (LLM)-based adaptive navigation framework that uses an offline LLM to extract commonsense knowledge and estimate the association strength between the target object and room types. It dynamically weights these two sources of context and integrates them into a unified semantic value map. An adaptive prioritization mechanism balances the cue weights according to target ambiguity and is combined with multi-view verification and a context-aware exploration strategy. Evaluated in both simulation and real-world deployments, the proposed method consistently outperforms state-of-the-art baselines in Success Rate (SR) and Success weighted by Path Length (SPL).
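For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how SR and SPL are commonly computed over a set of evaluation episodes. The episode field names (`success`, `shortest`, `taken`) are hypothetical and chosen only for illustration; the sketch follows the standard definition of SPL and is not taken from this work's evaluation code.

```python
def success_rate(episodes):
    """Fraction of episodes in which the agent reached the target."""
    return sum(1 for e in episodes if e["success"]) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length, averaged over all episodes.

    Each successful episode contributes shortest / max(taken, shortest);
    failed episodes contribute 0. Assumes shortest > 0 for every episode.
    """
    total = 0.0
    for e in episodes:
        if e["success"]:
            total += e["shortest"] / max(e["taken"], e["shortest"])
    return total / len(episodes)


# Hypothetical episodes: path lengths in meters.
episodes = [
    {"success": True, "shortest": 5.0, "taken": 10.0},   # reached, path twice as long
    {"success": True, "shortest": 4.0, "taken": 4.0},    # reached along the optimal path
    {"success": False, "shortest": 3.0, "taken": 9.0},   # target not reached
]
print(success_rate(episodes), spl(episodes))
```

SPL is never larger than SR, because a success earns full credit only when the path taken is as short as the shortest path.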
📝 Abstract
Zero-shot object-goal navigation (ZSON) is a challenging problem in robotics that requires a comprehensive understanding of both language and visual observations. Contextual cues from rooms and objects are critical, but their relative importance depends on the target: some objects are strongly tied to specific room types, while others are better predicted by nearby co-located objects. Existing methods overlook this distinction, leading to inefficient and inaccurate exploration. We present CLUE, a novel navigation framework that adaptively balances the use of contextual rooms and objects by leveraging commonsense knowledge extracted from an offline large language model (LLM). By estimating a target's association with room types using the LLM, the agent prioritizes room cues for objects whose room is predictable and object cues for objects with weak room associations. Our framework constructs a unified semantic value map that integrates both types of contextual information, adaptively weighted by the target's ambiguity, to guide exploration. Combined with multi-viewpoint verification and an exploration strategy informed by contextual cues, CLUE achieves robust and efficient navigation. Extensive experiments in simulation and real-world deployments show that our method consistently outperforms state-of-the-art baselines in both success rate (SR) and success weighted by path length (SPL), demonstrating its effectiveness and practicality for real-world navigation tasks.
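The adaptive weighting idea can be illustrated with a small sketch. The abstract does not say how target ambiguity is measured or how the two cue maps are combined, so everything below is an assumption made for illustration: ambiguity is taken to be the normalized entropy of hypothetical LLM-derived room-association scores, and the unified map is a per-cell convex combination of a room-cue map and an object-cue map. The function names `room_ambiguity` and `fuse_value_maps` are hypothetical and are not CLUE's API.

```python
import math


def room_ambiguity(room_scores):
    """Normalized Shannon entropy of room-association scores, in [0, 1].

    0 means the target is tied to a single room type; 1 means every room
    type is equally likely. (Assumed measure, not taken from the paper.)
    """
    if len(room_scores) <= 1:
        return 0.0
    total = sum(room_scores.values())
    probs = [s / total for s in room_scores.values() if s > 0]
    if len(probs) <= 1:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(room_scores))


def fuse_value_maps(room_map, object_map, ambiguity):
    """Blend two equally sized 2D grids cell by cell.

    Low ambiguity favors room cues; high ambiguity favors object cues.
    """
    w_room = 1.0 - ambiguity
    return [
        [w_room * r + ambiguity * o for r, o in zip(room_row, object_row)]
        for room_row, object_row in zip(room_map, object_map)
    ]


# Hypothetical scores: a toilet is strongly tied to bathrooms,
# while a potted plant could be in almost any room.
toilet = {"bathroom": 0.90, "bedroom": 0.05, "kitchen": 0.05}
plant = {"bathroom": 0.30, "bedroom": 0.35, "kitchen": 0.35}

room_map = [[1.0, 0.0], [0.2, 0.1]]     # value of each cell from room cues
object_map = [[0.0, 1.0], [0.3, 0.6]]   # value of each cell from object cues

for name, scores in (("toilet", toilet), ("plant", plant)):
    a = room_ambiguity(scores)
    print(name, round(a, 3), fuse_value_maps(room_map, object_map, a))
```

Under these assumptions the toilet receives a lower ambiguity than the plant, so its fused map stays closer to the room-cue map, while the plant's fused map is dominated by object cues.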