CL-CoTNav: Closed-Loop Hierarchical Chain-of-Thought for Zero-Shot Object-Goal Navigation with Vision-Language Models

📅 2025-04-11
🤖 AI Summary
Weak zero-shot generalization of Visual Object Navigation (ObjectNav) to unseen environments and novel object categories stems primarily from the lack of structured reasoning in end-to-end approaches. To address this, we propose a Vision-Language Model (VLM)-driven closed-loop Hierarchical Chain-of-Thought (H-CoT) framework. It fine-tunes a VLM on a multi-turn question-answering (QA) dataset built from human demonstrations, so that hierarchical CoT prompting follows the cognition-inspired sequence of perception and reasoning steps a person uses to find a target object. It adds a closed-loop mechanism that weights training pairs by detection and reasoning confidence scores, so the model prioritizes high-confidence data and is less affected by noisy inputs and hallucinated reasoning. The framework is trained and evaluated in the AI Habitat simulator. Experiments demonstrate substantial improvements over state-of-the-art methods on zero-shot ObjectNav: Success Rate (SR) and Success weighted by Path Length (SPL) increase by 22.4%. We publicly release our dataset, models, and demonstration videos.
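
The summary does not specify the question templates or data schema of the multi-turn QA dataset. Purely as an illustration, the sketch below shows one way a single demonstration step could be expanded into coarse-to-fine QA turns (perception, then spatial reasoning, then target check, then action). The `DemoStep` fields, the `step_to_qa` helper, and the question wording are all hypothetical and are not the authors' format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DemoStep:
    """One step of a hypothetical human demonstration trajectory (illustrative schema)."""
    target: str                 # goal object category, e.g. "bed"
    visible_objects: List[str]  # object labels visible in the current egocentric frame
    likely_room: str            # room the demonstrator judged most likely to hold the target
    action: str                 # discrete action the demonstrator took at this step


def step_to_qa(step: DemoStep) -> List[Tuple[str, str]]:
    """Expand one step into hypothetical coarse-to-fine (question, answer) turns:
    perception first, then spatial reasoning, then a target check, then the action."""
    seen = ", ".join(step.visible_objects) if step.visible_objects else "nothing notable"
    target_visible = "yes" if step.target in step.visible_objects else "no"
    return [
        ("What objects are visible in the current view?", seen),
        (f"Which room is most likely to contain the {step.target}?", step.likely_room),
        (f"Is the {step.target} visible now?", target_visible),
        ("Which action should the robot take next?", step.action),
    ]


if __name__ == "__main__":
    demo = DemoStep("bed", ["door", "lamp"], "bedroom", "turn_left")
    for question, answer in step_to_qa(demo):
        print(f"Q: {question}\nA: {answer}")
```

Ordering the turns from general perception to the final action mirrors the hierarchical, step-by-step reasoning the summary describes; the concrete levels chosen here are an assumption.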

📝 Abstract
Visual Object Goal Navigation (ObjectNav) requires a robot to locate a target object in an unseen environment using egocentric observations. However, decision-making policies often struggle to transfer to unseen environments and novel target objects, which is the core generalization problem. Traditional end-to-end learning methods exacerbate this issue, as they rely on memorizing spatial patterns rather than employing structured reasoning, limiting their ability to generalize effectively. In this letter, we introduce Closed-Loop Hierarchical Chain-of-Thought Navigation (CL-CoTNav), a vision-language model (VLM)-driven ObjectNav framework that integrates structured reasoning and closed-loop feedback into navigation decision-making. To enhance generalization, we fine-tune a VLM using multi-turn question-answering (QA) data derived from human demonstration trajectories. This structured dataset enables hierarchical Chain-of-Thought (H-CoT) prompting, systematically extracting compositional knowledge to refine perception and decision-making, inspired by the human cognitive process of locating a target object through iterative reasoning steps. Additionally, we propose a Closed-Loop H-CoT mechanism that incorporates detection and reasoning confidence scores into training. This adaptive weighting strategy guides the model to prioritize high-confidence data pairs, mitigating the impact of noisy inputs and enhancing robustness against hallucinated or incorrect reasoning. Extensive experiments in the AI Habitat environment demonstrate CL-CoTNav's superior generalization to unseen scenes and novel object categories. Our method consistently outperforms state-of-the-art approaches in navigation success rate (SR) and success weighted by path length (SPL) by 22.4%. We release our datasets, models, and supplementary videos on our project page.
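
SR and SPL follow the standard ObjectNav definitions. SR is the fraction of successful episodes. SPL averages S_i · ℓ_i / max(p_i, ℓ_i) over N episodes, where S_i is the binary success indicator, ℓ_i the shortest-path distance from start to goal, and p_i the length of the path the agent actually took. A minimal Python sketch of both metrics follows; the function names are illustrative and are not taken from the paper or from the AI Habitat API.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """SR: fraction of episodes in which the agent reached the target."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(successes) / len(successes)


def spl(successes: Sequence[bool],
        shortest: Sequence[float],
        taken: Sequence[float]) -> float:
    """SPL: success weighted by shortest / max(taken, shortest), averaged over episodes.
    Assumes every shortest-path length is positive."""
    if not (len(successes) == len(shortest) == len(taken)) or not successes:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for s, l, p in zip(successes, shortest, taken):
        if s:
            # max() caps each episode's score at 1, even if p < l due to measurement noise
            total += l / max(p, l)
    return total / len(successes)


if __name__ == "__main__":
    ok = [True, True, False]
    print(success_rate(ok))                            # 2 of 3 episodes succeeded
    print(spl(ok, [5.0, 4.0, 6.0], [10.0, 4.0, 3.0]))  # 0.5
```

A failed episode contributes 0 to both metrics, and a successful one contributes less to SPL the longer its path is relative to the optimum, which is why SPL is always at most SR.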
Problem

Research questions and friction points this paper is trying to address.

Improves generalization in unseen environments for ObjectNav
Enhances decision-making with structured reasoning and feedback
Mitigates noisy inputs via confidence-based adaptive weighting
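
The abstract states that detection and reasoning confidence scores enter training as an adaptive weighting that favors high-confidence data pairs, but it does not give the formula. The sketch below is one possible instantiation under explicit assumptions: the two confidences are multiplied, pairs below a hypothetical `min_conf` threshold are zeroed, weights are normalized over the batch, and the loss is a weighted negative log-likelihood of each pair's reference answer. None of these choices are confirmed by the source, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def confidence_weights(det_conf: Sequence[float],
                       reason_conf: Sequence[float],
                       min_conf: float = 0.0) -> List[float]:
    """Hypothetical per-pair weights: product of detection and reasoning confidence,
    zeroed below min_conf, then normalized to sum to 1 over the batch."""
    if len(det_conf) != len(reason_conf) or not det_conf:
        raise ValueError("confidence lists must be non-empty and equally long")
    raw = []
    for d, r in zip(det_conf, reason_conf):
        c = d * r
        raw.append(c if c >= min_conf else 0.0)
    total = sum(raw)
    if total == 0.0:
        # Every pair was filtered out: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def weighted_nll(answer_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted negative log-likelihood of each pair's reference answer.
    Probabilities are clamped to avoid log(0)."""
    return -sum(w * math.log(max(p, 1e-12)) for w, p in zip(weights, answer_probs))


if __name__ == "__main__":
    w = confidence_weights([0.9, 0.5], [1.0, 0.2], min_conf=0.2)
    print(w)                             # [1.0, 0.0]
    print(weighted_nll([0.5, 0.01], w))  # log(2), about 0.693
```

In this example the second pair (combined confidence 0.1) is filtered out, so its very low answer probability does not inflate the loss; this is the kind of noise suppression the abstract attributes to the closed-loop mechanism.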
Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed-loop hierarchical Chain-of-Thought navigation framework
Fine-tuned VLM with multi-turn QA data
Adaptive weighting using confidence scores
Yuxin Cai
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore
Xiangkun He
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore
Maonan Wang
Unknown affiliation
Hongliang Guo
College of Computer Science, Sichuan University
multi-robot efficient search; stochastic on-time arrival; reliable decision making
W. Yau
Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore
Chen Lv
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore