🤖 AI Summary
Zero-shot vision-language navigation in continuous environments (VLN-CE) faces three core challenges: the absence of expert demonstrations, weak environmental priors, and a continuous action space. Method: We propose a constraint-aware sub-instruction sequence modeling framework. It introduces, for the first time, a constraint-driven dynamic sub-instruction decomposition and switching mechanism, coupled with superpixel-guided online refinement of value maps, to enable real-time value estimation and robust decision-making. Contribution/Results: Our approach overcomes the dual limitations of trajectory scarcity and structural-prior deficiency inherent in zero-shot settings. On the validation unseen splits of R2R-CE and RxR-CE, it achieves state-of-the-art success rates, improving over prior work by 12% and 13%, respectively. The method has been successfully deployed on multiple real-world indoor robotic platforms across diverse scenarios.
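The dynamic sub-instruction decomposition and switching idea above can be sketched as a simple manager that holds an ordered list of sub-instructions, each paired with a completion constraint, and advances whenever the current constraint is satisfied. This is an illustrative sketch only; the class and field names (`SubInstruction`, `SubInstructionManager`, the observation dictionary keys) are our own assumptions, not the paper's API.

```python
# Hypothetical sketch of constraint-aware sub-instruction switching.
# All names here are illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubInstruction:
    text: str                                # e.g. "walk past the sofa"
    constraint: Callable[[Dict], bool]       # completion test over the current observation


class SubInstructionManager:
    """Tracks progress through an ordered list of sub-instructions."""

    def __init__(self, subs: List[SubInstruction]):
        self.subs = subs
        self.idx = 0

    @property
    def current(self) -> SubInstruction:
        return self.subs[self.idx]

    def done(self) -> bool:
        # All sub-instructions completed.
        return self.idx >= len(self.subs)

    def step(self, observation: Dict) -> None:
        # Switch to the next sub-instruction once its constraint is met;
        # a single observation may satisfy several constraints in a row.
        while not self.done() and self.current.constraint(observation):
            self.idx += 1


# Usage: two toy sub-instructions whose constraints read flags
# from a mocked observation dictionary.
manager = SubInstructionManager([
    SubInstruction("go to the door", lambda o: o.get("at_door", False)),
    SubInstruction("turn left", lambda o: o.get("turned_left", False)),
])
manager.step({"at_door": False})      # constraint unmet: stay on first sub-instruction
manager.step({"at_door": True})       # constraint met: switch to "turn left"
```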
📝 Abstract
We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. Zero-shot VLN-CE is particularly challenging due to the absence of expert demonstrations for training and the minimal structural priors about the environment available to guide navigation. To confront these challenges, we propose a Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria of decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and refines it using superpixel clustering to improve navigation stability. CA-Nav achieves state-of-the-art performance on two VLN-CE benchmarks, surpassing the previous best method by 12 percent and 13 percent in Success Rate on the validation unseen splits of R2R-CE and RxR-CE, respectively. Moreover, CA-Nav demonstrates its effectiveness in real-world robot deployments across various indoor scenes and instructions.
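The superpixel-based value-map refinement can be illustrated with a minimal sketch: given a dense per-cell value map and a precomputed superpixel segmentation, each cell's value is replaced by the mean value of its segment, yielding one consistent value per superpixel. This is a sketch under assumptions of our own: the function name `refine_value_map` is hypothetical, and the segment labels stand in for the output of an actual superpixel algorithm (e.g. SLIC), which the paper does not specify here.

```python
# Illustrative sketch (not the paper's implementation): smooth a value map
# by averaging values within precomputed superpixel segments.
import numpy as np


def refine_value_map(values: np.ndarray, segments: np.ndarray) -> np.ndarray:
    """Replace each cell's value with the mean value of its superpixel.

    values:   (H, W) float array of per-cell navigation values.
    segments: (H, W) int array of superpixel labels (e.g. from SLIC).
    """
    refined = np.empty_like(values, dtype=float)
    for label in np.unique(segments):
        mask = segments == label
        refined[mask] = values[mask].mean()  # one consistent value per segment
    return refined


# Usage: a 2x2 toy map with two segments (top row = segment 0, bottom = 1).
vals = np.array([[1.0, 3.0],
                 [2.0, 10.0]])
segs = np.array([[0, 0],
                 [1, 1]])
smoothed = refine_value_map(vals, segs)
```

Averaging within segments suppresses isolated noisy peaks in the value map, which is one plausible reading of why such refinement would improve navigation stability.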