User-Feedback-Driven Continual Adaptation for Vision-and-Language Navigation

📅 2025-12-11
🤖 AI Summary
Current General Scene Adaptation for Vision-and-Language Navigation (GSA-VLN) frameworks lack user supervision and rely solely on unsupervised adaptation from repeated environmental exposure. To address this, the paper introduces, for the first time, a systematic integration of user feedback (navigation instructions and corrective signals) into the continual self-adaptation process. It proposes a user-feedback-driven online learning framework comprising: (1) feedback modeling and translation of feedback into instruction-action signals; (2) environment-aligned data generation; (3) memory-augmented continual learning; and (4) a memory-bank warm-start mechanism. Together, these components mitigate cold-start degradation and improve redeployment stability. On the GSA-R2R benchmark, the method outperforms strong baselines such as GR-DUET, improving both navigation success rate and path efficiency. It also delivers robust gains under both continual and hybrid adaptation settings.
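The conversion of user feedback into environment-aligned training data can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's implementation: the environment is modelled as a weighted navigation graph of viewpoints, a user correction is assumed to name the correct goal viewpoint, and the hypothetical helpers `FeedbackEvent`, `shortest_path`, and `feedback_to_sample` turn each interaction into an (instruction, trajectory) pair whose path lies on the deployment environment's own graph.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical navigation graph: viewpoint id -> {neighbor id: edge length}.
Graph = Dict[str, Dict[str, float]]


@dataclass
class FeedbackEvent:
    """One user interaction (hypothetical schema, not taken from the paper)."""
    instruction: str               # navigation instruction given by the user
    start: str                     # viewpoint where the episode began
    agent_path: List[str]          # viewpoints the agent actually visited
    corrected_goal: Optional[str]  # goal named in the user's correction, if any


def shortest_path(graph: Graph, src: str, dst: str) -> List[str]:
    """Dijkstra's algorithm on the environment graph; returns [] if dst is unreachable."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter route was already found
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if dst not in dist:
        return []
    # Walk predecessor links back from the goal, then reverse.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def feedback_to_sample(graph: Graph, event: FeedbackEvent) -> Optional[Tuple[str, List[str]]]:
    """Turn one feedback event into an (instruction, trajectory) training pair."""
    if event.corrected_goal is None:
        # Assumption: with no correction, the user implicitly accepted the agent's path.
        return (event.instruction, event.agent_path)
    path = shortest_path(graph, event.start, event.corrected_goal)
    # Drop the event if the corrected goal is not reachable in this environment.
    return (event.instruction, path) if path else None
```

Using shortest paths on the deployment graph is just one simple way to keep generated trajectories aligned with the environment; the paper's own data-generation procedure may differ.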

📝 Abstract
Vision-and-Language Navigation (VLN) requires agents to navigate complex environments by following natural-language instructions. General Scene Adaptation for VLN (GSA-VLN) shifts the focus from zero-shot generalization to continual, environment-specific adaptation, narrowing the gap between static benchmarks and real-world deployment. However, current GSA-VLN frameworks exclude user feedback, relying solely on unsupervised adaptation from repeated environmental exposure. In practice, user feedback offers natural and valuable supervision that can significantly enhance adaptation quality. We introduce a user-feedback-driven adaptation framework that extends GSA-VLN by systematically integrating human interactions into continual learning. Our approach converts user feedback (navigation instructions and corrective signals) into high-quality, environment-aligned training data, enabling efficient and realistic adaptation. A memory-bank warm-start mechanism further reuses previously acquired environmental knowledge, mitigating cold-start degradation and ensuring stable redeployment. Experiments on the GSA-R2R benchmark show that our method consistently surpasses strong baselines such as GR-DUET, improving navigation success and path efficiency. The memory-bank warm start stabilizes early navigation and reduces performance drops after updates. Results under both continual and hybrid adaptation settings confirm the robustness and generality of our framework, demonstrating sustained improvement across diverse deployment conditions.
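The memory-bank warm start described in the abstract can likewise be sketched, under loudly stated assumptions: the hypothetical `MemoryBank` stores one feature vector per viewpoint and updates it with an exponential moving average, and the hypothetical `save_bank` / `warm_start` helpers persist it to one JSON file per environment. On redeployment, `warm_start` reloads that file and falls back to an empty bank (a cold start) only when none exists. None of these names, nor the EMA update, come from the paper.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


class MemoryBank:
    """Hypothetical per-environment memory: viewpoint id -> feature vector."""

    def __init__(self, entries: Optional[Dict[str, List[float]]] = None) -> None:
        # Copy so the caller's dict is never mutated by later updates.
        self.entries: Dict[str, List[float]] = dict(entries or {})

    def update(self, viewpoint: str, feature: List[float], momentum: float = 0.9) -> None:
        """Blend a new observation into memory with an exponential moving average."""
        old = self.entries.get(viewpoint)
        if old is None:
            self.entries[viewpoint] = list(feature)
        else:
            self.entries[viewpoint] = [
                momentum * o + (1.0 - momentum) * f for o, f in zip(old, feature)
            ]


def save_bank(bank: MemoryBank, path: Path) -> None:
    """Persist the bank at the end of a deployment (JSON keeps the sketch dependency-free)."""
    path.write_text(json.dumps(bank.entries))


def warm_start(path: Path) -> MemoryBank:
    """Reload the bank from a previous deployment, or fall back to a cold (empty) start."""
    if path.exists():
        return MemoryBank(json.loads(path.read_text()))
    return MemoryBank()
```

The only point of this design is that a redeployed agent begins from the stored bank rather than from nothing, which is the cold-start gap the abstract targets; how the paper actually represents and reuses its memory is not specified here.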
Problem

Integrates user feedback into continual adaptation for vision-language navigation
Converts human interactions into environment-aligned training data for realistic adaptation
Uses memory-bank warm-start to stabilize performance and reduce cold-start degradation
Innovation

Integrates user feedback into continual adaptation framework
Converts feedback into environment-aligned training data automatically
Uses memory-bank warm-start to reuse prior environmental knowledge
Yongqiang Yu
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates
Xuhui Li
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates
Hazza Mahmood
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates
Jinxing Zhou
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates
Haodong Hong
School of Electrical Engineering and Computer Science, The University of Queensland, Australia
Longtao Jiang
University of Science and Technology of China
Diffusion model, Computer Vision, Multimodal retrieval
Zhiqiang Xu
Professor, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
approximation theory, compressed sensing, splines, frame theory, quantization
Qi Wu
School of Computer Science, The University of Adelaide, Australia
Xiaojun Chang
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates