🤖 AI Summary
To address the temporal inconsistency, perception-action misalignment, and computational redundancy caused by fixed-step reasoning in Vision-and-Language Navigation (VLN), this paper proposes AdaNav, an uncertainty-driven adaptive reasoning framework. Methodologically, it introduces the Uncertainty Adaptive Reasoning Block (UAR), a lightweight plugin that quantifies policy uncertainty via action entropy; designs a dynamic reasoning-triggering mechanism for difficulty-aware sparse decision-making; and adopts a progressive Heuristics-to-RL training paradigm that combines heuristic path simulation with reinforcement-learning fine-tuning. With only 6K training samples, AdaNav improves success rate by 20% on R2R val-unseen, 11.7% on RxR-CE, and 11.4% in real-world scenes, significantly outperforming closed-source models trained on million-scale data. The framework demonstrates that uncertainty-guided adaptive reasoning enables efficient and robust VLN without massive annotated data.
📝 Abstract
Vision-and-Language Navigation (VLN) requires agents to follow natural language instructions by grounding them in sequential visual observations over long horizons. Explicit reasoning could enhance temporal consistency and perception-action alignment, but reasoning at fixed steps often leads to suboptimal performance and unnecessary computation. To address this, we propose AdaNav, an uncertainty-based adaptive reasoning framework for VLN. At its core is the Uncertainty Adaptive Reasoning Block (UAR), a lightweight plugin that dynamically triggers reasoning. We introduce Action Entropy as a policy prior for UAR and progressively refine it through a Heuristics-to-RL training method, enabling agents to learn difficulty-aware reasoning policies under the strict data limitations of embodied tasks. Results show that with only 6K training samples, AdaNav achieves substantial gains over closed-source models trained on million-scale data, improving success rate by 20% on R2R val-unseen, 11.7% on RxR-CE, and 11.4% in real-world scenes. The code is available at https://github.com/xinding-sys/AdaNav.
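The core mechanism, triggering explicit reasoning only when the policy's action entropy is high, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`action_entropy`, `should_reason`) and the fixed threshold are hypothetical, and AdaNav learns its triggering policy via the Heuristics-to-RL procedure rather than using a hand-set cutoff.

```python
import math

def action_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_reason(probs, threshold=1.0):
    """Hypothetical UAR-style trigger: invoke explicit reasoning only
    when the policy is uncertain, i.e. action entropy exceeds a threshold."""
    return action_entropy(probs) > threshold

# Confident step (one action dominates): low entropy, skip reasoning.
print(should_reason([0.9, 0.05, 0.03, 0.02]))   # False
# Ambiguous step (near-uniform over 4 actions): entropy ~= ln(4) ~ 1.39, reason.
print(should_reason([0.25, 0.25, 0.25, 0.25]))  # True
```

Because reasoning is invoked sparsely, easy steps cost only a cheap entropy check, which is where the claimed savings over fixed-step reasoning would come from.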