AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation

📅 2025-09-29
🤖 AI Summary
Explicit reasoning can improve temporal consistency and perception-action alignment in Vision-and-Language Navigation (VLN), but reasoning at fixed steps often yields suboptimal performance and unnecessary computation. This paper proposes AdaNav, an uncertainty-based adaptive reasoning framework. Its core is the Uncertainty Adaptive Reasoning (UAR) block, a lightweight plugin that decides dynamically when to trigger reasoning. Action Entropy serves as the policy prior for UAR and is progressively refined through a Heuristics-to-RL training method, so the agent learns difficulty-aware reasoning policies under tight data limits. Trained with only 6K samples, AdaNav improves success rate by 20% on R2R val-unseen, 11.7% on RxR-CE, and 11.4% in real-world scenes, outperforming closed-source models trained on million-scale data. The results suggest that uncertainty-guided adaptive reasoning can make VLN efficient and robust without massive annotated data.

📝 Abstract
Vision-Language Navigation (VLN) requires agents to follow natural language instructions by grounding them in sequential visual observations over long horizons. Explicit reasoning could enhance temporal consistency and perception-action alignment, but reasoning at fixed steps often leads to suboptimal performance and unnecessary computation. To address this, we propose AdaNav, an uncertainty-based adaptive reasoning framework for VLN. At its core is the Uncertainty Adaptive Reasoning Block (UAR), a lightweight plugin that dynamically triggers reasoning. We introduce Action Entropy as a policy prior for UAR and progressively refine it through a Heuristics-to-RL training method, enabling agents to learn difficulty-aware reasoning policies under the strict data limitations of embodied tasks. Results show that with only 6K training samples, AdaNav achieves substantial gains over closed-source models trained on million-scale data, improving success rate by 20% on R2R val-unseen, 11.7% on RxR-CE, and 11.4% in real-world scenes. The code is available at https://github.com/xinding-sys/AdaNav.
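The abstract does not spell out how Action Entropy is computed. One common reading, assumed here rather than taken from the paper, is the Shannon entropy of the policy's action distribution at step t. The symbols \pi, s_t, \mathcal{A}, and the threshold \tau are notation introduced only for this illustration:

```latex
H_t = -\sum_{a \in \mathcal{A}} \pi(a \mid s_t)\,\log \pi(a \mid s_t),
\qquad 0 \le H_t \le \log |\mathcal{A}|,
\qquad \text{trigger reasoning if } H_t > \tau .
```

Under this reading, H_t is 0 when the policy places all probability on one action and reaches \log |\mathcal{A}| when the policy is uniform, so a fixed threshold \tau would serve only as a heuristic starting point that training could later refine.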
Problem

Research questions and friction points this paper is trying to address.

Adaptive reasoning for vision-language navigation under uncertainty
Dynamic triggering of lightweight reasoning blocks using action entropy
Improving navigation success rates with limited training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty-based adaptive reasoning framework for VLN
Dynamic reasoning triggered by the lightweight UAR plugin
Heuristics-to-RL training refines the action-entropy policy prior
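The points above can be made concrete with a minimal sketch of an entropy-based trigger. This is not the authors' UAR implementation: the function names `action_entropy` and `should_reason`, the normalisation by log(n), the 0.5 threshold, and the example distributions are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def action_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probs:
        q = p / total  # renormalise defensively in case of rounding drift
        if q > 0:      # treat 0 * log(0) as 0
            entropy -= q * math.log(q)
    return entropy


def should_reason(probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical trigger: reason when normalised entropy exceeds threshold.

    Dividing by log(n) maps entropy into [0, 1], so the threshold does not
    depend on the size of the action space. This choice is an assumption.
    """
    n = len(probs)
    if n < 2:
        return False  # a single action leaves nothing to be uncertain about
    return action_entropy(probs) / math.log(n) > threshold


# Hypothetical per-step action distributions over four actions.
steps = [
    [0.97, 0.01, 0.01, 0.01],  # confident policy
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain policy
    [0.70, 0.20, 0.05, 0.05],  # moderately uncertain policy
]
decisions = [should_reason(p) for p in steps]
print(decisions)
```

A fixed threshold like this corresponds only to the heuristic stage; the paper describes refining the trigger policy further with RL, which this sketch does not attempt.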
Xin Ding
University of Science and Technology of China
Jianyu Wei
USTC & MSRA Joint PhD
LLM Infra, Inference System, Quantization, Kernel, Co-design
Yifan Yang
Microsoft Research
Shiqi Jiang
Microsoft Research
Qianxi Zhang
MSRA
database
Hao Wu
Nanjing University
Fucheng Jia
Central South University
Liang Mi
Institute for AI Industry Research (AIR), Tsinghua University
Yuxuan Yan
Zhejiang University
Weijun Wang
Tsinghua University
LLM Serving System, Edge AI, Video Analytics System
Yunxin Liu
IEEE Fellow, Guoqiang Professor, Institute for AI Industry Research (AIR), Tsinghua University
Mobile Computing, Edge Computing, AIoT, System, Networking
Zhibo Chen
University of Science and Technology of China
Ting Cao
Institute for AI Industry Research (AIR), Tsinghua University