Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

📅 2026-03-16
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the poor generalization and low robustness to execution perturbations of imitation learning in vision-and-language navigation (VLN). It proposes NavGRPO, a reinforcement learning framework built on ScaleVLN that applies Group Relative Policy Optimization: policy updates are driven by comparing the performance of trajectories within a sampled group, so no auxiliary value network is required. By exploring diverse trajectories, the method reduces reliance on expert demonstrations and improves robustness in unseen environments and under perturbations. On the R2R and REVERIE benchmarks, it improves Success weighted by Path Length (SPL) in unseen environments by 3.0% and 1.71%, respectively. Under severe early-stage perturbations, the SPL gain over the baseline reaches 14.89%.
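
For readers unfamiliar with the metric, SPL is commonly defined as the mean over episodes of the success indicator times the ratio of the shortest-path length to the larger of the shortest-path length and the path actually taken. The sketch below illustrates that standard definition only. The `(success, shortest_len, taken_len)` tuple format and the example values are assumptions for illustration, not data or code from the paper.

```python
def spl(episodes):
    """Success weighted by Path Length over a list of episodes.

    Each episode is a hypothetical (success, shortest_len, taken_len) tuple,
    with shortest_len > 0 assumed.
    """
    total = 0.0
    for success, shortest_len, taken_len in episodes:
        if success:
            # A successful episode scores 1.0 only if it is as short as the
            # shortest path; detours shrink the score proportionally.
            total += shortest_len / max(taken_len, shortest_len)
    return total / len(episodes)


# Illustrative episodes: an optimal success, a success with a 2x detour,
# and a failure (which contributes zero regardless of path length).
example_episodes = [
    (True, 10.0, 10.0),
    (True, 10.0, 20.0),
    (False, 10.0, 5.0),
]
example_spl = spl(example_episodes)
```

Because failed episodes score zero and detours are penalized, an SPL gain reflects both reaching the goal more often and doing so along more efficient paths.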

πŸ“ Abstract
Vision-and-Language Navigation (VLN) requires agents to navigate photo-realistic environments following natural language instructions. Current methods predominantly rely on imitation learning, which suffers from limited generalization and poor robustness to execution perturbations. We present NavGRPO, a reinforcement learning framework that learns goal-directed navigation policies through Group Relative Policy Optimization. By exploring diverse trajectories and optimizing via within-group performance comparisons, our method enables agents to distinguish effective strategies beyond expert paths without requiring additional value networks. Built on ScaleVLN, NavGRPO achieves superior robustness on R2R and REVERIE benchmarks with +3.0% and +1.71% SPL improvements in unseen environments. Under extreme early-stage perturbations, we demonstrate +14.89% SPL gain over the baseline, confirming that goal-directed RL training builds substantially more robust navigation policies. Code and models will be released.
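
The within-group comparison can be illustrated with the advantage normalization commonly used in Group Relative Policy Optimization: each trajectory's reward is standardized against the mean and standard deviation of its own group, which stands in for a learned value baseline. This is a minimal sketch under that assumption. The function name, the `eps` constant, and the reward values are hypothetical, and the abstract does not specify NavGRPO's actual reward design or normalization.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each trajectory's reward against its own group.

    `rewards` holds one scalar reward per trajectory in a sampled group.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled as one group.
group_rewards = [1.0, 0.0, 0.5, 0.0]
advantages = group_relative_advantages(group_rewards)
```

Trajectories that beat the group average receive positive advantages and the rest receive negative ones, so the policy gradient can favor better-than-average behavior without a separate value network. When every trajectory in a group earns the same reward, all advantages are zero and the group contributes no learning signal.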
Problem

Research questions and friction points this paper is trying to address.

Vision-and-Language Navigation
imitation learning
robustness
generalization
execution perturbations
Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory diversity
group relative policy optimization
robust vision-and-language navigation
reinforcement learning
goal-directed navigation
Jiangyang Li
Xi'an Jiaotong University
Cong Wan
Xi'an Jiaotong University
AIGC, 3D, diffusion
SongLin Dong
Xi'an Jiaotong University, Faculty of Computility Microelectronics, Shenzhen University of Advanced Technology
Chenhao Ding
Xi'an Jiaotong University
Qiang Wang
Xi'an Jiaotong University
Zhiheng Ma
Faculty of Computility Microelectronics, Shenzhen University of Advanced Technology
Yihong Gong
Xi'an Jiaotong University
Multimedia content analysis, Machine learning, Pattern recognition