Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

📅 2026-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Imitation learning for vision-and-language navigation generalizes poorly and is not robust to execution perturbations. This work addresses both problems with NavGRPO, a reinforcement learning framework built on ScaleVLN that applies Group Relative Policy Optimization (GRPO). NavGRPO updates the policy by comparing the performance of trajectories within a sampled group, so it needs no auxiliary value network. Exploring diverse trajectories lets the agent learn effective strategies beyond the expert paths, which reduces reliance on expert demonstrations and improves robustness in unseen environments and under perturbations. On the R2R and REVERIE benchmarks, NavGRPO improves Success weighted by Path Length (SPL) in unseen environments by 3.0% and 1.71%, respectively. Under extreme early-stage perturbations, it achieves an SPL gain of 14.89% over the baseline.
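
For readers unfamiliar with the metric, the sketch below computes SPL using its standard definition: each episode contributes its success indicator times the ratio of the shortest-path length to the larger of the shortest-path and actually-traveled lengths, averaged over episodes. The function name `spl`, its argument names, and the handling of zero-length episodes are assumptions for illustration, not taken from the paper's evaluation code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i reached the goal
    shortest_lengths[i] -- shortest-path length from start to goal
    path_lengths[i]     -- length of the path the agent actually took
    """
    n = len(successes)
    if n == 0 or n != len(shortest_lengths) or n != len(path_lengths):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Assumed convention: if the agent starts at the goal and never
        # moves, both lengths are 0 and the efficiency ratio is 1.
        ratio = shortest / denom if denom > 0 else 1.0
        total += float(success) * ratio
    return total / n


if __name__ == "__main__":
    # Three hypothetical episodes: optimal success, 2x-long success, failure.
    print(spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 5.0]))
```

A failed episode contributes zero regardless of path length, and a successful detour is penalized in proportion to the extra distance, which is why SPL rewards both reaching the goal and doing so efficiently.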

📝 Abstract

Vision-and-Language Navigation (VLN) requires agents to navigate photo-realistic environments following natural language instructions. Current methods predominantly rely on imitation learning, which suffers from limited generalization and poor robustness to execution perturbations. We present NavGRPO, a reinforcement learning framework that learns goal-directed navigation policies through Group Relative Policy Optimization. By exploring diverse trajectories and optimizing via within-group performance comparisons, our method enables agents to distinguish effective strategies beyond expert paths without requiring additional value networks. Built on ScaleVLN, NavGRPO achieves superior robustness on R2R and REVERIE benchmarks with +3.0% and +1.71% SPL improvements in unseen environments. Under extreme early-stage perturbations, we demonstrate +14.89% SPL gain over the baseline, confirming that goal-directed RL training builds substantially more robust navigation policies. Code and models will be released.
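
The within-group comparison mentioned in the abstract can be illustrated with the usual GRPO advantage: several trajectories are sampled for the same instruction, each receives a scalar reward, and every reward is normalized by the group's mean and standard deviation. The following is a minimal sketch of that normalization only; the function name, the `eps` constant, and the example reward values are hypothetical, and the paper's actual reward design and training loop are not reproduced here.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize per-trajectory rewards within one sampled group.

    Each advantage is (reward - group mean) / (group std + eps), so a
    trajectory is credited only for doing better than its siblings.
    The group statistics stand in for a learned value baseline.
    """
    if len(rewards) < 2:
        raise ValueError("a group needs at least two trajectories to compare")

    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four trajectories sampled from the same
    # instruction (e.g. 1.0 for reaching the goal, 0.0 otherwise).
    group_rewards = [1.0, 0.0, 0.0, 1.0]
    for reward, advantage in zip(group_rewards, group_relative_advantages(group_rewards)):
        print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

In this sketch the successful trajectories get advantages close to +1 and the failed ones close to -1. When every trajectory in a group earns the same reward, all advantages are zero and the group contributes no gradient signal, which is one reason trajectory diversity within a group matters.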

Problem

Research questions and friction points this paper is trying to address.

Vision-and-Language Navigation

imitation learning

robustness

generalization

execution perturbations

Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory diversity

group relative policy optimization

robust vision-and-language navigation