Learning Goal-Oriented Language-Guided Navigation with Self-Improving Demonstrations at Scale

πŸ“… 2025-09-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Goal-directed vision-and-language navigation (VLN) suffers from insufficient environmental exploration in unseen settings, as existing approaches rely solely on shortest-path trajectories and lack effective exploration priors. This work proposes the Self-Improving Demonstrations (SID) framework: starting from a shortest-path pre-trained agent, SID employs reinforcement learning with rollout-based policies to generate highly exploratory trajectories, iteratively refining demonstration quality to enable cross-task transfer and scalable training. SID eliminates the need for human-annotated demonstrations while substantially enhancing exploration capability and generalization. Evaluated on the REVERIE and SOON benchmarks, SID achieves state-of-the-art performance: on SOON’s unseen partition, it attains a success rate of 50.9%, outperforming the prior best method by 13.9%.

πŸ“ Abstract
Goal-oriented language-guided navigation requires robust exploration capabilities for agents to navigate to specified goals in unknown environments without step-by-step instructions. Existing methods tend to exclusively utilize shortest-path trajectories, lacking effective exploration priors for training navigation agents. To address the above challenges, we present SID, a goal-oriented language-guided navigation learning approach with Self-Improving Demonstrations. Specifically, SID learns an initial agent on the shortest-path data sampled from environments and then leverages this agent to generate novel exploration trajectories. The novel rollouts provide demonstrations with stronger exploration strategies to train a better agent, which in turn produces higher-quality agent demonstrations for the next round of training. We show that this iterative self-improving pipeline readily scales to new environments, and the resulting demonstrations can be transferred across a variety of language-guided navigation tasks, elevating the performance ceiling in diverse goal-oriented navigation tasks. Extensive experiments demonstrate that SID significantly boosts the exploration capabilities and generalization of navigation agents. The resulting agent achieves new state-of-the-art performance on goal-oriented language-guided navigation tasks, including REVERIE and SOON, notably achieving a 50.9% success rate on the unseen validation splits of SOON, surpassing the prior leading approaches by a margin of 13.9%.
Problem

Research questions and friction points this paper is trying to address.

Addresses goal-oriented navigation without step-by-step instructions
Overcomes reliance on shortest-path trajectories for training agents
Enhances exploration capabilities and generalization in navigation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-improving demonstrations iteratively enhance exploration strategies
Initial agent trained on shortest-path data generates novel trajectories
Scalable pipeline transfers demonstrations across diverse navigation tasks
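The self-improving loop described above can be sketched as follows. This is an illustrative toy sketch, not the authors' implementation: `train`, `rollout`, and `is_successful` are hypothetical placeholders standing in for imitation-learning training, exploratory trajectory generation, and goal-reaching filtering, respectively.

```python
# Hypothetical sketch of SID's self-improving demonstration loop:
# train on shortest-path data, then alternate between rolling out
# exploratory trajectories and retraining on the filtered pool.

def train(demonstrations):
    """Placeholder: fit a navigation policy to a pool of demonstrations."""
    return {"policy_data": list(demonstrations)}

def rollout(agent, environments):
    """Placeholder: let the agent explore each environment, yielding trajectories."""
    return [f"exploratory-trajectory-in-{env}" for env in environments]

def is_successful(trajectory):
    """Placeholder: keep only rollouts that actually reach the goal."""
    return True  # a real filter would check goal arrival

def self_improve(shortest_path_demos, environments, rounds=3):
    demos = list(shortest_path_demos)
    agent = train(demos)  # round 0: shortest-path pretraining
    for _ in range(rounds):
        # Generate exploratory rollouts and keep the successful ones.
        new_demos = [t for t in rollout(agent, environments) if is_successful(t)]
        demos.extend(new_demos)  # grow the demonstration pool
        agent = train(demos)     # retrain on the improved demonstrations
    return agent, demos
```

Each round, the better-trained agent produces higher-quality exploratory demonstrations, which in turn feed the next round of training; the paper reports that this scales to new environments without human-annotated demonstrations.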
πŸ”Ž Similar Papers
S
Songze Li
Shanghai AI Laboratory, Fudan University
Z
Zun Wang
UNC Chapel Hill
Gengze Zhou
Gengze Zhou
The University of Adelaide
Embodied AIMultimodality
J
Jialu Li
UNC Chapel Hill
X
Xiangyu Zeng
Shanghai AI Laboratory, Nanjing University
L
Limin Wang
Shanghai AI Laboratory, Nanjing University
Y
Yu Qiao
Shanghai AI Laboratory
Q
Qi Wu
The University of Adelaide
Mohit Bansal
Mohit Bansal
Parker Distinguished Professor, Computer Science, UNC Chapel Hill
Natural Language ProcessingComputer VisionMachine LearningMultimodal AI
Y
Yi Wang
Shanghai AI Laboratory