All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation

📅 2026-03-15
🤖 AI Summary
This work addresses the catastrophic forgetting problem faced by vision-and-language navigation (VLN) agents when fine-tuned across multiple environments during long-term deployment. To mitigate this, the authors propose Tucker Adaptation (TuKA), which the paper presents as the first approach to bring high-order tensors and Tucker decomposition into VLN. TuKA models multi-level navigation knowledge as a tensor and decomposes it into a shared subspace and scene-specific experts, enabling parameter-efficient, decoupled incremental learning. Built on this framework, the resulting continual learning agent, AlldayWalker, outperforms existing methods on lifelong multi-scene VLN tasks, achieving robust all-day, cross-environment navigation.

📝 Abstract
Deploying vision-and-language navigation (VLN) agents requires adaptation across diverse scenes and environments, but fine-tuning on a specific scenario often causes catastrophic forgetting in others, which severely limits flexible long-term deployment. We formalize this challenge as the all-day multi-scenes lifelong VLN (AML-VLN) problem. Existing parameter-efficient adapters (e.g., LoRA and its variants) are limited by their two-dimensional matrix form, which fails to capture the multi-hierarchical navigation knowledge spanning multiple scenes and environments. To address this, we propose Tucker Adaptation (TuKA), which represents the multi-hierarchical navigation knowledge as a high-order tensor and leverages Tucker decomposition to decouple the knowledge into shared subspaces and scenario-specific experts. We further introduce a decoupled knowledge incremental learning strategy to consolidate shared subspaces while constraining specific experts for decoupled lifelong learning. Building on TuKA, we also develop a VLN agent named AlldayWalker, which continually learns across multiple navigation scenarios, achieving all-day multi-scenes navigation. Extensive experiments show that AlldayWalker consistently outperforms state-of-the-art baselines.
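The abstract's core mechanism can be sketched concretely: a 3-way adapter tensor is decomposed à la Tucker into shared factor matrices (reused across all scenes) and a per-scene coefficient vector that mixes the core tensor's "experts". The sketch below is a minimal illustration of that idea, not the paper's implementation; all names, shapes, and ranks (`d_out`, `r1`, `scene_adapter`, etc.) are our assumptions.

```python
import numpy as np

# Hypothetical sizes: adapter maps d_in -> d_out features across n_scenes scenes.
d_out, d_in, n_scenes = 64, 32, 3
r1, r2, r3 = 8, 8, 2  # Tucker ranks: shared-subspace dims and number of experts

rng = np.random.default_rng(0)
G = rng.standard_normal((r1, r2, r3))    # core tensor: expert interaction weights
U = rng.standard_normal((d_out, r1))     # shared output subspace (consolidated)
V = rng.standard_normal((d_in, r2))      # shared input subspace (consolidated)
S = rng.standard_normal((n_scenes, r3))  # scene-specific expert coefficients

def scene_adapter(scene_id: int) -> np.ndarray:
    """Reconstruct one scene's low-rank adapter matrix:
    contract the core with the scene's expert row along mode 3,
    then project through the shared subspaces U and V."""
    core_2d = np.tensordot(G, S[scene_id], axes=([2], [0]))  # shape (r1, r2)
    return U @ core_2d @ V.T                                 # shape (d_out, d_in)

delta_W = scene_adapter(scene_id=1)
print(delta_W.shape)  # (64, 32)

# Parameter-efficiency check: Tucker factors vs. one dense adapter per scene.
tucker_params = G.size + U.size + V.size + S.size
dense_params = d_out * d_in * n_scenes
print(tucker_params, "<", dense_params)
```

Under this factorization, lifelong learning would consolidate `U` and `V` (the shared subspaces) while constraining updates to `G` and each scene's row of `S`, which matches the decoupled incremental learning strategy the abstract describes.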
Problem

Research questions and friction points this paper is trying to address.

vision-and-language navigation
lifelong learning
catastrophic forgetting
multi-scene adaptation
all-day navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tucker Adaptation
lifelong learning
vision-and-language navigation
parameter-efficient adaptation
tensor decomposition
Xudong Wang
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences
Gan Li
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences
Zhiyu Liu
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences
Yao Wang
Xi'an Jiaotong University
Machine Learning, Signal Processing, Operations Management, Nonconvex Optimization
Lianqing Liu
Professor, Shenyang Institute of Automation, Chinese Academy of Sciences
Biosyncretic Robot, Micro/Nano Robotics, Intelligent Machine
Zhi Han
Shenyang Institute of Automation, Chinese Academy of Sciences
Computer Vision