SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation that vision-and-language navigation models suffer under long-horizon, multi-task language instructions, which the authors attribute to information overload. It proposes SeqWalker, a hierarchical planning framework in which a high-level planner dynamically reduces the global instruction to context-aware sub-instructions. A low-level planner then executes these sub-instructions through an exploration-verification mechanism grounded in visual-linguistic alignment, which lowers the agent's cognitive load. SeqWalker also introduces a trajectory correction strategy informed by the logical structure of the instructions, aimed at improving navigation robustness and accuracy in complex multi-task scenarios. Evaluated on a new benchmark built by extending the IVLN dataset, SeqWalker is reported to outperform existing approaches.

📝 Abstract
Sequential-Horizon Vision-and-Language Navigation (SH-VLN) presents a challenging scenario in which agents must sequentially execute multi-task navigation guided by complex, long-horizon language instructions. Current vision-and-language navigation models exhibit significant performance degradation with such multi-task instructions, as information overload impairs the agent's ability to attend to observationally relevant details. To address this problem, we propose SeqWalker, a navigation model built on a hierarchical planning framework. SeqWalker features: i) a High-Level Planner that dynamically selects contextually relevant sub-instructions from the global instruction based on the agent's current visual observations, thus reducing cognitive load; ii) a Low-Level Planner incorporating an Exploration-Verification strategy that leverages the inherent logical structure of instructions for trajectory error correction. To evaluate SH-VLN performance, we also extend the IVLN dataset and establish a new benchmark. Extensive experiments demonstrate the superiority of the proposed SeqWalker.
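The two-level control loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the room graph, the `VISIBLE` landmark table, and the functions `high_level_select`, `low_level_explore`, and `navigate` are all hypothetical names introduced here. Set membership stands in for learned visual-linguistic alignment, and plain depth-first backtracking stands in for the paper's instruction-logic-based trajectory correction.

```python
from typing import Dict, List, Set

# Hypothetical toy world: rooms, their neighbours, and the landmarks visible in each.
GRAPH: Dict[str, List[str]] = {
    "hall": ["closet", "kitchen"],
    "closet": ["hall"],
    "kitchen": ["bedroom", "hall"],
    "bedroom": ["kitchen"],
}
VISIBLE: Dict[str, Set[str]] = {
    "hall": {"door"},
    "closet": {"coat"},
    "kitchen": {"fridge"},
    "bedroom": {"bed"},
}


def high_level_select(landmarks: List[str], done: int, node: str) -> int:
    """Assumed high-level step: skip sub-goals whose landmark is already in
    view at the current node and return the index of the next open one."""
    while done < len(landmarks) and landmarks[done] in VISIBLE[node]:
        done += 1
    return done


def low_level_explore(node: str, landmark: str,
                      visited: Set[str], path: List[str]) -> bool:
    """Assumed low-level step: explore depth-first, verify the landmark at
    each node, and step back when a branch fails verification."""
    path.append(node)
    visited.add(node)
    if landmark in VISIBLE[node]:  # verification succeeded
        return True
    for nxt in GRAPH[node]:
        if nxt in visited:
            continue
        if low_level_explore(nxt, landmark, visited, path):
            return True
        path.append(node)  # branch failed: record the corrective step back
    return False


def navigate(start: str, landmarks: List[str]) -> List[str]:
    """Alternate high-level selection and low-level execution until every
    sub-goal has been verified; return the full walk, corrections included."""
    node, done, walk = start, 0, [start]
    while True:
        done = high_level_select(landmarks, done, node)
        if done == len(landmarks):
            return walk
        path: List[str] = []
        if not low_level_explore(node, landmarks[done], set(), path):
            raise ValueError(f"landmark {landmarks[done]!r} is unreachable")
        walk.extend(path[1:])  # path[0] is the node we are already standing on
        node = path[-1]


print(navigate("hall", ["fridge", "bed"]))
# ['hall', 'closet', 'hall', 'kitchen', 'bedroom']
```

In this run the agent first tries the closet, fails to verify the fridge there, steps back to the hall, and only then reaches the kitchen. The point of the sketch is that the low-level routine only ever sees one sub-goal at a time, while the long instruction is handled by the outer loop.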
Problem

Research questions and friction points this paper is trying to address.

Vision-and-Language Navigation
Sequential-Horizon
Multi-task Navigation
Long-horizon Instructions
Information Overload
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Planning
Vision-and-Language Navigation
Sequential-Horizon Navigation
Exploration-Verification Strategy
Instruction Decomposition
Zebin Han
North University of China
Xudong Wang
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences
Baichen Liu
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences
Qi Lyu
Master of Science, Michigan State University
Deep Learning, NLP
Zhenduo Shang
State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences
Jiahua Dong
Mohamed bin Zayed University of Artificial Intelligence
Lianqing Liu
Professor, Shenyang Institute of Automation, Chinese Academy of Sciences
Biosyncretic Robot, Micro/Nano Robotics, Intelligent Machine
Zhi Han
SIA, CAS
Computer Vision