A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration

📅 2026-04-23
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the trade-off between reasoning capability and real-time deployment in vision-language navigation (VLN) on resource-constrained embodied systems. The authors propose a deployable hierarchical cognitive architecture that decouples the system into three asynchronous modules: real-time perception, memory integration, and high-level reasoning. A cognitive memory graph is built incrementally and decomposed into subgraphs for reasoning with a vision-language model (VLM). Exploration is formulated as a context-aware Weighted Traveling Repairman Problem (WTRP), which minimizes the weighted waiting time of viewpoints. In simulation and on real robotic platforms, the authors report higher navigation success and efficiency than existing VLN approaches while maintaining real-time performance on resource-constrained hardware.
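The asynchronous decoupling described in the summary can be pictured with a minimal Python sketch. This is not the authors' implementation: the function names, the queue-based hand-off, the list standing in for the cognitive memory graph, and the `goto:` decision string are all assumptions made for illustration, and the short sleep is only a placeholder for a slow VLM call.

```python
import asyncio


async def perception(frames, obs_q):
    # Stand-in for continuous sensing: emit one observation per frame.
    for frame in frames:
        await obs_q.put(frame)
    await obs_q.put(None)  # sentinel: no more observations


async def memory_integration(obs_q, graph, snap_q):
    # Aggregate observations into a (toy) memory structure and publish snapshots.
    while (obs := await obs_q.get()) is not None:
        graph.append(obs)
        await snap_q.put(tuple(graph))
    await snap_q.put(None)  # forward the sentinel to the reasoning stage


async def reasoning(snap_q, decisions):
    # Consume snapshots at its own pace; the sleep is a placeholder for VLM latency.
    while (snap := await snap_q.get()) is not None:
        await asyncio.sleep(0.01)
        decisions.append(f"goto:{snap[-1]}")  # hypothetical decision format


async def run(frames):
    obs_q, snap_q = asyncio.Queue(), asyncio.Queue()
    graph, decisions = [], []
    # The three stages run concurrently and communicate only through queues.
    await asyncio.gather(
        perception(frames, obs_q),
        memory_integration(obs_q, graph, snap_q),
        reasoning(snap_q, decisions),
    )
    return graph, decisions


graph, decisions = asyncio.run(run(["door", "sofa", "lamp"]))
```

Because the stages share only queues, the slow reasoning stage never blocks perception; a real system would likely drop or merge stale snapshots rather than process every one.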

📝 Abstract
Bridging the gap between embodied intelligence and embedded deployment remains a key challenge in intelligent robotic systems, where perception, reasoning, and planning must operate under strict constraints on computation, memory, energy, and real-time execution. In vision-language navigation (VLN), existing approaches often face a fundamental trade-off between strong reasoning capabilities and efficient deployment on real-world platforms. In this paper, we present a deployable embodied VLN system that achieves both high efficiency and robust high-level reasoning on real-world robotic platforms. To achieve this, we decouple the system into three asynchronous modules: a real-time perception module for continuous environment sensing, a memory integration module for spatial-semantic aggregation, and a reasoning module for high-level decision making. We incrementally construct a cognitive memory graph to encode scene information, which is further decomposed into subgraphs to enable reasoning with a vision-language model (VLM). To further improve navigation efficiency and accuracy, we also leverage the cognitive memory graph to formulate the exploration problem as a context-aware Weighted Traveling Repairman Problem (WTRP), which minimizes the weighted waiting time of viewpoints. Extensive experiments in both simulation and real-world robotic platforms demonstrate improved navigation success and efficiency over existing VLN approaches, while maintaining real-time performance on resource-constrained hardware.
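To make the WTRP objective concrete, the sketch below scores a visiting order by the sum of each viewpoint's weight times its arrival time, then finds the best order by exhaustive search. It is an illustration under stated assumptions, not the paper's solver: the 2D Euclidean travel cost, the constant `speed`, the example weights, and the brute-force search are all hypothetical choices, and the abstract does not specify how the context-aware weights are derived.

```python
from itertools import permutations
from math import dist


def weighted_latency(start, order, points, weights, speed=1.0):
    """Sum of weight * arrival time over viewpoints visited in `order`."""
    t, pos, total = 0.0, start, 0.0
    for i in order:
        t += dist(pos, points[i]) / speed  # time at which viewpoint i is reached
        total += weights[i] * t            # its weighted waiting time
        pos = points[i]
    return total


def solve_wtrp_bruteforce(start, points, weights, speed=1.0):
    """Exhaustive search over orders; only feasible for a handful of viewpoints."""
    best = min(
        permutations(range(len(points))),
        key=lambda order: weighted_latency(start, order, points, weights, speed),
    )
    return list(best), weighted_latency(start, best, points, weights, speed)


# Toy case: a near, low-weight viewpoint and a farther, high-weight one.
start = (0.0, 0.0)
viewpoints = [(1.0, 0.0), (-2.0, 0.0)]
weights = [1.0, 10.0]  # assumed context-derived priorities
order, cost = solve_wtrp_bruteforce(start, viewpoints, weights)
print(order, cost)  # [1, 0] 25.0
```

In this toy case the farther viewpoint is visited first because its weight is high: visiting the nearer one first would cost 1·1 + 10·4 = 41, against 10·2 + 1·5 = 25 for the chosen order. A shortest-path or nearest-first strategy would miss this, which is the difference between minimizing weighted waiting time and minimizing tour length.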
Problem

Research questions and friction points this paper is trying to address.

embodied intelligence
vision-language navigation
embedded deployment
real-time execution
resource-constrained hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language navigation
cognitive memory graph
context-aware exploration
deployable embodied AI
Weighted Traveling Repairman Problem
Kuan Xu
Nanyang Technological University
robotics, visual SLAM
Ruimeng Liu
Center for Advanced Robotics Technology Innovation (CARTIN), School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
Yizhuo Yang
Nanyang Technological University
AI, Robotics, Multi-modal
Denan Liang
Center for Advanced Robotics Technology Innovation (CARTIN), School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
Tongxing Jin
Nanyang Technological University
Robotics Localization and Mapping
Shenghai Yuan
Center for Advanced Robotics Technology Innovation (CARTIN), School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
Chen Wang
Assistant Professor, Spatial AI & Robotics Lab, University at Buffalo
Spatial AI, Robotics
Lihua Xie
Professor of Electrical Engineering, Nanyang Technological University
Robust control, Networked Control, Multi-agent Systems