LCGNav: Local Candidate-Aware Geometric Enhancement for General Topological Planning in Vision-Language Navigation

2026-05-09

AI Summary
This work addresses two limitations of existing online topological planning methods for vision-and-language navigation: they are susceptible to redundant local depth information, and their focus on frontier candidates weakens as the topological graph expands. To overcome these issues, the authors propose LCGNav, a modular local geometric enhancement framework that converts candidate-view depth maps into 3D point clouds and applies physical truncation based on the agent's reachable space to build compact geometric representations. A dimension-preserving local fusion strategy augments only the currently relevant nodes, leaving the original planner interface unchanged. The approach introduces two key innovations: candidate-aware geometric modeling and a transient state degradation mechanism. On the R2R-CE and RxR-CE benchmarks, LCGNav consistently improves multiple key metrics of several strong baselines and, when integrated with ETP-R1, achieves the best performance among the compared online topological methods on the val-unseen splits.
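The candidate-view geometry step described above can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and depth in meters, and uses a hypothetical Euclidean radius `max_range` as the "reachable range"; the paper's actual truncation rule, camera model, and parameter values are not given here, and the function name is illustrative only.

```python
import numpy as np


def depth_to_truncated_points(depth, fx, fy, cx, cy, max_range=3.0):
    """Unproject a depth map (H, W), in meters, to camera-frame 3D points and
    keep only points inside a hypothetical reachable radius `max_range`.

    Assumptions (not from the paper): pinhole intrinsics fx, fy, cx, cy and
    truncation by Euclidean distance from the camera.
    """
    h, w = depth.shape
    # u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Discard invalid (non-positive) depth readings.
    valid = points[:, 2] > 0
    # "Physical truncation": drop points farther than the assumed reach.
    within_reach = np.linalg.norm(points, axis=1) <= max_range
    return points[valid & within_reach]


# Usage: a flat 4x4 view at 2 m with one far pixel and one missing reading.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 5.0  # beyond the 3 m reach, so it is truncated
depth[1, 1] = 0.0  # invalid depth, so it is discarded
pts = depth_to_truncated_points(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
print(pts.shape)  # (14, 3)
```

Truncating before any downstream encoding keeps the per-candidate point set small, which is one plausible way to obtain the "compact" local geometry the summary describes.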
πŸ“ Abstract
Online topological planning has become an effective paradigm for Vision-Language Navigation in Continuous Environments (VLN-CE), but existing methods still suffer from two limitations: redundant local depth information and weakened focus on current frontier candidates as the topological graph grows. To address these limitations, we propose LCGNav, a modular local geometric enhancement framework for topological VLN. LCGNav explicitly converts candidate depth views into 3D point clouds and applies physical truncation based on the agent's reachable range, enabling more compact local geometric modeling. It further introduces a dimension-preserving local fusion strategy with transient state degradation, so that geometric enhancement is applied only to the currently relevant ghost nodes without changing the original planner interface. Experiments on R2R-CE and RxR-CE show that LCGNav serves as an effective cross-architecture enhancement module, consistently improving multiple key metrics of representative online topological baselines with low additional training cost. When integrated with ETP-R1, LCGNav achieves the best performance among the compared online topological methods on the val-unseen splits of the R2R-CE and RxR-CE benchmarks. The code is available at https://github.com/shannanshouyin/LCGNav.
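The dimension-preserving local fusion with transient state degradation can be sketched as follows. This is a hypothetical illustration, not the released implementation: the class name `LocalGeoFusion`, the random linear projection `w`, and the fixed residual weight `gate` are assumptions, and "transient degradation" is interpreted here as computing the enhancement on a per-step copy, so a ghost node reverts to its stored feature once it is no longer a current candidate.

```python
import numpy as np


class LocalGeoFusion:
    """Hypothetical sketch of dimension-preserving local fusion.

    Geometric features for the current candidates are projected into the
    planner's node space and added residually, so the output keeps shape
    (N, D) and the downstream planner sees an unchanged interface.
    """

    def __init__(self, geo_dim, node_dim, gate=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed linear projection from geometry space (G) to node space (D).
        self.w = rng.normal(scale=0.02, size=(geo_dim, node_dim))
        # Assumed fixed residual weight; a learned gate is equally plausible.
        self.gate = gate

    def __call__(self, node_feats, geo_feats, current_ids):
        """node_feats: (N, D) stored ghost-node features (left unmodified).
        geo_feats: (K, G) geometric features of the K current candidates.
        current_ids: K distinct indices of the ghost nodes seen this step.
        """
        # Work on a per-step copy: the enhancement is transient and is never
        # written back, so nodes that stop being candidates revert.
        fused = node_feats.copy()
        fused[current_ids] += self.gate * (geo_feats @ self.w)
        return fused  # still (N, D)


# Usage: enhance nodes 1 and 3 at step t, then only node 2 at step t+1.
nodes = np.ones((5, 8))
fusion = LocalGeoFusion(geo_dim=6, node_dim=8)
step_t = fusion(nodes, np.ones((2, 6)), [1, 3])
step_t1 = fusion(nodes, np.ones((1, 6)), [2])
print(step_t.shape, np.allclose(step_t1[[1, 3]], 1.0))  # (5, 8) True
```

Because the fused output has the same shape as the planner's node features and the stored graph state is never overwritten, a module of this form could in principle be dropped in front of an existing planner without changing its interface, which matches the plug-in property the abstract claims.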
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Navigation
Topological Planning
Local Geometric Enhancement
Frontier Candidates
Depth Redundancy
Innovation

Methods, ideas, or system contributions that make the work stand out.

topological planning
geometric enhancement
3D point clouds
vision-language navigation
local fusion
Jiankun Peng
The Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; The School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
Jianyuan Guo
City University of Hong Kong (CityU)
Yiguang Yang
The Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; The School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
Yue Liu
Research Fellow, School of Computing, Australian National University
Jiashuang Yan
The Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; The School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
Ying Xu
The Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China