G-DRAGON: Geospatial Reasoning and Dynamic Planning for Retrieval-Augmented Outdoor Navigation

📅 2026-05-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two limitations of existing vision-language navigation methods: weak long-range geospatial reasoning and no fine-grained "last-mile" exploration. The authors propose a retrieval-augmented outdoor navigation framework that integrates generative retrieval, versioned OpenStreetMap geographic entities, and an open-vocabulary semantic voxel map. A lightweight large language model maps natural-language instructions to geographic entities to generate a global path, which is then combined with SLAM and frontier-based exploration for end-to-end navigation. By grounding language in structured geospatial data, the approach mitigates the hallucination issues commonly associated with cloud-based large models. The method outperforms prior approaches in simulation and completes autonomous person-search missions with trajectories of up to 500 m in real urban environments.
πŸ“ Abstract
Autonomous ground robots operating in large-scale outdoor environments require both robust long-range navigation and fine-grained "last-mile" exploration. Current visual-language navigation (VLN) methods work well on short-range tasks but lack the geospatial grounding needed for long-distance missions. Some OpenStreetMap (OSM)-based methods rely on cloud-based Large Language Models (LLMs), which are prone to factual hallucination, and they cannot conduct "last-mile" exploration based on human instructions. To address these challenges, we present G-DRAGON, a retrieval-augmented framework for outdoor, open-world navigation. The framework maps natural-language commands to versioned, local OSM entities via generative retrieval based on a lightweight LLM, yielding accurate coordinates for global route planning. A high-level planning module bridges global topological routes with the SLAM system, projecting geospatial waypoints into the robot's navigable frame. For the "last mile," the framework transitions to frontier-based exploration and open-set semantic voxel mapping to localize open-vocabulary targets. Experimental results in simulation demonstrate that our framework outperforms state-of-the-art baselines. Furthermore, we validate the system in unseen real-world urban environments on an Unmanned Ground Vehicle (UGV), successfully completing person-search missions with trajectories of up to 500 m.
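The abstract mentions projecting geospatial waypoints into the robot's navigable frame but does not say how. The following is only a minimal sketch of one common way to do this, not the authors' method. It assumes a spherical Earth, a local equirectangular approximation (reasonable over a few hundred metres), a robot yaw measured counter-clockwise from east, and a robot frame with x forward and y to the left. All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def geodetic_to_local_en(lat_deg, lon_deg, origin_lat_deg, origin_lon_deg):
    """Return (east, north) offsets in metres of a lat/lon point from an origin.

    Uses a local equirectangular approximation, which is only adequate
    for short distances around the origin.
    """
    origin_lat = math.radians(origin_lat_deg)
    d_lat = math.radians(lat_deg - origin_lat_deg)
    d_lon = math.radians(lon_deg - origin_lon_deg)
    east = EARTH_RADIUS_M * d_lon * math.cos(origin_lat)
    north = EARTH_RADIUS_M * d_lat
    return east, north


def local_en_to_robot_frame(east, north, robot_east, robot_north, robot_yaw_rad):
    """Express an east/north point in the robot frame (x forward, y left).

    robot_yaw_rad is the robot heading, counter-clockwise from east.
    """
    dx = east - robot_east
    dy = north - robot_north
    cos_yaw = math.cos(robot_yaw_rad)
    sin_yaw = math.sin(robot_yaw_rad)
    # Inverse rotation: world offset -> robot-centric coordinates.
    x_forward = cos_yaw * dx + sin_yaw * dy
    y_left = -sin_yaw * dx + cos_yaw * dy
    return x_forward, y_left
```

In a sketch like this, a waypoint one metre east of a north-facing robot ends up at roughly x = 0, y = -1, that is, directly to the robot's right. A real system would more likely rely on a proper map projection and the SLAM pose estimate rather than these simplifications.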
Problem

Research questions and friction points this paper is trying to address.

outdoor navigation
geospatial reasoning
last-mile exploration
visual-language navigation
retrieval-augmented
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented navigation
geospatial reasoning
lightweight LLM
open-set semantic mapping
frontier-based exploration
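Frontier-based exploration, listed above, is generally understood as driving toward the boundary between known free space and unexplored space. The sketch below illustrates that idea on a small 2D occupancy grid. It is an assumption-laden illustration rather than the paper's implementation, which works with a semantic voxel map. The cell encoding (`FREE`, `OCCUPIED`, `UNKNOWN`) and the function name are hypothetical.

```python
FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # hypothetical cell encoding


def find_frontier_cells(grid):
    """Return (row, col) of every free cell that touches an unknown cell.

    grid is a non-empty list of equal-length rows. Adjacency is
    4-connected (up, down, left, right).
    """
    rows, cols = len(grid), len(grid[0])
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    frontiers = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue  # only known-free cells can be frontiers
            for dr, dc in neighbours:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontiers.append((r, c))
                    break  # one unknown neighbour is enough
    return frontiers


example_grid = [
    [FREE, FREE, UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE, FREE],
]
frontier_cells = find_frontier_cells(example_grid)
```

An exploration loop built on this would typically cluster the frontier cells, pick one cluster as the next goal (for example the nearest), navigate there, update the map, and repeat until the target is found or no frontiers remain.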
Authors
Dongzhihan Wang
School of Future Technology, Shanghai University, Shanghai 200444, China; Spatial AI & Robotics Lab, Institute for Artificial Intelligence and Data Science, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, USA
Yi Du
Chinese Academy of Sciences
data mining, knowledge engineering, AI for Science
Jianan Sun
College of Information Science and Technology, Donghua University, Shanghai 200051, China
Yuan Xue
School of Future Technology, Shanghai University, Shanghai 200444, China
Yingchen Zhang
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing 100190, China
Bing Xiao
School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
Chen Wang
Assistant Professor, Spatial AI & Robotics Lab, University at Buffalo
Spatial AI, Robotics
Liang Xu
Shanghai University
Networked Control Systems, Learning and Control