AI Summary
This work addresses the limitations of existing vision-language navigation methods in long-range geospatial reasoning and fine-grained "last-mile" exploration. The authors propose a retrieval-augmented outdoor navigation framework that integrates generative retrieval, versioned OpenStreetMap geographic entities, and an open-vocabulary semantic voxel map. A lightweight large language model maps natural language instructions to geographic entities to generate a global path, which is then combined with SLAM and frontier-based exploration for end-to-end navigation. By grounding language in structured geospatial data, the approach mitigates hallucination issues commonly associated with cloud-based large models. The method significantly outperforms prior approaches in simulation and successfully completes a 500-meter autonomous person-search task in real urban environments.
Abstract
Autonomous ground robots operating in large-scale outdoor environments require both robust long-range navigation and fine-grained "last-mile" exploration. Recent advances in vision-language navigation (VLN) work well on short-range tasks but lack the geospatial grounding needed for long-distance missions. Some OpenStreetMap (OSM)-based methods rely on cloud-based Large Language Models (LLMs), which are prone to factual hallucination, and cannot conduct "last-mile" exploration based on human instructions. To address these challenges, we present G-DRAGON, a retrieval-augmented framework for outdoor, open-world navigation. The framework maps natural-language commands to versioned, local OSM entities via generative retrieval with a lightweight LLM, yielding accurate coordinates for global route planning. A high-level planning module bridges global topological routes with the SLAM system, projecting geospatial waypoints into the robot's navigable frame. For the "last mile," the framework transitions to frontier-based exploration and open-set semantic voxel mapping to localize open-vocabulary targets. Experimental results in simulation demonstrate that our framework outperforms state-of-the-art baselines. Furthermore, we validate the system in unseen real-world urban environments on an Unmanned Ground Vehicle (UGV), successfully completing person-search missions with trajectories of up to 500 m.
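The abstract does not say how geospatial waypoints are projected into the robot's navigable frame. As an illustration only, and not the authors' implementation, the following minimal sketch shows one common way to do it: an equirectangular approximation converts latitude/longitude offsets to east/north metres, and a planar rotation aligns them with a map frame. The function name `geodetic_to_local`, the `map_yaw_rad` parameter, and the sample coordinates are all hypothetical, and the alignment angle between the SLAM map frame and east is assumed to be known.

```python
import math

# Mean Earth radius in metres. A spherical model is an assumption here;
# it is usually adequate for routes of a few hundred metres.
EARTH_RADIUS_M = 6_371_000.0


def geodetic_to_local(lat_deg, lon_deg, origin_lat_deg, origin_lon_deg,
                      map_yaw_rad=0.0):
    """Project a latitude/longitude waypoint into a flat local frame.

    The origin is the geodetic position of the local frame's (0, 0).
    map_yaw_rad (hypothetical parameter) is the counter-clockwise angle
    from geographic east to the local frame's x-axis.
    """
    d_lat = math.radians(lat_deg - origin_lat_deg)
    d_lon = math.radians(lon_deg - origin_lon_deg)

    # Equirectangular approximation: scale longitude by cos(latitude).
    east = EARTH_RADIUS_M * d_lon * math.cos(math.radians(origin_lat_deg))
    north = EARTH_RADIUS_M * d_lat

    # Express the east/north vector in the rotated local frame.
    c, s = math.cos(map_yaw_rad), math.sin(map_yaw_rad)
    x = c * east + s * north
    y = -s * east + c * north
    return x, y


if __name__ == "__main__":
    # Made-up coordinates purely for demonstration.
    origin = (48.0000, 11.0000)
    route = [(48.0005, 11.0000), (48.0005, 11.0010)]
    for lat, lon in route:
        x, y = geodetic_to_local(lat, lon, *origin)
        print(f"({lat:.4f}, {lon:.4f}) -> x={x:.1f} m, y={y:.1f} m")
```

A flat-earth approximation like this is only reasonable over short distances such as the 500 m missions mentioned above. A real system would more likely use a proper map projection and estimate the frame alignment from GNSS and SLAM poses.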