🤖 AI Summary
To meet the stringent adaptability and safety requirements of autonomous navigation in complex, dynamic planetary environments, this paper proposes a multi-mode autonomous navigation framework built on Vision-Language Models (VLMs). A VLM interprets terrain semantics to assess terrain complexity, and the system switches autonomously among three perception–mapping–planning modes: lightweight, robust, and high-fidelity. The framework integrates large-model cognitive capabilities into real-time navigation control loops and combines the local navigation system with a distributed mapping service and a global path generator to jointly optimize long-range traverse efficiency and safety. In simulation, compared with a single-mode conservative navigation method, the framework improves traversal efficiency by 79.5%, achieves a 100% hazardous-terrain avoidance rate, and improves mission success probability and energy economy.
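The mode switch described above can be illustrated with a short Python sketch. It is not the authors' implementation: the label set (`simple`, `moderate`, `complex`), the one-to-one mapping onto the lightweight, robust, and high-fidelity modes, the `NavMode` fields and their numeric values, and the keyword parsing of free-form VLM output are all hypothetical placeholders. The sketch only shows the general pattern of turning a VLM's complexity judgement into a mode configuration, checking the most conservative label first and falling back to it when the answer is unrecognized.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    """Hypothetical terrain-complexity labels a VLM might be prompted to return."""
    SIMPLE = "simple"
    MODERATE = "moderate"
    COMPLEX = "complex"


@dataclass(frozen=True)
class NavMode:
    """One perception-mapping-planning configuration (fields are illustrative)."""
    name: str
    map_resolution_m: float  # placeholder grid cell size for the mapping module
    max_speed_mps: float     # placeholder speed cap for the planner


# Assumed label-to-mode correspondence; values are placeholders, not from the paper.
MODES = {
    Complexity.SIMPLE: NavMode("lightweight", 0.20, 0.10),
    Complexity.MODERATE: NavMode("robust", 0.10, 0.06),
    Complexity.COMPLEX: NavMode("high-fidelity", 0.05, 0.03),
}


def parse_vlm_answer(answer: str) -> Complexity:
    """Map free-form VLM text to a label, checking the most conservative label first."""
    text = answer.strip().lower()
    for label in (Complexity.COMPLEX, Complexity.MODERATE, Complexity.SIMPLE):
        if label.value in text:
            return label
    return Complexity.COMPLEX  # unrecognized answer: assume hardest terrain


def select_mode(answer: str) -> NavMode:
    """Choose the navigation mode for the terrain ahead from the VLM's answer."""
    return MODES[parse_vlm_answer(answer)]
```

For example, `select_mode("Flat, simple regolith")` returns the lightweight configuration, while an answer the parser cannot interpret falls back to the high-fidelity one. Defaulting to the most detailed mode is a safety-first assumption made for this sketch.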
📝 Abstract
The increasingly complex and diverse environments encountered in planetary exploration require more adaptable and flexible rover navigation strategies. In this study, we propose a multi-mode system empowered by a Vision-Language Model (VLM) to achieve efficient yet safe autonomous navigation for planetary rovers. The VLM parses scene information from image inputs to achieve a human-level understanding of terrain complexity. Based on this complexity classification, the system switches to the most suitable navigation mode, composed of perception, mapping, and planning modules designed for different terrain types, to traverse the terrain ahead before reaching the next waypoint. By integrating the local navigation system with a map server and a global waypoint generation module, the rover can handle long-distance navigation tasks in complex scenarios. The navigation system is evaluated in various simulation environments. Compared with a single-mode conservative navigation method, our multi-mode system improves time and energy efficiency in a long-distance traversal with varied types of obstacles, increasing efficiency by 79.5% while maintaining its ability to avoid terrain hazards and thus ensuring rover safety. More system information is available at https://chengsn1234.github.io/multi-mode-planetary-navigation/.
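The long-distance loop described in the abstract (a global module supplies waypoints, and a local mode is chosen before each leg) can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the system's actual interface: `run_mission`, `classify`, `traverse`, and `MODE_FOR_LABEL` are hypothetical names. `classify` stands in for capturing an image toward the next waypoint and querying the VLM, `traverse` stands in for the local perception-mapping-planning stack run in the chosen mode, and the map server is omitted entirely.

```python
from typing import Callable, List, Tuple

Waypoint = Tuple[float, float]

# Hypothetical mapping from a VLM complexity label to a local navigation mode.
MODE_FOR_LABEL = {"simple": "lightweight", "moderate": "robust", "complex": "high-fidelity"}


def run_mission(
    waypoints: List[Waypoint],
    classify: Callable[[Waypoint], str],
    traverse: Callable[[Waypoint, str], bool],
) -> List[Tuple[Waypoint, str]]:
    """Visit each global waypoint in order, re-selecting the local mode per leg.

    classify: placeholder for "look toward the next waypoint and ask the VLM
              for a terrain-complexity label".
    traverse: placeholder for the local navigation stack in the chosen mode;
              returns False if the leg could not be completed.
    """
    log: List[Tuple[Waypoint, str]] = []
    for wp in waypoints:
        label = classify(wp)
        # Assumption: unrecognized labels fall back to the most careful mode.
        mode = MODE_FOR_LABEL.get(label, "high-fidelity")
        if not traverse(wp, mode):
            raise RuntimeError(f"leg to {wp} failed in {mode} mode")
        log.append((wp, mode))
    return log


if __name__ == "__main__":
    # Toy mission: labels are fixed per waypoint instead of coming from a VLM.
    labels = {(10.0, 0.0): "simple", (20.0, 5.0): "complex", (30.0, 5.0): "moderate"}
    plan = run_mission(list(labels), classify=labels.__getitem__, traverse=lambda wp, m: True)
    for wp, mode in plan:
        print(wp, mode)
```

Passing `classify` and `traverse` as callables keeps the sketch independent of any particular VLM or planner, so either can be replaced by a simulator stub or a real component without changing the loop.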