AI Summary
Autonomous UAV path planning in complex environments suffers from insufficient perception-decision coupling and unclear applicability of foundation models. Method: This paper proposes the first LLM-VLM collaborative planning framework for real-time navigation, integrating large language models (LLMs) for high-level semantic reasoning with vision-language models (VLMs) for low-level environmental perception to enable semantically guided, safe trajectory generation. A lightweight co-architecture is designed for embedded real-time deployment. Contribution/Results: We systematically benchmark eight mainstream LLMs/VLMs on path planning tasks, establishing their performance boundaries. In multi-scenario simulations, our framework improves path rationality by 32.7% and enhances adaptability to dynamic environments. Real-world flight experiments validate its feasibility and robustness. The work establishes a reproducible technical paradigm for foundation model-driven embodied intelligent navigation.
Abstract
Path planning is a critical component in autonomous drone operations, enabling safe and efficient navigation through complex environments. Recent advances in foundation models, particularly large language models (LLMs) and vision-language models (VLMs), have opened new opportunities for enhanced perception and intelligent decision-making in robotics. However, their practical applicability and effectiveness in global path planning remain relatively unexplored. This paper proposes foundation model-guided path planners (FM-Planner) and presents a comprehensive benchmarking study and practical validation for drone path planning. Specifically, we first systematically evaluate eight representative LLM and VLM approaches using standardized simulation scenarios. To enable effective real-time navigation, we then design an integrated LLM-Vision planner that combines semantic reasoning with visual perception. Furthermore, we deploy and validate the proposed path planner through real-world experiments under multiple configurations. Our findings provide valuable insights into the strengths, limitations, and feasibility of deploying foundation models in real-world drone applications, and provide practical implementations for autonomous flight. Project site: https://github.com/NTU-ICG/FM-Planner.