🤖 AI Summary
To address the limited autonomous decision-making and execution capabilities of unmanned aerial vehicles (UAVs) in complex environments, this paper proposes the novel paradigm of “embodied low-altitude agents” and establishes an embodied-intelligence framework that integrates UAVs with large language models (LLMs). Methodologically, we design a multimodal data taxonomy and a task-scenario mapping framework; integrate vision/IMU/GNSS perception, LLM-based instruction understanding and hierarchical planning, tool invocation (via APIs and flight-control interfaces), memory-augmented reasoning, and simulation-to-real co-training; and construct a domain-specific multimodal data resource atlas covering 12 representative low-altitude tasks. Key contributions include: (1) the first systematic formalization of the embodied low-altitude agent concept; (2) release of an open-source technology roadmap; and (3) a scalable Agentic UAV reference architecture, validated through prototypes in logistics and inspection scenarios, demonstrating significant improvements in task comprehension, dynamic adaptability, and autonomous execution.
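The perceive–plan–act–remember loop described above can be sketched as a minimal agent skeleton. This is a hedged illustration only: the class names, the fixed plan, and the `run_mission` helper are all assumptions for exposition, not APIs from the paper or from any real flight stack; a real system would replace the stubbed `plan` method with an LLM query and route `act` through flight-control interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    """Stand-in for fused vision/IMU/GNSS perception."""
    image_caption: str                 # placeholder for vision features
    position: tuple                    # placeholder for GNSS/IMU state

@dataclass
class EmbodiedAgent:
    memory: list = field(default_factory=list)

    def plan(self, instruction: str, obs: Observation) -> list:
        # A real system would query an LLM here for hierarchical planning;
        # this stub returns a fixed high-level-to-low-level decomposition.
        return ["takeoff", f"navigate_to:{instruction}", "scan_area", "land"]

    def act(self, step: str) -> str:
        # Tool invocation would go through APIs / flight-control interfaces.
        result = f"executed {step}"
        self.memory.append(result)     # memory-augmented reasoning: log outcomes
        return result

def run_mission(agent: EmbodiedAgent, instruction: str, obs: Observation) -> list:
    """One perceive -> plan -> act loop over a single observation."""
    return [agent.act(step) for step in agent.plan(instruction, obs)]

agent = EmbodiedAgent()
log = run_mission(agent, "warehouse_7",
                  Observation("open field", (0.0, 0.0, 10.0)))
```

The design point this sketch makes is the separation of concerns the summary names: perception feeds planning, planning emits discrete tool calls, and every executed step is written back to memory so later reasoning can condition on mission history.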
📝 Abstract
Low-altitude mobility, exemplified by unmanned aerial vehicles (UAVs), has introduced transformative advancements across domains such as transportation, logistics, and agriculture. Leveraging flexible perspectives and rapid maneuverability, UAVs extend traditional systems' perception and action capabilities, attracting widespread attention from academia and industry. However, current UAV operations depend primarily on human control, offer only limited autonomy in simple scenarios, and lack the intelligence and adaptability needed for more complex environments and tasks. Large language models (LLMs) demonstrate remarkable problem-solving and generalization capabilities, offering a promising pathway for advancing UAV intelligence. This paper explores the integration of LLMs and UAVs, beginning with an overview of the fundamental components and functionalities of UAV systems, followed by a review of the state of the art in LLM technology. It then systematically surveys the multimodal data resources available for UAVs, which provide critical support for training and evaluation, and categorizes and analyzes key tasks and application scenarios where UAVs and LLMs converge. Finally, a reference roadmap toward agentic UAVs is proposed, aiming to enable UAVs to achieve agentic intelligence through autonomous perception, memory, reasoning, and tool utilization. Related resources are available at https://github.com/Hub-Tian/UAVs_Meet_LLMs.