🤖 AI Summary
Current UAV systems rely predominantly on rule-based control and narrow-domain AI, lacking situational awareness, autonomous decision-making, and ecosystem-level coordination; in particular, none integrate large language models (LLMs) with tool-calling capabilities. To address this, we propose the first five-layer LLM-based agent architecture designed specifically for UAVs, integrating YOLOv11 for real-time perception, GPT-4 for high-level reasoning, and a lightweight local Gemma-3 model for on-device decision-making. Built on ROS2 and Gazebo, our simulation platform enables, for the first time, LLM-driven real-time knowledge retrieval, context-aware reasoning, and cross-system tool invocation. In a search-and-rescue simulation, the system achieves a 91% person detection rate, raises actionable recommendation generation from 4.5% to 92%, and attains a mean detection confidence of 0.79, demonstrating the feasibility of high-level autonomy in dynamic, unstructured environments.
📝 Abstract
Unmanned Aerial Vehicles (UAVs) are increasingly deployed in defense, surveillance, and disaster response, yet most systems remain confined to SAE Level 2–3 autonomy. Their reliance on rule-based control and narrow AI restricts adaptability in dynamic, uncertain missions. Existing UAV frameworks lack context-aware reasoning, autonomous decision-making, and ecosystem-level integration; critically, none leverage Large Language Model (LLM) agents with tool-calling for real-time knowledge access. This paper introduces the Agentic UAVs framework, a five-layer architecture (Perception, Reasoning, Action, Integration, Learning) that augments UAVs with LLM-driven reasoning, database querying, and third-party system interaction. A ROS2- and Gazebo-based prototype integrates YOLOv11 object detection with GPT-4 reasoning and a local Gemma-3 deployment. In simulated search-and-rescue scenarios, agentic UAVs achieved higher detection confidence (0.79 vs. 0.72), improved person detection rates (91% vs. 75%), and markedly higher action recommendation rates (92% vs. 4.5%). These results indicate that modest computational overhead enables qualitatively new levels of autonomy and ecosystem integration.
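The Perception → Reasoning → Action flow described above can be sketched as a minimal pipeline. This is an illustrative stub only, not the paper's implementation: the function and class names are invented here, the detections are canned, and a fixed confidence rule stands in for the prototype's YOLOv11 detector and GPT-4/Gemma-3 tool-calling reasoner.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str
    confidence: float


def perceive(frame_id: int) -> List[Detection]:
    # Perception-layer stub: the prototype runs YOLOv11 on camera frames;
    # here we return canned detections for illustration.
    return [Detection("person", 0.79), Detection("debris", 0.41)]


def reason(detections: List[Detection], threshold: float = 0.5) -> List[str]:
    # Reasoning-layer stub: the prototype queries GPT-4 (or a local
    # Gemma-3 model) with tool-calling; a simple confidence rule stands in.
    recs = []
    for d in detections:
        if d.label == "person" and d.confidence >= threshold:
            recs.append(f"Dispatch rescue team (confidence={d.confidence:.2f})")
    return recs


def act(recommendations: List[str]) -> None:
    # Action-layer stub: the prototype would invoke ROS2 services or
    # external APIs; here we just print each recommendation.
    for rec in recommendations:
        print(rec)


if __name__ == "__main__":
    act(reason(perceive(frame_id=0)))
```

In the actual framework the Reasoning layer's output would be structured tool calls (database queries, third-party system invocations) rather than plain strings, and the Integration and Learning layers would sit around this loop.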