🤖 AI Summary
Current UAV systems rely predominantly on rule-based control and narrow-domain AI, lacking situational awareness, autonomous decision-making, and ecosystem-level coordination; in particular, none integrate large language models (LLMs) with tool-calling capabilities. To address this, we propose the first five-layer LLM-based agent architecture designed specifically for UAVs, integrating YOLOv11 for real-time perception, GPT-4 for high-level reasoning, and a lightweight local Gemma-3 model for on-device decision-making. Built on ROS2 and Gazebo, our simulation platform enables, for the first time, LLM-driven real-time knowledge retrieval, context-aware reasoning, and cross-system tool invocation. In a search-and-rescue simulation, the system achieves a 91% person detection rate, raises actionable recommendation generation from 4.5% to 92%, and attains a mean detection confidence of 0.79, demonstrating the feasibility of high-level autonomy in dynamic, unstructured environments.
📝 Abstract
Unmanned Aerial Vehicles (UAVs) are increasingly deployed in defense, surveillance, and disaster response, yet most systems remain confined to SAE Level 2–3 autonomy. Their reliance on rule-based control and narrow AI restricts adaptability in dynamic, uncertain missions. Existing UAV frameworks lack context-aware reasoning, autonomous decision-making, and ecosystem-level integration; critically, none leverage Large Language Model (LLM) agents with tool-calling for real-time knowledge access. This paper introduces the Agentic UAVs framework, a five-layer architecture (Perception, Reasoning, Action, Integration, Learning) that augments UAVs with LLM-driven reasoning, database querying, and third-party system interaction. A ROS2- and Gazebo-based prototype integrates YOLOv11 object detection with GPT-4 reasoning and a local Gemma-3 deployment. In simulated search-and-rescue scenarios, agentic UAVs achieved higher detection confidence (0.79 vs. 0.72), improved person detection rates (91% vs. 75%), and markedly higher action recommendation rates (92% vs. 4.5%). These results indicate that modest computational overhead enables qualitatively new levels of autonomy and ecosystem integration.
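The Perception → Reasoning → Action flow described above can be sketched as a minimal pipeline. This is an illustrative stub only, not the paper's implementation: the function and class names are invented here, the detections are canned, and a fixed confidence rule stands in for the prototype's YOLOv11 detector and GPT-4/Gemma-3 tool-calling reasoner.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str
    confidence: float


def perceive(frame_id: int) -> List[Detection]:
    # Perception-layer stub: the prototype runs YOLOv11 on camera frames;
    # here we return canned detections for illustration.
    return [Detection("person", 0.79), Detection("debris", 0.41)]


def reason(detections: List[Detection], threshold: float = 0.5) -> List[str]:
    # Reasoning-layer stub: the prototype queries GPT-4 (or a local
    # Gemma-3 model) with tool-calling; a simple confidence rule stands in.
    recs = []
    for d in detections:
        if d.label == "person" and d.confidence >= threshold:
            recs.append(f"Dispatch rescue team (confidence={d.confidence:.2f})")
    return recs


def act(recommendations: List[str]) -> None:
    # Action-layer stub: the prototype would invoke ROS2 services or
    # external APIs; here we just print each recommendation.
    for rec in recommendations:
        print(rec)


if __name__ == "__main__":
    act(reason(perceive(frame_id=0)))
```

In the actual framework the Reasoning layer's output would be structured tool calls (database queries, third-party system invocations) rather than plain strings, and the Integration and Learning layers would sit around this loop.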