General-Purpose Aerial Intelligent Agents Empowered by Large Language Models

📅 2025-03-11
🤖 AI Summary
Unmanned aerial vehicles (UAVs) in open-world settings are limited by their reliance on predefined tasks and by their inability to perform dynamic task planning and scene understanding under communication-constrained conditions. To address these limitations, this work introduces the first general-purpose aerial agent for open-world operation. The approach uses hardware-software co-design to run a 14B-parameter large language model (LLM) onboard in real time at 5–6 tokens/sec on edge hardware, and proposes a bidirectional cognitive architecture that tightly integrates deliberative (slow-thinking) task planning with reactive (fast-response) motion control. Key technical contributions include an edge-optimized computing platform, an airborne LLM inference engine, multimodal perception fusion, visual-inertial SLAM, hierarchical motion planning, and a bidirectional cognitive control framework. Preliminary prototype results indicate reliable task comprehension, autonomous planning, and execution across low-connectivity scenarios: sugarcane monitoring, power grid inspection, mine tunnel exploration, and biological observation.

📝 Abstract
The emergence of large language models (LLMs) opens new frontiers for unmanned aerial vehicles (UAVs), yet existing systems remain confined to predefined tasks due to hardware-software co-design challenges. This paper presents the first aerial intelligent agent capable of open-world task execution through tight integration of LLM-based reasoning and robotic autonomy. Our hardware-software co-designed system addresses two fundamental limitations: (1) onboard LLM operation via an edge-optimized computing platform, achieving 5–6 tokens/sec inference for 14B-parameter models at 220 W peak power; (2) a bidirectional cognitive architecture that synergizes slow deliberative planning (LLM task planning) with fast reactive control (state estimation, mapping, obstacle avoidance, and motion planning). In preliminary results with our prototype, the system demonstrates reliable task planning and scene understanding in communication-constrained scenarios, including sugarcane monitoring, power grid inspection, mine tunnel exploration, and biological observation. This work establishes a novel framework for embodied aerial artificial intelligence, bridging the gap between task planning and robotic autonomy in open environments.
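
To put the reported throughput in perspective, the short Python sketch below converts the abstract's figures (5–6 tokens/sec, 220 W peak) into a generation time and an energy-per-token bound. The plan length of 120 tokens is a hypothetical value chosen for illustration, not a number from the paper. The energy figure is only an upper bound, because it assumes the whole platform draws peak power for the entire generation.

```python
# Figures quoted in the abstract.
TOKENS_PER_SEC = (5.0, 6.0)  # reported onboard decoding rate range
PEAK_POWER_W = 220.0         # reported peak power


def generation_time_s(num_tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to decode num_tokens at a steady decoding rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


def energy_per_token_j(power_w: float, tokens_per_sec: float) -> float:
    """Joules per token if the platform drew power_w the whole time (upper bound)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return power_w / tokens_per_sec


if __name__ == "__main__":
    plan_tokens = 120  # hypothetical plan length, not taken from the paper
    for rate in TOKENS_PER_SEC:
        seconds = generation_time_s(plan_tokens, rate)
        joules = energy_per_token_j(PEAK_POWER_W, rate)
        print(f"{rate:.0f} tok/s: {seconds:.1f} s per plan, <= {joules:.1f} J/token")
```

Under these assumptions a plan takes on the order of tens of seconds to generate, which is one way to see why the slow planning layer is paired with a separate fast control layer.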
Problem

Research questions and friction points this paper is trying to address.

Develops aerial intelligent agents for open-world tasks using LLMs.
Addresses hardware-software co-design challenges for UAVs.
Enables onboard LLM operation and bidirectional cognitive architecture.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Edge-optimized computing for onboard LLM operation
Bidirectional cognitive architecture linking LLM task planning with reactive control
Integration of LLM reasoning with robotic autonomy
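
The slow-planning and fast-control split listed above can be illustrated with a toy two-rate loop. This is a minimal sketch under stated assumptions, not the paper's implementation: `StubPlanner`, `ReactiveController`, `run`, the 1-D world, and the fixed waypoints are all hypothetical stand-ins. The planner runs only every few ticks (mimicking slow LLM inference), the controller runs every tick, plans flow down as waypoints, and the vehicle state flows back up as feedback for the next plan.

```python
from typing import List, Tuple


class StubPlanner:
    """Hypothetical stand-in for the onboard LLM planner (slow layer)."""

    GOALS = (2.0, 5.0)  # canned 1-D waypoints; a real planner would derive them from the task

    def plan(self, task: str, position: float) -> List[float]:
        # Upward feedback: the current position decides which goals remain.
        return [g for g in self.GOALS if g > position]


class ReactiveController:
    """Hypothetical fast layer: moves toward a target by a bounded step each tick."""

    def __init__(self, max_step: float = 0.5) -> None:
        self.max_step = max_step

    def step(self, position: float, target: float) -> float:
        delta = target - position
        delta = max(-self.max_step, min(self.max_step, delta))  # clamp the move
        return position + delta


def run(task: str, ticks: int = 40, replan_every: int = 10) -> Tuple[float, int]:
    planner = StubPlanner()
    controller = ReactiveController()
    position = 0.0
    waypoints: List[float] = []
    planner_calls = 0
    for t in range(ticks):
        # Slow loop: replans only occasionally, using the latest state.
        if t % replan_every == 0:
            waypoints = planner.plan(task, position)
            planner_calls += 1
        # Fast loop: tracks the current waypoint on every tick.
        if waypoints:
            position = controller.step(position, waypoints[0])
            if abs(position - waypoints[0]) < 1e-9:
                waypoints.pop(0)  # waypoint reached, move on to the next one
    return position, planner_calls


if __name__ == "__main__":
    final_position, calls = run("inspect the power line")
    print(final_position, calls)
```

The point of the sketch is the rate separation: the controller keeps the vehicle moving between planner calls, so slow plan generation never blocks reactive behavior.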
Ji Zhao
PhD, Huazhong University of Science and Technology
Computer Vision, Machine Learning, Robotics
Xiao Lin
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China