Say the Mission, Execute the Swarm: Agent-Enhanced LLM Reasoning in the Web-of-Drones

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three obstacles to controlling heterogeneous drone swarms with large language models (LLMs): incompatible interfaces, limited grounding, and the difficulty of long-running closed-loop execution. It introduces a mission-agnostic agent augmentation framework that combines an LLM-based Agent Core with a Model Context Protocol (MCP) gateway and W3C Web of Things (WoT) device abstractions, forming a “Web-of-Drones” architecture that supports structured tool invocation, continuous state observation, and safe actuation without code generation. Evaluated in ArduPilot-based simulation on four swarm missions with six LLMs, the framework shows that general-purpose LLMs struggle to execute even simple swarm tasks reliably without explicit grounding and execution support, and that task-specific planning tools and runtime guardrails substantially improve robustness. Token consumption alone does not indicate execution quality or reliability.

📝 Abstract

Large Language Models (LLMs) are increasingly explored as high-level reasoning engines for cyber-physical systems, yet their application to real-time UAV swarm management remains challenging due to heterogeneous interfaces, limited grounding, and the need for long-running closed-loop execution. This paper presents a mission-agnostic, agent-enhanced LLM framework for UAV swarm control, where users express mission objectives in natural language and the system autonomously executes them through grounded, real-time interactions. The proposed architecture combines an LLM-based Agent Core with a Model Context Protocol (MCP) gateway and a Web-of-Drones abstraction based on W3C Web of Things (WoT) standards. By exposing drones, sensors, and services as standardized WoT Things, the framework enables structured tool-based interaction, continuous state observation, and safe actuation without relying on code generation. We evaluate the framework using ArduPilot-based simulation across four swarm missions and six state-of-the-art LLMs. Results show that, despite strong reasoning abilities, current general-purpose LLMs still struggle to achieve reliable execution, even for simple swarm tasks, when operating without explicit grounding and execution support. Task-specific planning tools and runtime guardrails substantially improve robustness, while token consumption alone is not indicative of execution quality or reliability.
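To make the idea of exposing a drone as a WoT Thing and offering its actions as structured tools more concrete, here is a minimal Python sketch. It is not the paper's implementation. The Thing Description contents (`drone-1`, `takeoff`, `land`, the altitude limits) and the helper names `td_to_tools` and `check_call` are hypothetical, chosen only for illustration. The tool shape (`name`, `description`, `inputSchema`) follows the general form of an MCP tool descriptor, and the range check is a simplified stand-in for a runtime guardrail.

```python
from typing import Any

# Hypothetical Thing Description for one simulated drone (illustrative only;
# the real descriptions used in the paper are not shown in the abstract).
DRONE_TD: dict[str, Any] = {
    "@context": "https://www.w3.org/2022/wot/td/v1.1",
    "title": "drone-1",
    "properties": {
        "altitude": {"type": "number", "unit": "m", "readOnly": True},
    },
    "actions": {
        "takeoff": {
            "description": "Climb to a target altitude.",
            "input": {
                "type": "object",
                "properties": {
                    "altitude": {"type": "number", "minimum": 1, "maximum": 50}
                },
                "required": ["altitude"],
            },
        },
        "land": {"description": "Land at the current position."},
    },
}


def td_to_tools(td: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn each WoT action into a tool descriptor an agent could call."""
    tools = []
    for name, action in td.get("actions", {}).items():
        tools.append({
            "name": f"{td['title']}_{name}",
            "description": action.get("description", ""),
            # Actions without an input schema take an empty object.
            "inputSchema": action.get("input", {"type": "object", "properties": {}}),
        })
    return tools


def check_call(tool: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Simplified guardrail: return violations; an empty list means proceed."""
    schema = tool["inputSchema"]
    errors = []
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing argument: {key}")
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            errors.append(f"unknown argument: {key}")
            continue
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{key} below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{key} above maximum {spec['maximum']}")
    return errors


if __name__ == "__main__":
    tools = td_to_tools(DRONE_TD)
    print([t["name"] for t in tools])
    print(check_call(tools[0], {"altitude": 500}))
```

In this sketch the agent never generates code: it only selects a tool and supplies arguments, and the call is rejected before actuation if the arguments fall outside the declared schema.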

Problem

Research questions and friction points this paper is trying to address.

UAV swarm

Large Language Models

real-time control

grounding

cyber-physical systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent-enhanced LLM

Web-of-Drones

Model Context Protocol