🤖 AI Summary
Current autonomous inspection systems are largely confined to digital tasks; end-to-end autonomy in physical environments, particularly industrial visual inspection using unmanned aerial vehicles (UAVs), remains an open challenge. To address this, we propose a natural language–driven hierarchical multi-agent framework: a head agent orchestrates high-level task planning, while execution agents collaboratively perform UAV navigation and equipment reading recognition. We introduce ReActEval, a novel reasoning mechanism that closes the plan–reason–act–evaluate loop, enabling dynamic decision-making and real-time feedback. Communication among agents is conducted exclusively in natural language, ensuring semantic interpretability and readiness for human–AI collaboration. Integrating large language models, multi-agent systems, and UAV control techniques, we validate the framework on a simulated dual-UAV cooperative inspection task. Results demonstrate high task success rates, strong robustness, and support for multi-tiered, complex industrial inspection scenarios.
📝 Abstract
Autonomous inspection systems are essential for ensuring the performance and longevity of industrial assets. Recently, agentic frameworks have demonstrated significant potential for automating inspection workflows, but they have largely been limited to digital tasks; their application to physical assets in real-world environments remains underexplored. In this work, our contributions are two-fold: first, we propose a hierarchical agentic framework for autonomous drone control, and second, we introduce ReActEval, a reasoning methodology for individual function executions. Our framework focuses on visual inspection tasks in indoor industrial settings, such as interpreting industrial readouts or inspecting equipment. It employs a multi-agent system comprising a head agent and multiple worker agents, each controlling a single drone. The head agent performs high-level planning and evaluates outcomes, while worker agents implement ReActEval to reason over and execute low-level actions. Operating entirely in natural language, ReActEval follows a plan–reason–act–evaluate cycle, enabling drones to handle tasks ranging from simple navigation (e.g., flying forward 10 meters and landing) to complex high-level tasks (e.g., locating and reading a pressure gauge). The evaluation phase serves as a feedback and replanning stage, ensuring actions align with user objectives while preventing undesirable outcomes. We evaluate the framework in a simulated environment with two worker agents, assessing performance qualitatively and quantitatively based on task completion across varying complexity levels and on workflow efficiency. By leveraging natural language for agent communication, our approach offers a novel, flexible, and user-accessible alternative to traditional drone-based solutions, enabling autonomous problem-solving for industrial inspection without extensive user intervention.
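The plan–reason–act–evaluate cycle described in the abstract can be sketched as a simple control loop. The function names below (`plan`, `reason`, `act`, `evaluate`, `react_eval`) and the string-based task decomposition are hypothetical stand-ins for illustration only; in the actual framework these steps would be driven by an LLM and real drone control APIs.

```python
# Hypothetical sketch of a ReActEval-style worker loop.
# plan/reason/act/evaluate are stand-ins for LLM-driven steps.

def plan(task: str) -> list[str]:
    """Decompose a natural-language task into low-level steps."""
    return [s.strip() for s in task.split(", ")]

def reason(step: str) -> str:
    """Produce a rationale for the next action (an LLM call in practice)."""
    return f"execute '{step}' because it is the next planned step"

def act(step: str, rationale: str, log: list[str]) -> None:
    """Execute a low-level drone action (simulated here by logging it)."""
    log.append(step)

def evaluate(task: str, log: list[str]) -> bool:
    """Check whether the executed actions cover the user's objective."""
    return log == [s.strip() for s in task.split(", ")]

def react_eval(task: str, max_replans: int = 3) -> bool:
    """Plan -> reason -> act -> evaluate, replanning on failure."""
    for _ in range(max_replans):
        log: list[str] = []
        for step in plan(task):
            act(step, reason(step), log)
        if evaluate(task, log):
            return True   # objective met; report success to the head agent
    return False          # budget exhausted; head agent may replan globally

print(react_eval("fly forward 10 meters, land"))  # → True
```

The evaluation step is what distinguishes this loop from a plain plan-and-execute pattern: a failed check triggers replanning within the worker before the head agent needs to intervene.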