APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks

📅 2026-05-04

🤖 AI Summary

This study addresses the challenge of applying large language model (LLM)-based automated penetration testing to bare-metal industrial operational technology (OT) devices, such as microcontrollers running Modbus/TCP or CoAP, which offer no shell or filesystem for an agent to work with. To bridge this gap, the authors propose what they describe as the first end-to-end autonomous purple-teaming framework tailored for such resource-constrained systems. By designing a protocol-level action space and adding a runtime overseer mechanism, the framework automates vulnerability discovery, exploitation, remediation, and validation without step-by-step human intervention. Evaluated on Zephyr RTOS firmware across heterogeneous IIoT topologies with five frontier LLMs, the approach achieved a 90.0% mission success rate on the full attack-remediation cycle across 290 runs, which the authors present as the first demonstration that LLM-driven autonomous offensive and defensive operations are feasible in bare-metal OT environments.
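To make "protocol-level action space" concrete, the sketch below builds a single Modbus/TCP Read Holding Registers request from its individual fields, which is the kind of object an agent has to reason about when no shell is available. It is an illustration only and is not taken from APIOT; the function names `build_read_holding_registers` and `parse_mbap` are hypothetical. The field layout (MBAP header followed by the PDU) follows the public Modbus/TCP specification.

```python
import struct

READ_HOLDING_REGISTERS = 0x03  # standard Modbus function code


def build_read_holding_registers(transaction_id, unit_id, start_addr, quantity):
    """Encode a Modbus/TCP Read Holding Registers request (hypothetical helper)."""
    # PDU: function code (1 byte), start address (2 bytes), register count (2 bytes)
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_addr, quantity)
    # MBAP header: transaction id, protocol id (always 0), length, unit id.
    # The length field counts the unit id byte plus the PDU bytes.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_mbap(frame):
    """Decode the 7-byte MBAP header of a Modbus/TCP frame (hypothetical helper)."""
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
    }


frame = build_read_holding_registers(transaction_id=1, unit_id=1, start_addr=0, quantity=2)
print(frame.hex())
```

Each field here (declared length, function code, address, quantity) is something a firmware parser must validate, so an agent working at this level explores parser behaviour by varying fields rather than by running commands.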

📝 Abstract

Bare-metal operational technology (OT) devices -- especially the microcontrollers running Modbus/TCP and CoAP at the base of industrial control systems -- have remained outside the reach of autonomous security attacks. Prior autonomous pentesting studies target Linux and web systems, whose shells and filesystems are familiar to LLM agents. Bare-metal OT has neither, so agents must reason directly over protocol fields and parser semantics. This requires new action-space designs and runtime controls, and opens new research questions about protocol-level exploit reasoning and its deployment envelope. We present APIOT (Autonomous Purple-teaming for Industrial OT), the first large language model (LLM) framework demonstrating an autonomous attack and remediation of bare-metal OT devices, achieving the full discovery -> exploitation -> patching -> verification cycle without step-by-step human intervention. We implemented and evaluated this framework on Zephyr RTOS firmware across heterogeneous industrial IoT (IIoT) topologies. Through 290 experiment runs spanning five frontier LLMs, three network topologies, two impairment levels, and guided versus unguided conditions, APIOT achieved a mission success rate of 90.0% on the full attack-remediation cycle. We found that the runtime governance layer (which we call an overseer) is a critical engineering variable: without it, agents exhibit systematic degenerate patterns, including repetition loops, missing crash verification, and reconnaissance deadlocks. Together, these findings carry two implications beyond our testbed. Attacker expertise is no longer the binding constraint on bare-metal OT exploitation, and defender threat models must now assume LLM-augmented adversaries capable of executing autonomous discovery-through-remediation cycles against industrial firmware.
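One of the degenerate patterns named in the abstract, the repetition loop, can be illustrated with a small runtime check. The sketch below is a minimal, assumed design and not the APIOT overseer: the class name `RepetitionOverseer`, the `window` and `max_repeats` parameters, and the `"allow"` / `"intervene"` decisions are all hypothetical. It flags an agent that issues the same action too often within a short sliding window.

```python
from collections import deque


class RepetitionOverseer:
    """Flag an agent that keeps repeating the same action (illustrative only)."""

    def __init__(self, window=5, max_repeats=3):
        # Only the most recent `window` actions are remembered.
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def review(self, action):
        """Record an action and return 'intervene' once it repeats too often."""
        self.recent.append(action)
        repeats = sum(1 for past in self.recent if past == action)
        return "intervene" if repeats >= self.max_repeats else "allow"


overseer = RepetitionOverseer(window=5, max_repeats=3)
for step in range(3):
    # The agent issues the same hypothetical tool call three times in a row.
    decision = overseer.review(("modbus_read", "addr=0"))
    print(step, decision)
```

A real governance layer would need further checks for the other patterns the authors report, such as missing crash verification and reconnaissance deadlocks; this sketch covers only the repetition case.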

Problem

Research questions and friction points this paper is trying to address.

bare-metal OT

autonomous vulnerability management

protocol-level exploit

industrial control systems

LLM-augmented adversaries

Innovation

Methods, ideas, or system contributions that make the work stand out.

autonomous vulnerability management

bare-metal OT

LLM agent