🤖 AI Summary
This work addresses the limited robustness, adaptability, and interpretability of existing reinforcement learning (RL)-based cloud network defense methods, which must be retrained when the network or the attack strategy changes and which lack human-in-the-loop (HITL) mechanisms. The authors propose CyberOps-Bots, an autonomous defense framework that combines large language models (LLMs) with hierarchical multi-agent reinforcement learning. A high-level LLM agent performs global tactical planning through ReAct-style reasoning with IPDRR-based perception, while low-level RL agents execute atomic defensive actions within localized network regions. The two-tier design is inspired by MITRE ATT&CK's Tactics-Techniques model, and the low-level agents are developed via heterogeneous separated pre-training. The framework supports human-in-the-loop intervention and adapts to new scenarios without retraining. In experiments on real cloud datasets, CyberOps-Bots maintains 68.5% higher network availability than state-of-the-art algorithms and achieves a 34.7% jumpstart performance gain when the scenario shifts.
📝 Abstract
While virtualization and resource pooling give cloud networks structural flexibility and elastic scalability, they also expand the attack surface and challenge cyber resilience. Reinforcement Learning (RL)-based defense strategies have been developed to optimize resource deployment and isolation policies under adversarial conditions, aiming to enhance system resilience by maintaining and restoring network availability. However, existing approaches lack robustness because they require retraining to adapt to changes in network structure, node scale, attack strategies, and attack intensity. Furthermore, the lack of Human-in-the-Loop (HITL) support limits interpretability and flexibility. To address these limitations, we propose CyberOps-Bots, a hierarchical multi-agent reinforcement learning framework empowered by Large Language Models (LLMs). Inspired by MITRE ATT&CK's Tactics-Techniques model, CyberOps-Bots features a two-layer architecture: (1) an upper-level LLM agent with four modules (ReAct planning, IPDRR-based perception, long- and short-term memory, and action/tool integration) performs global awareness, human intent recognition, and tactical planning; (2) lower-level RL agents, developed via heterogeneous separated pre-training, execute atomic defense actions within localized network regions. This synergy preserves LLM adaptability and interpretability while ensuring reliable RL execution. Experiments on real cloud datasets show that, compared to state-of-the-art algorithms, CyberOps-Bots maintains 68.5% higher network availability and achieves a 34.7% jumpstart performance gain when the scenario shifts, without retraining. To our knowledge, this is the first study to establish a robust LLM-RL framework with HITL support for cloud defense. We will release our framework to the community to facilitate the advancement of robust and autonomous defense in cloud networks.
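The two-layer division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tactic names, the `RegionState` fields, the 50% threshold, and the rule-based `plan_tactics` function (a stand-in for the upper-level LLM planner) are all hypothetical, and `RegionAgent.act` is a fixed lookup where the paper uses pre-trained RL policies. The sketch only shows how a global planner might assign one tactic per network region, how a human hint could override that assignment, and how per-region agents turn a tactic into atomic actions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical tactic vocabulary; the paper's actual tactics are not given in the abstract.
TACTICS = ("monitor", "isolate", "restore")


@dataclass
class RegionState:
    """Illustrative per-region observation (field names are assumptions)."""
    region: str
    compromised: int  # number of nodes currently compromised
    total: int        # number of nodes in the region


def plan_tactics(states: List[RegionState],
                 human_hint: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Rule-based stand-in for the upper-level LLM planner.

    Assigns one tactic per region; a human hint, if present, takes priority.
    """
    hint = human_hint or {}
    plan: Dict[str, str] = {}
    for s in states:
        if s.region in hint:
            if hint[s.region] not in TACTICS:
                raise ValueError(f"unknown tactic: {hint[s.region]}")
            plan[s.region] = hint[s.region]
        elif s.compromised == 0:
            plan[s.region] = "monitor"
        elif s.compromised / s.total >= 0.5:  # assumed threshold
            plan[s.region] = "isolate"
        else:
            plan[s.region] = "restore"
    return plan


class RegionAgent:
    """Stand-in for a lower-level RL agent bound to one network region."""

    def __init__(self, region: str) -> None:
        self.region = region

    def act(self, tactic: str, state: RegionState) -> List[str]:
        """Translate a tactic into atomic actions (fixed lookup, not a learned policy)."""
        if tactic == "monitor":
            return [f"scan:{self.region}"]
        if tactic == "isolate":
            return [f"quarantine:{self.region}"] * state.compromised
        if tactic == "restore":
            return [f"redeploy:{self.region}"] * state.compromised
        raise ValueError(f"unknown tactic: {tactic}")


def defend_step(states: List[RegionState],
                human_hint: Optional[Dict[str, str]] = None) -> Dict[str, List[str]]:
    """One planning-and-execution step across all regions."""
    plan = plan_tactics(states, human_hint)
    return {s.region: RegionAgent(s.region).act(plan[s.region], s) for s in states}


if __name__ == "__main__":
    states = [RegionState("web", 0, 4), RegionState("db", 3, 4), RegionState("cache", 1, 4)]
    print(defend_step(states))
    print(defend_step(states, human_hint={"cache": "isolate"}))
```

Keeping the planner and the region agents behind separate interfaces mirrors the abstract's claim that scenario changes are absorbed at the planning layer while the low-level policies are reused without retraining.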