A Red Teaming Framework for Evaluating Robustness of AI-enabled Security Orchestration, Automation, and Response Systems

📅 2026-05-16

🤖 AI Summary
This work addresses two gaps: the resilience of AI-driven SOAR systems against adaptive adversaries is underexplored, and effective methods for evaluating it are lacking. To this end, the authors propose an autonomous red teaming framework that uses a hierarchical hybrid architecture to combine the strategic planning capabilities of large language models (LLMs) with the tactical execution strengths of reinforcement learning (RL). A reward mechanism aligned with kill-chain progression drives the generation of multi-stage, adaptive attacks within a high-fidelity enterprise network simulation. Experimental results show that standalone LLM agents fail to sustain multi-stage attack campaigns and that domain-specific cybersecurity models achieve only limited levels of compromise. In contrast, the proposed hybrid LLM-RL approach proves effective, supporting its use for evaluating the defensive resilience of SOAR systems.
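The summary does not specify the exact form of the kill-chain-aligned reward. One standard way to reward stage progression without changing which policy is optimal is potential-based reward shaping, where the shaping term is F(s, s') = gamma * Phi(s') - Phi(s). The minimal Python sketch below assumes that form, with Phi taken as a scaled index into a hypothetical five-stage kill chain; `Stage`, `potential`, `shaped_reward`, and the stage names are illustrative assumptions, not the paper's implementation.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical kill-chain stages; the paper's actual stage set is not given here."""
    RECON = 0
    INITIAL_ACCESS = 1
    PRIVILEGE_ESCALATION = 2
    LATERAL_MOVEMENT = 3
    OBJECTIVE = 4


def potential(stage: Stage, scale: float = 1.0) -> float:
    """Phi(s): grows with kill-chain progress (assumed linear in the stage index)."""
    return scale * float(stage)


def shaped_reward(env_reward: float, prev: Stage, nxt: Stage,
                  gamma: float = 0.99, scale: float = 1.0) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return env_reward + gamma * potential(nxt, scale) - potential(prev, scale)


# Advancing from reconnaissance to initial access earns a progression bonus.
print(shaped_reward(0.0, Stage.RECON, Stage.INITIAL_ACCESS))  # 0.99
```

Moving forward one stage yields a positive bonus, and falling back (for example, after a defender evicts the attacker from a host) yields a penalty. The policy-invariance guarantee of this shaping form holds only when `gamma` matches the RL agent's own discount factor.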
📝 Abstract
AI-enabled Security Orchestration, Automation, and Response (SOAR) systems increasingly employ autonomous agents for cyber defense, yet their resilience to adaptive adversaries is underexplored. We introduce an autonomous red teaming framework that integrates large language models (LLMs) with reinforcement learning (RL) to generate adaptive, multi-stage attack campaigns against autonomous defenders in enterprise networks. A hierarchical design combines an LLM-based planner for strategic intent with an RL controller for tactical execution, supported by reward shaping aligned with kill-chain progression. Evaluation in a high-fidelity enterprise simulation demonstrates the effectiveness of the proposed approach, while also showing that standalone LLM agents fail to sustain multi-stage attack campaigns and that domain-specific cybersecurity models achieve only limited levels of compromise, highlighting the necessity for hybrid LLM-RL approaches to red teaming.
Problem

Research questions and friction points this paper is trying to address.

AI-enabled SOAR
robustness
adaptive adversaries
red teaming
autonomous cyber defense
Innovation

Methods, ideas, or system contributions that make the work stand out.

red teaming
large language models
reinforcement learning
SOAR systems
adaptive attacks