A Red Teaming Framework for Evaluating Robustness of AI-enabled Security Orchestration, Automation, and Response Systems

📅 2026-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited robustness of existing AI-driven SOAR systems against adaptive adversaries and the absence of effective evaluation methodologies. To this end, the authors propose an autonomous red teaming framework with a hierarchical hybrid architecture: a large language model (LLM) handles strategic planning, and a reinforcement learning (RL) controller handles tactical execution. A kill-chain-aligned reward mechanism guides the generation of multi-stage, adaptive attacks within a high-fidelity enterprise network simulation. In the reported experiments, pure LLM-based agents fail to sustain multi-stage attack campaigns, and domain-specific security models achieve only limited levels of compromise. In contrast, the proposed LLM-RL hybrid proves effective in the same environment, supporting its use for evaluating the defensive resilience of SOAR systems.
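The division of labor between planner and controller can be illustrated with a minimal sketch. It is not the authors' implementation: the stage names, `stub_planner`, `stub_controller`, and `run_campaign` are all hypothetical, the planner is a plain function standing in for an LLM call, and the controller is a coin flip standing in for a trained RL policy.

```python
import random
from typing import Callable, List, Optional

# Hypothetical kill-chain stages; the summary does not list the paper's exact stages.
STAGES = ["recon", "initial_access", "privilege_escalation",
          "lateral_movement", "impact"]


def stub_planner(completed: List[str]) -> Optional[str]:
    """Stand-in for the LLM planner: pick the first stage not yet completed."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None  # nothing left to plan


def stub_controller(goal: str, rng: random.Random) -> bool:
    """Stand-in for the RL controller: succeed at the goal with probability 0.5."""
    return rng.random() < 0.5


def run_campaign(
    planner: Callable[[List[str]], Optional[str]],
    controller: Callable[[str, random.Random], bool],
    max_steps: int = 50,
    seed: int = 0,
) -> List[str]:
    """Alternate between strategic goal selection and tactical execution."""
    rng = random.Random(seed)
    completed: List[str] = []
    for _ in range(max_steps):
        goal = planner(completed)      # strategic layer chooses the next objective
        if goal is None:
            break                      # campaign finished
        if controller(goal, rng):      # tactical layer attempts the objective
            completed.append(goal)
    return completed


if __name__ == "__main__":
    print(run_campaign(stub_planner, stub_controller))
```

Keeping the two layers behind plain callables means either one can be swapped (for example, replacing the stub planner with a real model call) without touching the campaign loop.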

📝 Abstract

AI-enabled Security Orchestration, Automation, and Response (SOAR) systems increasingly employ autonomous agents for cyber defense, yet their resilience to adaptive adversaries is underexplored. We introduce an autonomous red teaming framework that integrates large language models (LLMs) with reinforcement learning (RL) to generate adaptive, multi-stage attack campaigns against autonomous defenders in enterprise networks. A hierarchical design combines an LLM-based planner for strategic intent with an RL controller for tactical execution, supported by reward shaping aligned with kill-chain progression. Evaluation in a high-fidelity enterprise simulation demonstrates the effectiveness of the proposed approach, while also showing that standalone LLM agents fail to sustain multi-stage attack campaigns and that domain-specific cybersecurity models achieve only limited levels of compromise, highlighting the necessity for hybrid LLM-RL approaches to red teaming.
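One common way to align a reward signal with staged progress is potential-based reward shaping, sketched below. The abstract does not specify the shaping function the authors use, so this is an assumption chosen for illustration: the `Stage` enum, the linear `potential`, and the `gamma` default are all hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical kill-chain stages, ordered by attack progression."""
    RECON = 0
    INITIAL_ACCESS = 1
    PRIVILEGE_ESCALATION = 2
    LATERAL_MOVEMENT = 3
    IMPACT = 4


def potential(stage: Stage) -> float:
    """Potential in [0, 1] that grows linearly with kill-chain progress."""
    return int(stage) / int(max(Stage))


def shaped_reward(env_reward: float, prev: Stage, curr: Stage,
                  gamma: float = 0.99) -> float:
    """Add the shaping term gamma * phi(curr) - phi(prev) to the environment reward.

    Advancing a stage yields a bonus; being pushed back (for example,
    after the defender evicts the attacker) yields a penalty.
    """
    return env_reward + gamma * potential(curr) - potential(prev)


if __name__ == "__main__":
    # Advancing from reconnaissance to initial access earns a positive bonus.
    print(shaped_reward(0.0, Stage.RECON, Stage.INITIAL_ACCESS))
    # Falling back from lateral movement to initial access is penalized.
    print(shaped_reward(0.0, Stage.LATERAL_MOVEMENT, Stage.INITIAL_ACCESS))
```

With `gamma = 1.0` the shaping terms telescope, so the total bonus over a full campaign depends only on the first and last stage reached, not on the path taken between them.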

Problem

Research questions and friction points this paper is trying to address.

AI-enabled SOAR

robustness

adaptive adversaries

red teaming

autonomous cyber defense

Innovation

Methods, ideas, or system contributions that make the work stand out.

red teaming

large language models

reinforcement learning