🤖 AI Summary
Security Operations Centers (SOCs) suffer from analyst fatigue and missed detections due to overwhelming alert volumes; existing large language model (LLM)-based approaches rely on monolithic end-to-end processing and fail to address enterprise log noise, contextual sparsity, and unverifiable decision-making. This paper proposes CORTEX, a multi-agent LLM architecture that decouples analysis into three specialized agents—behavioral analysis, evidence collection, and reasoning adjudication—which collaboratively construct auditable, chain-of-evidence reasoning for fine-grained classification of high-risk alerts. Key innovations include log sequence modeling, cross-system evidence retrieval, and structured logical inference. The authors also release the first fine-grained, real-world dataset tailored to SOC investigation tasks. Experiments demonstrate substantial false-positive reduction, outperforming single-agent LLM baselines across diverse enterprise scenarios while maintaining high accuracy, strong interpretability, and practical deployability.
📝 Abstract
Security Operations Centers (SOCs) are overwhelmed by tens of thousands of daily alerts, of which only a small fraction correspond to genuine attacks. This overload creates alert fatigue, leading to overlooked threats and analyst burnout. Classical detection pipelines are brittle and context-poor, while recent LLM-based approaches typically rely on a single model to interpret logs, retrieve context, and adjudicate alerts end-to-end -- an approach that struggles with noisy enterprise data and offers limited transparency. We propose CORTEX, a multi-agent LLM architecture for high-stakes alert triage in which specialized agents collaborate over real evidence: a behavior-analysis agent inspects activity sequences, evidence-gathering agents query external systems, and a reasoning agent synthesizes their findings into an auditable decision. To support training and evaluation, we release a dataset of fine-grained SOC investigations from production environments, capturing step-by-step analyst actions and linked tool outputs. Across diverse enterprise scenarios, CORTEX substantially reduces false positives and improves investigation quality over state-of-the-art single-agent LLMs.