CyberRAG: An agentic RAG cyber attack classification and reporting tool

📅 2025-07-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Enterprise-scale IDS/IPS deployments generate massive alert volumes, leaving analysts with two problems: high false-positive rates and too little contextual explanation. To address them, the paper proposes CyberRAG, an agent-based Retrieval-Augmented Generation (RAG) framework that integrates a pool of specialized classifiers, tool adapters, and an iterative retrieval-and-reasoning loop, with plug-and-play extensibility to new attack types. The framework combines fine-tuned domain-specific classification models, semantic retrieval from a curated cybersecurity knowledge base, LLM-driven explanation generation, and self-consistency checks to deliver real-time, interpretable, structured threat assessments. Reported results include a final classification accuracy of 94.92%, an explanation-quality BERTScore of up to 0.94, and a score of 4.9/5 in GPT-4-based expert evaluation. The authors report that the approach reduces false positives while improving the readability and trustworthiness of analyst-facing output.

📝 Abstract

Intrusion Detection and Prevention Systems (IDS/IPS) in large enterprises can generate hundreds of thousands of alerts per hour, overwhelming security analysts with logs that demand deep, rapidly evolving domain expertise. Conventional machine-learning detectors trim the alert volume but still yield high false-positive rates, while standard single-pass Retrieval-Augmented Generation (RAG) pipelines often retrieve irrelevant context and fail to justify their predictions. To overcome these shortcomings, we present CyberRAG, a modular, agent-based RAG framework that delivers real-time classification, explanation, and structured reporting for cyber-attacks. A central LLM agent orchestrates (i) a pool of fine-tuned specialized classifiers, each tailored to a distinct attack family; (ii) tool adapters for enrichment and alerting; and (iii) an iterative retrieval-and-reason loop that continuously queries a domain-specific knowledge base until the evidence is both relevant and self-consistent. Unlike traditional RAG systems, CyberRAG embraces an agentic design that enables dynamic control flow and adaptive reasoning. This agent-centric architecture refines its threat labels and natural-language justifications autonomously, reducing false positives and enhancing interpretability. The framework is fully extensible: new attack types can be supported by simply adding a classifier, without retraining the core agent. In evaluation, CyberRAG achieves over 94% accuracy per class, and semantic orchestration raises the final classification accuracy to 94.92%. Generated explanations score up to 0.94 in BERTScore and 4.9/5 in GPT-4-based expert evaluation. These results show that agentic, specialist-oriented RAG can pair high detection accuracy with trustworthy, SOC-ready prose, offering a practical and scalable path toward semi-autonomous cyber-defence workflows.
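The extensibility claim in the abstract (a new attack type is supported by adding a classifier, with no retraining of the core agent) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the `SpecialistPool` class, the `Verdict` type, the attack-family names, and the keyword-based `keyword_specialist` scorers, which merely stand in for the fine-tuned classifiers the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical interface: a specialist maps a raw payload to a score in [0, 1]
# indicating how strongly the payload matches its attack family.
Specialist = Callable[[str], float]


@dataclass
class Verdict:
    label: str
    confidence: float


class SpecialistPool:
    """Registry of per-family classifiers (illustrative, not the paper's code)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self._specialists: Dict[str, Specialist] = {}
        self._threshold = threshold

    def register(self, family: str, clf: Specialist) -> None:
        # Adding a family only touches the registry; existing specialists
        # and the orchestration logic stay unchanged.
        self._specialists[family] = clf

    def classify(self, payload: str) -> Verdict:
        scores = {name: clf(payload) for name, clf in self._specialists.items()}
        if not scores:
            return Verdict("unknown", 0.0)
        best = max(scores, key=scores.get)
        # Below the threshold no specialist claims the payload.
        if scores[best] < self._threshold:
            return Verdict("unknown", scores[best])
        return Verdict(best, scores[best])


def keyword_specialist(*keywords: str) -> Specialist:
    """Toy stand-in for a fine-tuned model: fraction of keywords present."""

    def score(payload: str) -> float:
        text = payload.lower()
        return sum(1 for k in keywords if k in text) / len(keywords)

    return score


pool = SpecialistPool()
pool.register("sql_injection", keyword_specialist("union select", "' or 1=1"))
pool.register("xss", keyword_specialist("<script", "onerror="))

# Later, support a new family without modifying anything registered so far.
pool.register("command_injection", keyword_specialist("; cat ", "| nc "))
```

Calling `pool.classify(...)` on a payload returns the best-scoring family, or `unknown` when no specialist reaches the threshold. In the framework described above, the central LLM agent would make this routing decision semantically instead of taking a simple maximum.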

Problem

Research questions and friction points this paper is trying to address.

High false-positive rates in cyber attack alerts

Irrelevant context retrieval in RAG pipelines

Overwhelming volume of IDS/IPS alerts for analysts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent-based RAG framework for cyber-attack classification

Modular design with fine-tuned specialized classifiers

Iterative retrieval-and-reason loop for relevant evidence
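The iterative retrieval-and-reason loop can be sketched as follows: retrieve, check that the evidence is relevant and agrees with the predicted label, and otherwise refine the query and try again. This is a toy under stated assumptions, not the authors' method. The in-memory `KNOWLEDGE_BASE`, the token-overlap relevance score, the `min_relevance` and `max_rounds` parameters, and the refinement rule (falling back to the label's own words) are all hypothetical; the paper describes semantic retrieval and an LLM agent performing these steps.

```python
from typing import List, Tuple

# Hypothetical knowledge base: (attack family, description) pairs.
KNOWLEDGE_BASE: List[Tuple[str, str]] = [
    ("sql_injection", "sql injection abuses union select or boolean payloads in database queries"),
    ("xss", "cross-site scripting injects script tags into pages rendered by a browser"),
    ("sql_injection", "mitigate sql injection with parameterized queries and input validation"),
]


def overlap(query: str, doc: str) -> float:
    """Fraction of query tokens that also appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Return the k entries with the highest token overlap (ties keep KB order)."""
    ranked = sorted(KNOWLEDGE_BASE, key=lambda e: overlap(query, e[1]), reverse=True)
    return ranked[:k]


def retrieve_until_consistent(
    label: str, payload: str, min_relevance: float = 0.5, max_rounds: int = 3
) -> Tuple[List[Tuple[str, str]], int]:
    """Loop until evidence is relevant and consistent with the label, or give up."""
    query = payload
    for round_no in range(1, max_rounds + 1):
        evidence = retrieve(query)
        relevant = all(overlap(query, doc) >= min_relevance for _, doc in evidence)
        consistent = all(family == label for family, _ in evidence)
        if relevant and consistent:
            return evidence, round_no
        # Assumed refinement step: re-anchor the query on the predicted label.
        # A real agent would have the LLM rewrite the query instead.
        query = label.replace("_", " ")
    return [], max_rounds


evidence, rounds = retrieve_until_consistent("sql_injection", "id=1 union select password")
```

With this toy data the raw payload pulls in an unrelated XSS entry on the first pass, so the loop refines the query and accepts the evidence on the second. Capping the number of rounds keeps the loop from spinning when the knowledge base holds nothing for the predicted label, in which case it returns no evidence.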
