Adapting Large Language Models to Emerging Cybersecurity using Retrieval Augmented Generation

📅 2025-10-30
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Large language models (LLMs) exhibit limited adaptability to emerging cyber threats, suffer from opaque reasoning processes, and lack verifiable trustworthiness in dynamic security environments. To address these challenges, this paper proposes a retrieval-augmented generation (RAG) framework tailored for dynamic cybersecurity scenarios. The core contribution is an optimized hybrid retrieval mechanism that integrates real-time, multi-source threat intelligence (including CVE records, MITRE ATT&CK, and operational threat feeds) to enhance LLMs' temporal reasoning and contextual understanding of novel attack patterns. The authors employ Llama-3-8B-Instruct as the foundation model, enabling dynamic knowledge updates without retraining and generating interpretable, traceable responses. Experimental evaluation on threat detection tasks demonstrates significant improvements over baselines, including a +12.7% accuracy gain and enhanced output consistency. The framework substantially improves model adaptability to rapidly evolving threat landscapes while strengthening reliability and explainability in security-critical applications.
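The summary does not spell out how the optimized hybrid retrieval mechanism fuses its signals. A common pattern for combining a lexical (BM25-style) ranking with a dense-embedding ranking is reciprocal rank fusion (RRF), sketched below. This is illustrative only: the toy `lexical_rank` stands in for a real BM25 index, and the dense ranking `sem` is hard-coded as an assumption rather than produced by an embedding model.

```python
from collections import Counter

def lexical_rank(query, docs):
    """Rank documents by simple term overlap (a stand-in for BM25)."""
    q_terms = set(query.lower().split())
    return sorted(range(len(docs)),
                  key=lambda i: -len(q_terms & set(docs[i].lower().split())))

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in scores.most_common()]

# Toy threat-intelligence corpus mixing CVE and ATT&CK entries.
docs = [
    "CVE-2024-3094 xz backdoor supply chain compromise",
    "MITRE ATT&CK T1059 command and scripting interpreter",
    "phishing campaign targeting the financial sector",
]
lex = lexical_rank("xz supply chain CVE", docs)
sem = [0, 2, 1]  # hypothetical ranking from a dense retriever
fused = rrf_fuse([lex, sem])
```

RRF is attractive for this setting because it fuses rankings rather than raw scores, so the lexical and dense retrievers do not need comparable score scales.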

📝 Abstract
Security applications are increasingly relying on large language models (LLMs) for cyber threat detection; however, their opaque reasoning often limits trust, particularly in decisions that require domain-specific cybersecurity knowledge. Because security threats evolve rapidly, LLMs must not only recall historical incidents but also adapt to emerging vulnerabilities and attack patterns. Retrieval-Augmented Generation (RAG) has demonstrated effectiveness in general LLM applications, but its potential for cybersecurity remains underexplored. In this work, we introduce a RAG-based framework designed to contextualize cybersecurity data and enhance LLM accuracy in knowledge retention and temporal reasoning. Using external datasets and the Llama-3-8B-Instruct model, we evaluate baseline RAG, an optimized hybrid retrieval approach, and conduct a comparative analysis across multiple performance metrics. Our findings highlight the promise of hybrid retrieval in strengthening the adaptability and reliability of LLMs for cybersecurity tasks.
Problem

Research questions and friction points this paper is trying to address.

Adapting LLMs to evolving cybersecurity threats
Enhancing LLM accuracy in threat detection
Addressing opaque reasoning in security applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG framework contextualizes cybersecurity data for LLMs
Hybrid retrieval enhances LLM adaptability to emerging threats
Optimized approach improves accuracy in temporal reasoning
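The summary also highlights interpretable, traceable responses. One common way RAG frameworks achieve traceability is by numbering the retrieved passages in the prompt and instructing the model to cite them. The template below is an illustrative sketch, not the paper's actual prompt; the query and threat-intel snippets are invented examples.

```python
def build_rag_prompt(query, retrieved):
    """Assemble a grounded prompt: retrieved threat intelligence is numbered
    so the model's answer can cite its sources, making output traceable."""
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(retrieved, start=1))
    return (
        "You are a cybersecurity analyst. Answer using ONLY the context below "
        "and cite entries by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

prompt = build_rag_prompt(
    "Is this host affected by the xz backdoor?",
    [
        "CVE-2024-3094: malicious code inserted into xz/liblzma 5.6.0 and 5.6.1",
        "ATT&CK T1195: supply chain compromise via software dependencies",
    ],
)
```

Because the context is injected at inference time, updating the threat-intel index immediately updates what the model can answer about, without retraining the underlying Llama-3-8B-Instruct model.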
👥 Authors
Arnabh Borah (School of Electrical and Computer Engineering, Georgia Institute of Technology)
Md Tanvirul Alam (Department of Computer Science, Rochester Institute of Technology)
Nidhi Rastogi (Assistant Professor, Rochester Institute of Technology, NY)
Research areas: Cybersecurity, Artificial Intelligence, Autonomous Vehicles, Graph Analytics, Applied Machine Learning