CyberLLM-FINDS 2025: Instruction-Tuned Fine-tuning of Domain-Specific LLMs with Retrieval-Augmented Generation and Graph Integration for MITRE Evaluation

📅 2026-01-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of general-purpose large language models (LLMs) in cybersecurity tasks: insufficient domain-specific knowledge. The authors propose a hybrid architecture that integrates graph-structured reasoning with retrieval-augmented generation (RAG), leveraging an instruction-tuned Gemma-2B enhanced through synthetic data and STIX-based threat intelligence aligned with the MITRE ATT&CK framework. To optimize reasoning under short-context constraints, the approach incorporates chain-of-thought prompting and quantization strategies. Experimental results demonstrate significant improvements in recall and alignment accuracy for Tactics, Techniques, and Procedures (TTPs) across both multi-hop reasoning and long-context scenarios. The study validates the efficacy of graph-augmented LLMs for cybersecurity threat intelligence analysis, highlighting their potential to enhance structured, knowledge-intensive reasoning in this domain.
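The alignment step the summary describes (mapping unstructured report text to ATT&CK techniques) can be sketched as a simple lexical retriever. This is an illustrative assumption, not the paper's method: the technique IDs and names below are real ATT&CK entries, but the keyword lists and overlap scoring are hypothetical stand-ins for the paper's STIX-based retrieval.

```python
# Hypothetical sketch: score a threat-report sentence against a tiny
# MITRE ATT&CK technique index by keyword overlap. The paper's actual
# pipeline uses STIX-based threat intelligence with a fine-tuned
# Gemma-2B; this only illustrates the retrieval/alignment idea.

ATTACK_INDEX = {
    "T1566": ("Phishing", {"phishing", "email", "attachment", "lure"}),
    "T1059": ("Command and Scripting Interpreter",
              {"powershell", "script", "interpreter", "command"}),
    "T1486": ("Data Encrypted for Impact",
              {"ransomware", "encrypted", "ransom", "extortion"}),
}

def retrieve_techniques(text: str, top_k: int = 2) -> list[tuple[str, str]]:
    """Return the top_k (technique_id, name) pairs ranked by keyword overlap."""
    tokens = set(text.lower().replace(",", " ").replace(".", " ").split())
    scored = []
    for tid, (name, keywords) in ATTACK_INDEX.items():
        score = len(tokens & keywords)
        if score > 0:
            scored.append((score, tid, name))
    scored.sort(reverse=True)  # highest overlap first
    return [(tid, name) for _, tid, name in scored[:top_k]]

report = "The actor sent a phishing email whose attachment ran a PowerShell script."
print(retrieve_techniques(report))
# → [('T1566', 'Phishing'), ('T1059', 'Command and Scripting Interpreter')]
```

In the paper's setting, the retrieved technique entries would be fed as context to the fine-tuned model rather than returned directly.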

📝 Abstract
Large Language Models (LLMs) such as Gemma-2B have shown strong performance in various natural language processing tasks. However, general-purpose models often lack the domain expertise required for cybersecurity applications. This work presents a methodology to fine-tune the Gemma-2B model into a domain-specific cybersecurity LLM. We detail the processes of dataset preparation, fine-tuning, and synthetic data generation, along with implications for real-world applications in threat detection, forensic investigation, and attack analysis. Experiments highlight challenges in prompt length distribution during domain-specific fine-tuning. Uneven prompt lengths limit the model's effective use of the context window, constraining local inference to 200-400 tokens despite hardware support for longer sequences. Chain-of-thought-style prompts, paired with quantized weights, yielded the best performance under these constraints. To address context limitations, we employed a hybrid strategy using cloud LLMs for synthetic data generation and local fine-tuning for deployment efficiency. To extend the evaluation, we introduce a Retrieval-Augmented Generation (RAG) pipeline and graph-based reasoning framework. This approach enables structured alignment with MITRE ATT&CK techniques through STIX-based threat intelligence, enhancing recall in multi-hop and long-context scenarios. Graph modules encode entity-neighborhood context and tactic chains, helping mitigate the constraints of short prompt windows. Results demonstrate improved model alignment with tactic, technique, and procedure (TTP) coverage, validating the utility of graph-augmented LLMs in cybersecurity threat intelligence applications.
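The graph module the abstract describes (encoding entity-neighborhood context and tactic chains to fit a short prompt window) can be sketched under simple assumptions. The tactic labels follow ATT&CK's standard kill-chain ordering, but the specific edges, the one-hop expansion, and the whitespace token budget are illustrative choices, not the paper's implementation:

```python
# Illustrative sketch of a graph module that expands a retrieved ATT&CK
# technique with its one-hop neighbors, orders them along the tactic
# chain, and packs the result into a short context string (the paper
# reports a 200-400 token effective window for local inference).

# Hypothetical edges: technique -> related techniques seen together.
NEIGHBORS = {
    "T1566": ["T1059", "T1204"],   # Phishing -> execution follow-ons
    "T1059": ["T1486"],            # Scripting -> ransomware impact
}

# Technique -> tactic, with an explicit kill-chain order for sorting.
TACTIC = {"T1566": "initial-access", "T1204": "execution",
          "T1059": "execution", "T1486": "impact"}
TACTIC_ORDER = ["initial-access", "execution", "impact"]

def neighborhood_context(seed: str, token_budget: int = 40) -> str:
    """Build a tactic-ordered context string from seed + one-hop neighbors."""
    nodes = [seed] + NEIGHBORS.get(seed, [])
    nodes.sort(key=lambda t: TACTIC_ORDER.index(TACTIC[t]))
    lines, used = [], 0
    for t in nodes:
        line = f"{TACTIC[t]}: {t}"
        cost = len(line.split())  # crude whitespace token count
        if used + cost > token_budget:
            break  # respect the short prompt window
        lines.append(line)
        used += cost
    return "; ".join(lines)

print(neighborhood_context("T1566"))
# → initial-access: T1566; execution: T1059; execution: T1204
```

Packing a tactic-ordered neighborhood rather than raw retrieved text is one way a graph module can mitigate a short context window: the chain structure carries multi-hop information in very few tokens.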
Problem

Research questions and friction points this paper is trying to address.

Cybersecurity
Domain-Specific LLMs
Context Window Limitation
MITRE ATT&CK
Threat Intelligence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Graph-based Reasoning
Domain-Specific LLM
MITRE ATT&CK Alignment
Synthetic Data Generation