CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data

📅 2025-03-12

🤖 AI Summary

This work examines how instruction fine-tuning on cyber security data degrades the safety of large language models (LLMs). It introduces CyberLLMInstruct, a dataset of 54,928 instruction-response pairs spanning tasks such as malware analysis, phishing simulations, and zero-day vulnerabilities. Seven open-source LLMs are fine-tuned on the dataset and evaluated adversarially using the OWASP Top 10 framework. Fine-tuning reduces safety resilience across all tested models and every adversarial attack; for example, the security score of Llama 3.1 8B against prompt injection drops from 0.95 to 0.15. The same fine-tuned models reach up to 92.50% accuracy on the CyberMetric benchmark, which points to a trade-off between performance and safety and underlines the need for adversarial testing and for fine-tuning methods that mitigate safety risks.

📝 Abstract

The integration of large language models (LLMs) into cyber security applications presents significant opportunities, such as enhancing threat analysis and malware detection, but can also introduce critical risks and safety concerns, including personal data leakage and automated generation of new malware. To address these challenges, we developed CyberLLMInstruct, a dataset of 54,928 instruction-response pairs spanning cyber security tasks such as malware analysis, phishing simulations, and zero-day vulnerabilities. The dataset was constructed through a multi-stage process. This involved sourcing data from multiple resources, filtering and structuring it into instruction-response pairs, and aligning it with real-world scenarios to enhance its applicability. Seven open-source LLMs were chosen to test the usefulness of CyberLLMInstruct: Phi 3 Mini 3.8B, Mistral 7B, Qwen 2.5 7B, Llama 3 8B, Llama 3.1 8B, Gemma 2 9B, and Llama 2 70B. In our primary example, we rigorously assess the safety of fine-tuned models using the OWASP top 10 framework, finding that fine-tuning reduces safety resilience across all tested LLMs and every adversarial attack (e.g., the security score of Llama 3.1 8B against prompt injection drops from 0.95 to 0.15). In our second example, we show that these same fine-tuned models can also achieve up to 92.50 percent accuracy on the CyberMetric benchmark. These findings highlight a trade-off between performance and safety, showing the importance of adversarial testing and further research into fine-tuning methodologies that can mitigate safety risks while still improving performance across diverse datasets and domains. All scripts required to reproduce the dataset, along with examples and relevant resources for replicating our results, will be made available upon the paper's acceptance.
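The filtering and structuring stage mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the record schema (`question` and `answer` keys), the `InstructionPair` type, the `build_pairs` helper, and the length threshold are all hypothetical, chosen only to show what turning raw records into instruction-response pairs might look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionPair:
    """One instruction-response pair (hypothetical schema, not the paper's)."""
    instruction: str
    response: str


def build_pairs(raw_records, min_response_chars=20):
    """Filter raw records and structure them into instruction-response pairs.

    Assumes each raw record is a dict with 'question' and 'answer' keys.
    The real dataset's source schema is not given in the abstract.
    """
    seen = set()
    pairs = []
    for record in raw_records:
        instruction = (record.get("question") or "").strip()
        response = (record.get("answer") or "").strip()
        # Filtering: drop entries with no instruction or a very short response.
        if not instruction or len(response) < min_response_chars:
            continue
        # Filtering: drop duplicate instructions (case-insensitive match).
        key = instruction.lower()
        if key in seen:
            continue
        seen.add(key)
        pairs.append(InstructionPair(instruction, response))
    return pairs


raw = [
    {"question": "What is phishing?",
     "answer": "A social engineering attack that tricks users into revealing credentials."},
    {"question": "what is phishing?",
     "answer": "Duplicate entry that should be removed by the dedup step."},
    {"question": "Define malware.", "answer": "Too short"},
    {"question": "", "answer": "An answer with no question attached to it at all."},
]
pairs = build_pairs(raw)
print(len(pairs))  # 1
```

Only the first record survives here: the second is a case-insensitive duplicate, the third has a response below the assumed length threshold, and the fourth has no instruction.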

Problem

Research questions and friction points this paper is trying to address.

Analyzing safety risks of fine-tuned LLMs in cybersecurity applications.

Developing the CyberLLMInstruct dataset for evaluating LLMs on cybersecurity tasks.

Exploring trade-offs between model performance and safety resilience.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed the CyberLLMInstruct dataset for cybersecurity tasks.

Tested seven open-source LLMs for safety and performance.

Assessed safety using the OWASP Top 10 framework.
