Toward Cybersecurity-Expert Small Language Models

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
The cybersecurity domain lacks high-quality domain-specific language models and training data, which limits the professional reasoning of large language models (LLMs) on security tasks. To address this, the authors propose CyberPal 2.0, a family of compact, domain-specialized language models (4B–20B parameters), and introduce SecKnowledge 2.0, a data enrichment and formatting pipeline that combines expert-in-the-loop curation, chain-of-thought instruction tuning, and LLM-driven multi-step grounding to produce high-fidelity, task-grounded security reasoning traces. Across multiple cybersecurity benchmarks, CyberPal 2.0 matches or surpasses leading open- and closed-source models: on threat-investigation tasks, its 20B variant ranks first and its 4B variant second, both ahead of frontier models including GPT-4o. These results support the "compact model + high-quality domain data" paradigm for advancing domain-specific AI in cybersecurity.

📝 Abstract
Large language models (LLMs) are transforming everyday applications, yet deployment in cybersecurity lags due to a lack of high-quality, domain-specific models and training datasets. To address this gap, we present CyberPal 2.0, a family of cybersecurity-expert small language models (SLMs) ranging from 4B-20B parameters. To train CyberPal 2.0, we generate an enriched chain-of-thought cybersecurity instruction dataset built with our data enrichment and formatting pipeline, SecKnowledge 2.0, which integrates expert-in-the-loop steering of reasoning formats alongside LLM-driven multi-step grounding, yielding higher-fidelity, task-grounded reasoning traces for security tasks. Across diverse cybersecurity benchmarks, CyberPal 2.0 consistently outperforms its baselines and matches or surpasses various open and closed-source frontier models, while remaining a fraction of their size. On core cyber threat intelligence knowledge tasks, our models outperform almost all tested frontier models, ranking second only to Sec-Gemini v1. On core threat-investigation tasks, such as correlating vulnerabilities and bug tickets with weaknesses, our best 20B-parameter model outperforms GPT-4o, o1, o3-mini, and Sec-Gemini v1, ranking first, while our smallest 4B-parameter model ranks second.
Problem

Research questions and friction points this paper is trying to address.

Developing cybersecurity-expert small language models for domain applications
Creating enriched cybersecurity instruction datasets with expert-guided reasoning
Enhancing threat intelligence and investigation tasks with efficient SLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed cybersecurity-expert small language models
Created enriched chain-of-thought cybersecurity instruction dataset
Integrated expert-in-the-loop steering with LLM-driven grounding
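To make the dataset idea concrete, below is a minimal, hypothetical sketch of what one chain-of-thought instruction record for a threat-investigation task might look like. The JSON schema, field names, and helper function are illustrative assumptions; the paper does not publish SecKnowledge 2.0's actual record format.

```python
# Hypothetical sketch: assembling one task-grounded chain-of-thought
# instruction record for a security task. Schema is illustrative only.
import json


def make_cot_record(instruction: str, grounding: list[str],
                    reasoning_steps: list[str], answer: str) -> dict:
    """Assemble one instruction-tuning example.

    grounding:       source snippets (e.g., CVE/CWE text) the steps draw on
    reasoning_steps: the expert-steered chain-of-thought trace
    answer:          the final, verifiable conclusion
    """
    response = "\n".join(
        f"Step {i + 1}: {step}" for i, step in enumerate(reasoning_steps)
    ) + f"\nAnswer: {answer}"
    return {
        "instruction": instruction,
        "grounding": grounding,
        "response": response,
    }


record = make_cot_record(
    instruction="Which weakness class best matches CVE-2021-44228?",
    grounding=[
        "CVE-2021-44228: JNDI features used in configuration, log messages, "
        "and parameters do not protect against attacker-controlled lookups."
    ],
    reasoning_steps=[
        "The vulnerability stems from unsanitized input reaching a JNDI lookup.",
        "Injection of attacker-controlled data into a downstream interpreter "
        "corresponds to expression-language injection.",
    ],
    answer="CWE-917 (Expression Language Injection)",
)
print(json.dumps(record, indent=2))
```

A record like this pairs a question with the evidence and the reasoning trace, so fine-tuning rewards the model for grounded step-by-step answers rather than bare conclusions.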
Matan Levi
IBM Research
Daniel Ohayon
Technion University
Ariel Blobstein
IBM Research
Ravid Sagi
IBM Research
Ian Molloy
IBM Research
Yair Allouche
IBM Research