RedSage: A Cybersecurity Generalist LLM

📅 2026-01-29


🤖 AI Summary

This work addresses two limitations of existing large language models for cybersecurity: reliance on proprietary APIs that pose privacy risks, and insufficient domain adaptation in open models. To overcome them, we propose RedSage, a locally deployable, domain-specialized model with 8 billion parameters. RedSage undergoes domain-aware continual pretraining on 11.8 billion curated cybersecurity tokens, followed by supervised fine-tuning on 266K multi-turn samples produced by an agentic augmentation pipeline that simulates expert workflows; both stages are combined with general open-source LLM data. We also construct RedSage-Bench, a benchmark of 30K multiple-choice and 240 open-ended questions spanning cybersecurity knowledge, skills, and tool expertise. At the 8B scale, RedSage surpasses baseline models by up to 5.59 points on cybersecurity benchmarks and up to 5.05 points on Open LLM Leaderboard tasks.

📝 Abstract

Cybersecurity operations demand assistant LLMs that support diverse workflows without exposing sensitive data. Existing solutions either rely on proprietary APIs with privacy risks or on open models lacking domain adaptation. To bridge this gap, we curate 11.8B tokens of cybersecurity-focused continual pretraining data via large-scale web filtering and manual collection of high-quality resources, spanning 28.6K documents across frameworks, offensive techniques, and security tools. Building on this, we design an agentic augmentation pipeline that simulates expert workflows to generate 266K multi-turn cybersecurity samples for supervised fine-tuning. Combined with general open-source LLM data, these resources enable the training of RedSage, an open-source, locally deployable cybersecurity assistant with domain-aware pretraining and post-training. To rigorously evaluate the models, we introduce RedSage-Bench, a benchmark with 30K multiple-choice and 240 open-ended Q&A items covering cybersecurity knowledge, skills, and tool expertise. RedSage is further evaluated on established cybersecurity benchmarks (e.g., CTI-Bench, CyberMetric, SECURE) and general LLM benchmarks to assess broader generalization. At the 8B scale, RedSage achieves consistently better results, surpassing the baseline models by up to +5.59 points on cybersecurity benchmarks and +5.05 points on Open LLM Leaderboard tasks. These findings demonstrate that domain-aware agentic augmentation and pre/post-training can not only enhance cybersecurity-specific expertise but also help to improve general reasoning and instruction-following. All models, datasets, and code are publicly available.
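The abstract reports results on 30K multiple-choice items and expresses gains in "points". Below is a minimal sketch of how such multiple-choice scoring might be done. It assumes four options labeled A to D, assumes that "points" means a difference in accuracy measured in percentage points, and parses model replies with simple regular expressions. The `MCQItem` class, the `extract_choice`, `accuracy`, and `gain_in_points` helpers, and every toy item and response are hypothetical illustrations. They are not RedSage-Bench's actual schema or evaluation harness.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class MCQItem:
    """One multiple-choice item (hypothetical format, not RedSage-Bench's schema)."""
    question: str
    options: dict[str, str]  # option letter -> option text
    answer: str              # gold option letter, e.g. "B"


# Reply that starts with the letter, e.g. "B", "(C)", "D. Cross-site scripting".
LEADING_LETTER = re.compile(r"\(?([A-D])\)?(?:[.:)]|$)")
# Reply that states the answer explicitly, e.g. "Answer: D", "The answer is B."
STATED_ANSWER = re.compile(r"[Aa]nswer(?: is)?\s*[:\-]?\s*\(?([A-D])\)?(?![A-Za-z])")


def extract_choice(response: str) -> str | None:
    """Return the option letter a model chose, or None if none can be parsed."""
    text = response.strip()
    match = LEADING_LETTER.match(text) or STATED_ANSWER.search(text)
    return match.group(1) if match else None


def accuracy(items: list[MCQItem], responses: list[str]) -> float:
    """Percentage of items whose parsed choice matches the gold letter."""
    if len(items) != len(responses):
        raise ValueError("need exactly one response per item")
    correct = sum(extract_choice(r) == item.answer for item, r in zip(items, responses))
    return 100.0 * correct / len(items)


def gain_in_points(model_acc: float, baseline_acc: float) -> float:
    """Accuracy difference in percentage points (assumed unit of the reported gains)."""
    return round(model_acc - baseline_acc, 2)


# Toy items and responses, invented purely for illustration.
ITEMS = [
    MCQItem("Which tool is primarily a network port scanner?",
            {"A": "Nmap", "B": "Hashcat", "C": "Ghidra", "D": "John the Ripper"}, "A"),
    MCQItem("Which MITRE ATT&CK tactic covers gaining an initial foothold?",
            {"A": "Persistence", "B": "Initial Access", "C": "Exfiltration", "D": "Discovery"}, "B"),
    MCQItem("Which TCP port does HTTPS use by default?",
            {"A": "21", "B": "22", "C": "443", "D": "3389"}, "C"),
    MCQItem("Which weakness does CWE-79 describe?",
            {"A": "SQL injection", "B": "Buffer overflow", "C": "Path traversal",
             "D": "Cross-site scripting"}, "D"),
]
MODEL_RESPONSES = ["A", "Answer: B", "(C)", "D. Cross-site scripting"]
BASELINE_RESPONSES = ["A", "The answer is D.", "C", "I am not sure."]

if __name__ == "__main__":
    model_acc = accuracy(ITEMS, MODEL_RESPONSES)        # 100.0 on this toy set
    baseline_acc = accuracy(ITEMS, BASELINE_RESPONSES)  # 50.0 on this toy set
    print(f"gain: {gain_in_points(model_acc, baseline_acc):+.2f} points")
```

On the toy data, the script prints a gain of +50.00 points. The figures in the abstract come from the authors' own evaluation, not from this code. A common alternative to parsing free-text replies is to compare the model's likelihood for each option letter, which removes answer extraction from the pipeline entirely.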

Problem

Research questions and friction points this paper is trying to address.

cybersecurity

large language models

privacy

domain adaptation

assistant LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cybersecurity LLM

Domain-aware Pretraining

Agentic Augmentation