XekRung Technical Report

📅 2026-04-30

🤖 AI Summary

This work addresses the current lack of large language models that combine strong cybersecurity expertise with robust general-language understanding. To close this gap, we propose XekRung, a domain-specialized large language model for cybersecurity, trained through a systematic pipeline of continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). Central to our approach are domain-tailored data synthesis pipelines that generate high-quality training corpora, together with a multi-dimensional evaluation system that guides the iterative improvement of both security-specific proficiency and general-purpose capabilities. Experimental results show that XekRung achieves state-of-the-art performance among models of comparable scale on cybersecurity-specific benchmarks while maintaining strong results on general benchmarks.

📝 Abstract

We present XekRung, a frontier large language model for cybersecurity, designed to provide comprehensive security capabilities. To achieve this, we develop diverse data synthesis pipelines tailored to the cybersecurity domain, enabling the scalable construction of high-quality training data and providing a strong foundation for cybersecurity knowledge and understanding. Building on this foundation, we establish a complete training pipeline spanning continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL) to further extend the model's capabilities. We further introduce a multi-dimensional evaluation system to guide the iterative improvement of both domain-specific and general-purpose abilities. Extensive experiments demonstrate that XekRung achieves state-of-the-art performance on cybersecurity-specific benchmarks among models of the same scale, while maintaining strong performance on general benchmarks.
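The abstract states that a multi-dimensional evaluation system guides the iterative improvement of domain-specific and general-purpose abilities together. As an illustration only, the sketch below shows one way such a system could gate checkpoint selection: a candidate checkpoint is kept only if its average cybersecurity score improves and no general-purpose score regresses by more than a tolerance. The dimension names, the use of a plain average, the acceptance rule, and the 0.01 tolerance are all hypothetical assumptions, not details taken from the report.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical dimension groups; the report does not name its actual dimensions.
DOMAIN_DIMS = ("security_knowledge", "vulnerability_analysis")
GENERAL_DIMS = ("general_knowledge", "reasoning")


@dataclass
class EvalResult:
    """Per-dimension benchmark scores for one checkpoint, each in [0, 1]."""
    name: str
    scores: dict[str, float]

    def group_mean(self, dims: tuple[str, ...]) -> float:
        # Unweighted average over one group of dimensions (an assumption).
        return mean(self.scores[d] for d in dims)


def accept_checkpoint(baseline: EvalResult, candidate: EvalResult,
                      max_general_drop: float = 0.01) -> bool:
    """Hypothetical gate: require a domain-level gain and allow each general
    dimension to drop by at most max_general_drop relative to the baseline."""
    domain_gain = candidate.group_mean(DOMAIN_DIMS) - baseline.group_mean(DOMAIN_DIMS)
    if domain_gain <= 0:
        return False
    return all(candidate.scores[d] >= baseline.scores[d] - max_general_drop
               for d in GENERAL_DIMS)


# Usage with made-up scores (not results from the report).
base = EvalResult("sft-v1", {"security_knowledge": 0.70, "vulnerability_analysis": 0.60,
                             "general_knowledge": 0.80, "reasoning": 0.75})
cand = EvalResult("rl-v1", {"security_knowledge": 0.74, "vulnerability_analysis": 0.63,
                            "general_knowledge": 0.795, "reasoning": 0.76})
print(accept_checkpoint(base, cand))  # True
```

Checking every general dimension separately, rather than only their average, is one way to keep a gain on one general benchmark from masking a regression on another; a real system might instead weight dimensions or track many more of them.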

Problem

Research questions and friction points this paper is trying to address.

large language model

cybersecurity

domain-specific knowledge

model evaluation

training data

Innovation

Methods, ideas, or system contributions that make the work stand out.

data synthesis pipeline

cybersecurity LLM

multi-stage training