CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models

📅 2025-01-02
🤖 AI Summary
Existing LLM safety evaluations lack fine-grained, domain-specific benchmarks for cybersecurity, which hinders accurate assessment of jailbreaking risk. This paper introduces CySecBench, the first prompt benchmark tailored to evaluating generative AI in the cybersecurity domain, comprising 12,662 close-ended, semantically categorized malicious prompts. Its construction pipeline combines generative-AI-driven data synthesis, multi-stage filtering, and structured attack-type categorization, and is transferable to other domains. A prompt-obfuscation jailbreaking method evaluated on the dataset achieves success rates of 65% on ChatGPT, 88% on Gemini, and 17% on Claude, and 78.5% on the widely used AdvBench dataset, outperforming state-of-the-art approaches.

📝 Abstract
Numerous studies have investigated methods for jailbreaking Large Language Models (LLMs) to generate harmful content. Typically, these methods are evaluated using datasets of malicious prompts designed to bypass security policies established by LLM providers. However, the generally broad scope and open-ended nature of existing datasets can complicate the assessment of jailbreaking effectiveness, particularly in specific domains, notably cybersecurity. To address this issue, we present and publicly release CySecBench, a comprehensive dataset containing 12662 prompts specifically designed to evaluate jailbreaking techniques in the cybersecurity domain. The dataset is organized into 10 distinct attack-type categories, featuring close-ended prompts to enable a more consistent and accurate assessment of jailbreaking attempts. Furthermore, we detail our methodology for dataset generation and filtration, which can be adapted to create similar datasets in other domains. To demonstrate the utility of CySecBench, we propose and evaluate a jailbreaking approach based on prompt obfuscation. Our experimental results show that this method successfully elicits harmful content from commercial black-box LLMs, achieving Success Rates (SRs) of 65% with ChatGPT and 88% with Gemini; in contrast, Claude demonstrated greater resilience with a jailbreaking SR of 17%. Compared to existing benchmark approaches, our method shows superior performance, highlighting the value of domain-specific evaluation datasets for assessing LLM security measures. Moreover, when evaluated using prompts from a widely used dataset (i.e., AdvBench), it achieved an SR of 78.5%, higher than the state-of-the-art methods.
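The abstract notes that the dataset's 12,662 prompts are organized into 10 attack-type categories. A minimal sketch of working with such a category-labeled prompt dataset is shown below; the file name and the `category`/`prompt` column names are assumptions for illustration, not the actual CySecBench release format.

```python
# Minimal sketch: summarizing a category-labeled prompt dataset.
# The file name "cysecbench.csv" and the "category"/"prompt" column
# names are hypothetical, not the actual CySecBench release layout.
import csv
from collections import Counter

def category_counts(path):
    """Count prompts per attack-type category in a CSV of labeled prompts."""
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["category"]] += 1
    return counts

if __name__ == "__main__":
    for category, n in category_counts("cysecbench.csv").most_common():
        print(f"{category}: {n}")
```

Such a per-category breakdown is useful when reporting jailbreaking success rates, since resilience can differ substantially across attack types.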
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Cybersecurity
Evaluation Methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

CySecBench
prompt-based dataset
cybersecurity evaluation
Johan Wahréus
Networked Systems Security (NSS) Group, KTH Royal Institute of Technology, Stockholm, Sweden
Ahmed Mohamed Hussain
Pre-Doctoral Researcher | KTH Royal Institute of Technology
Security, Privacy, IoT, AI for Cybersecurity, AI Trustworthiness
P. Papadimitratos
Networked Systems Security (NSS) Group, KTH Royal Institute of Technology, Stockholm, Sweden