RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

πŸ“… 2026-01-07
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing red-teaming datasets suffer from inconsistent risk taxonomies, limited domain coverage, and outdated evaluation criteria, hindering systematic safety assessment of large language models (LLMs). To address these limitations, this work proposes RedBenchβ€”a unified red-teaming benchmark that integrates 37 authoritative data sources, comprising 29,362 attack and refusal samples. RedBench introduces, for the first time, a standardized taxonomy encompassing 22 risk categories and 19 domains. Through comprehensive data aggregation, consistent classification, and broad domain representation, RedBench significantly enhances the systematicity and comparability of LLM safety evaluations. The project publicly releases both the dataset and evaluation framework, establishing a robust baseline for developing safer and more reliable LLMs.

Technology Category

Application Category

πŸ“ Abstract
As large language models (LLMs) become integral to safety-critical applications, ensuring their robustness against adversarial prompts is paramount. However, existing red teaming datasets suffer from inconsistent risk categorizations, limited domain coverage, and outdated evaluations, hindering systematic vulnerability assessments. To address these challenges, we introduce RedBench, a universal dataset aggregating 37 benchmark datasets from leading conferences and repositories, comprising 29,362 samples across attack and refusal prompts. RedBench employs a standardized taxonomy with 22 risk categories and 19 domains, enabling consistent and comprehensive evaluations of LLM vulnerabilities. We provide a detailed analysis of existing datasets, establish baselines for modern LLMs, and open-source the dataset and evaluation code. Our contributions facilitate robust comparisons, foster future research, and promote the development of secure and reliable LLMs for real-world deployment. Code: https://github.com/knoveleng/redeval
Problem

Research questions and friction points this paper is trying to address.

red teaming
large language models
adversarial prompts
risk categorization
vulnerability assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

RedBench
red teaming
large language models
risk taxonomy
adversarial evaluation
πŸ”Ž Similar Papers
No similar papers found.