🤖 AI Summary
Current LLM safety research faces systemic bottlenecks: public datasets are fragmented, selecting the right dataset is difficult, and evaluation practice makes use of only a small fraction of the datasets that are available.
Method: We conduct the first systematic survey of LLM safety datasets, covering 144 publicly available datasets identified through iterative literature search, community-driven curation, and structured metadata annotation. We establish SafetyPrompts.com, a continuously updated open directory.
Contribution/Results: Our analysis reveals key trends (e.g., a steadily increasing reliance on synthetic data) and critical gaps (e.g., severe underrepresentation of non-English languages and naturalistic interactive scenarios). We further find that mainstream safety benchmarks use fewer than 15% of the surveyed datasets. This work offers practical guidance for dataset selection, benchmark development, and research agenda formulation, alongside infrastructure to support the field going forward.
📝 Abstract
The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by creating an abundance of datasets for evaluating and improving LLM safety. However, much of this work has happened in parallel, and with very different goals in mind, ranging from the mitigation of near-term risks around bias and toxic content generation to the assessment of longer-term catastrophic risk potential. This makes it difficult for researchers and practitioners to find the most relevant datasets for their use case, and to identify gaps in dataset coverage that future work may fill. To remedy these issues, we conduct the first systematic review of open datasets for evaluating and improving LLM safety. We review 144 datasets, which we identified through an iterative and community-driven process over the course of several months. We highlight patterns and trends, such as a trend towards fully synthetic datasets, as well as gaps in dataset coverage, such as a clear lack of non-English and naturalistic datasets. We also examine how LLM safety datasets are used in practice -- in LLM release publications and popular LLM benchmarks -- finding that current evaluation practices are highly idiosyncratic and make use of only a small fraction of available datasets. Our contributions are based on SafetyPrompts.com, a living catalogue of open datasets for LLM safety, which we plan to update continuously as the field of LLM safety develops.