AI Summary
Current safety evaluations of large language models (LLMs) in illicit-activity scenarios lack high-quality, structured question-answering data. To address this gap, this work proposes a fine-grained methodology for constructing illicit-activity-oriented QA pairs, grounded in a manual content analysis of the AnswerCarefully dataset. It also introduces a structured rubric for evaluating the responses of safety-aligned models. The approach is intended to make safety evaluations more precise and more practical to carry out. The study produces a curated QA dataset tailored to illicit-activity contexts and a standardized assessment protocol, both of which will be contributed to the JAI-Trust initiative to support robust and consistent safety benchmarking of LLMs.
Abstract
In this paper, we discuss a question-answer dataset for LLM safety evaluation, with a focus on illegal activities. Specifically, based on a manual analysis of AnswerCarefully, we introduce several types of additional information, methods for creating question-answer examples, and a rubric for evaluating LLM-generated responses. The outcomes of this study are intended to be shared with the "JAI-Trust" project.