A Study on Question-Answer Dataset for LLM Safety Evaluation with a Focus on Illegal Activities

📅 2026-05-28
🤖 AI Summary
Current safety evaluations of large language models in illicit-activity scenarios lack high-quality, structured question-answer data. To address this gap, this work proposes a fine-grained methodology for constructing illicit-activity-oriented QA pairs, grounded in a manual content analysis of the AnswerCarefully dataset, and introduces a structured rubric for evaluating the safety of model responses. The approach aims to make such safety evaluations more precise and more practical to carry out. The study yields a curated QA dataset tailored to illicit-activity contexts, together with a standardized assessment protocol, both of which are intended to be shared with the JAI-Trust project to support consistent safety benchmarking of large language models.
📝 Abstract
In this paper, we discuss a question-answer dataset for LLM safety evaluation, with a focus on illegal activities. Specifically, based on a manual analysis of AnswerCarefully, we introduce several additional pieces of information, methods for creating question-answer examples, and a rubric for evaluating LLM-generated responses. The outcomes of this study are intended to be shared with the "JAI-Trust" project.
Problem

Research questions and friction points this paper is trying to address.

LLM safety evaluation
question-answer dataset
illegal activities
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM safety evaluation
illegal activities
question-answer dataset
response rubric
AnswerCarefully