A Study on Question-Answer Dataset for LLM Safety Evaluation with a Focus on Illegal Activities

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current safety evaluations of large language models in illicit activity scenarios lack high-quality, structured question-answering data. To address this gap, this work proposes a fine-grained methodology for constructing illicit activity–oriented QA pairs, grounded in manual content analysis of the AnswerCarefully dataset, and introduces a structured rubric for evaluating LLM-generated responses. The approach is intended to make safety evaluations more precise and easier to carry out in practice. The study yields methods for building a QA dataset tailored to illicit activity contexts, together with an assessment rubric, and its outcomes are intended to be shared with the JAI-Trust project to support consistent safety benchmarking of large language models.

📝 Abstract

In this paper, we discuss a question-answer dataset for LLM safety evaluation, with a focus on illegal activities. Specifically, on the basis of a manual analysis of AnswerCarefully, we introduce several kinds of additional information, methods for creating question-answer examples, and a rubric for evaluating LLM-generated responses. The outcomes of this study are intended to be shared with the "JAI-Trust" project.

Problem

Research questions and friction points this paper is trying to address.

LLM safety evaluation

question-answer dataset

illegal activities

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM safety evaluation

illegal activities

question-answer dataset

response rubric

AnswerCarefully