AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative AI safety evaluation datasets are predominantly English-centric, lacking coverage of non-English socio-cultural contexts (e.g., Korea) and multimodal risks. Method: We introduce AssurAI, a quality-controlled, Korea-specific multimodal safety evaluation dataset spanning text, image, video, and audio, built around a fine-grained taxonomy of 35 risk categories. Construction follows a two-phase paradigm (expert-led seeding, then crowdsourced scaling), combined with triple independent annotation and an iterative expert red-teaming loop to ensure data quality and robustness. Contribution/Results: We publicly release a dataset of 11,480 samples and empirically validate its effectiveness in assessing the safety of recent large language models. AssurAI bridges a critical gap in non-English, multimodal AI safety benchmarking, enabling culturally grounded, cross-modal risk evaluation for generative AI systems.

📝 Abstract
The rapid evolution of generative AI necessitates robust safety evaluations. However, current safety datasets are predominantly English-centric, failing to capture risks specific to non-English socio-cultural contexts such as Korea's, and are often limited to the text modality. To address this gap, we introduce AssurAI, a new quality-controlled Korean multimodal dataset for evaluating the safety of generative AI. First, we define a taxonomy of 35 distinct AI risk factors, adapted from established frameworks by a multidisciplinary expert group to cover both universal harms and risks specific to the Korean socio-cultural context. Second, leveraging this taxonomy, we construct and release AssurAI, a large-scale Korean multimodal dataset comprising 11,480 instances across text, image, video, and audio. Third, we detail the rigorous quality control process used to ensure data integrity, featuring a two-phase construction (i.e., expert-led seeding and crowdsourced scaling), triple independent annotation, and an iterative expert red-teaming loop. Our pilot study validates AssurAI's effectiveness in assessing the safety of recent LLMs. We release AssurAI to the public to facilitate the development of safer and more reliable generative AI systems for the Korean community.
Problem

Research questions and friction points this paper is trying to address.

Addressing English-centric bias in AI safety evaluation for Korean contexts
Developing a multimodal dataset covering text, image, video, and audio safety risks
Establishing a Korean socio-cultural risk taxonomy through expert collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

A quality-controlled Korean multimodal dataset for AI safety evaluation
A taxonomy of 35 AI risk factors adapted to the Korean socio-cultural context
Rigorous quality control via a hybrid expert-led and crowdsourced construction process
Authors

Chae-Gyun Lim (KAIST)
Seung-Ho Han (KAIST)
EunYoung Byun (TTA)
Jeongyun Han (University of Seoul)
Soohyun Cho (Keimyung University)
Eojin Joo (KAIST)
Heehyeon Kim (PhD Student, KAIST; Graph Machine Learning, Fraud Detection)
Sieun Kim (KAIST)
Juhoon Lee (KAIST; Human-Computer Interaction, Social Computing, Trust and Safety)
Hyunsoo Lee (KAIST)
Dongkun Lee (KAIST)
Jonghwan Hyeon (KAIST)
Yechan Hwang (KAIST)
Young-Jun Lee (Ph.D. at KAIST; Natural Language Processing, Conversational AI, Multi-Modal Learning)
Kyeongryul Lee (KAIST)
Minhyeong An (KAIST)
Hyunjun Ahn (KAIST)
Jeongwoo Son (KAIST)
Junho Park (KAIST)
Donggyu Yoon (KAIST)
Taehyung Kim (KAIST)
Jeemin Kim (KAIST)
Dasom Choi (KAIST; Human-Computer Interaction (HCI))
Kwangyoung Lee (KAIST)
Hyunseung Lim (KAIST; Human-AI Interaction, Human-Computer Interaction)