XL-SafetyBench: A Country-Grounded Cross-Cultural Benchmark for LLM Safety and Cultural Sensitivity

📅 2026-05-07

🤖 AI Summary
This study addresses the limitations of existing safety evaluation benchmarks for large language models, which are predominantly English-centric and insufficient for assessing cultural sensitivity and localized harms. The authors construct a cross-cultural safety benchmark encompassing 10 country–language pairs and 5,500 test cases, introducing two novel metrics—Neutral-Safe Rate and Cultural Sensitivity Rate—to distinguish between universal harms and culturally embedded sensitive content. Through a multi-stage construction pipeline involving model-assisted discovery, automated validation, and dual-native annotation, along with a unified evaluation framework, they assess 10 frontier models and 27 localized models. The evaluation reveals a decoupling between jailbreak robustness and cultural awareness in frontier models and demonstrates that the apparent safety of many localized models often stems from generation failures rather than genuine alignment.
📝 Abstract
Current LLM safety benchmarks are predominantly English-centric and often rely on translation, failing to capture country-specific harms. Moreover, they rarely evaluate a model's ability to detect culturally embedded sensitivities as distinct from universal harms. We introduce XL-SafetyBench, a suite of 5,500 test cases across 10 country-language pairs, comprising a Jailbreak Benchmark of country-grounded adversarial prompts and a Cultural Benchmark where local sensitivities are embedded within innocuous requests. Each item is constructed via a multi-stage pipeline that combines LLM-assisted discovery, automated validation gates, and dual independent native-speaker annotators per country. To distinguish principled refusal from comprehension failure, we evaluate Attack Success Rate (ASR) alongside two complementary metrics we introduce: Neutral-Safe Rate (NSR) and Cultural Sensitivity Rate (CSR). Evaluating 10 frontier and 27 local LLMs reveals two key findings. First, jailbreak robustness and cultural awareness do not show a coupled relationship among frontier models, so a composite safety score obscures per-axis variation. Second, local models exhibit a near-linear ASR-NSR trade-off (r = -0.81), indicating that their apparent safety reflects generation failure rather than genuine alignment. XL-SafetyBench enables more nuanced, cross-cultural safety evaluation in the multilingual era.
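The reported ASR-NSR trade-off can be illustrated with a minimal sketch. It assumes that r denotes Pearson's correlation coefficient computed across per-model scores, which the abstract does not state, and that each metric is a simple fraction of test cases receiving a given judgment. The helper names `rate` and `pearson_r` are hypothetical, and the score lists are invented placeholders, not results from the paper.

```python
from math import sqrt


def rate(flags):
    """Fraction of True values in a list of per-item boolean judgments."""
    if not flags:
        raise ValueError("need at least one judgment")
    return sum(flags) / len(flags)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder per-model scores, invented for illustration only
# (one entry per hypothetical model; NOT values from the paper).
asr_per_model = [0.10, 0.25, 0.40, 0.55]
nsr_per_model = [0.90, 0.70, 0.65, 0.40]

r = pearson_r(asr_per_model, nsr_per_model)
print(f"r = {r:.2f}")
```

A strongly negative r in such a computation would mean that models with higher attack success tend to have lower NSR, which is the kind of pattern the abstract describes for local models.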
Problem

Research questions and friction points this paper is trying to address.

LLM safety
cross-cultural benchmark
cultural sensitivity
country-specific harms
multilingual evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-cultural benchmark
cultural sensitivity
LLM safety evaluation
adversarial prompts
multilingual alignment
Dasol Choi
Yonsei University
Responsible AI, Machine Unlearning, AI safety
Eugenia Kim
Microsoft
Jaewon Noh
Korea AISI
Sang Seo
Korea AISI
Eunmi Kim
KT Corporation
Myunggyo Oh
KT Corporation
Yunjin Park
KT Corporation
Brigitta Jesica Kartono
BMW Group
Josef Pichlmeier
BMW Group
Helena Berndt
BMW Group
Sai Krishna Mendu
Coinbase
Glenn Johannes Tungka
Technical University of Munich
Özlem Gökçe
Ankara University
Suresh Gehlot
Cyril Amarchand Mangaldas
Katherine Pratt
Microsoft
Amanda Minnich
Microsoft
Haon Park
Computer Science Student, Seoul National University
Machine Learning, Deep Learning, Image Augmentation, Robotics