LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models

📅 2025-08-18
🤖 AI Summary
Current multilingual safety evaluation of large language models (LLMs) suffers from benchmark scarcity, narrow language coverage, and insufficient data diversity, hindering cross-lingual safety alignment research. To address this, we introduce LinguaSafe, an open multilingual safety benchmark covering 12 languages (including 8 low-resource ones) with 45,000 annotated samples. Our methodology employs a fine-grained, multi-dimensional evaluation framework that integrates machine translation, transcreation, and natively authored content to ensure linguistic authenticity and cultural appropriateness. We propose a unified safety prompting template and classification-based metrics to assess direct and indirect safety responses as well as over-sensitivity. Empirical analysis reveals significant cross-lingual and cross-domain trade-offs between safety and helpfulness. All benchmark data, implementation code, and evaluation tools are publicly released.

๐Ÿ“ Abstract
The widespread adoption and increasing prominence of large language models (LLMs) in global technologies necessitate a rigorous focus on ensuring their safety across a diverse range of linguistic and cultural contexts. The lack of comprehensive evaluation and diverse data in existing multilingual safety evaluations for LLMs limits their effectiveness, hindering the development of robust multilingual safety alignment. To address this critical gap, we introduce LinguaSafe, a comprehensive multilingual safety benchmark crafted with meticulous attention to linguistic authenticity. The LinguaSafe dataset comprises 45k entries in 12 languages, ranging from Hungarian to Malay. Curated using a combination of translated, transcreated, and natively-sourced data, our dataset fills the void in the safety evaluation of LLMs across diverse under-represented languages. LinguaSafe presents a multidimensional and fine-grained evaluation framework, with direct and indirect safety assessments, including further evaluations for oversensitivity. The results of safety and helpfulness evaluations vary significantly across different domains and different languages, even among languages with similar resource levels. Our benchmark provides a comprehensive suite of metrics for in-depth safety evaluation, underscoring the critical importance of thoroughly assessing multilingual safety in LLMs to achieve more balanced safety alignment. Our dataset and code are released to the public to facilitate further research in the field of multilingual LLM safety.
Problem

Research questions and friction points this paper is trying to address.

Lack of comprehensive multilingual safety evaluation for LLMs
Need for diverse linguistic data in safety assessments
Underrepresentation of certain languages in current benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual safety benchmark with 45k entries across 12 languages
Combines translated, transcreated, and natively-sourced data
Multidimensional framework for direct and indirect safety, plus oversensitivity
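The classification-based metrics mentioned above can be illustrated with a minimal sketch. Note that LinguaSafe's exact metric definitions live in the paper and released code; the sample structure, label names, and both rate functions below are assumptions chosen only to show the general shape of scoring direct safety versus oversensitivity.

```python
# Hedged illustration of classification-based safety metrics: a judge model
# labels each response, and rates are computed per prompt type. The label set
# ("safe" / "unsafe" / "refusal") and field names are hypothetical.
from dataclasses import dataclass

@dataclass
class Sample:
    prompt_type: str  # "harmful" or "benign"
    verdict: str      # classifier label: "safe", "unsafe", or "refusal"

def safety_rate(samples):
    """Fraction of harmful prompts handled safely (safe answer or refusal)."""
    harmful = [s for s in samples if s.prompt_type == "harmful"]
    if not harmful:
        return 0.0
    return sum(s.verdict in ("safe", "refusal") for s in harmful) / len(harmful)

def oversensitivity_rate(samples):
    """Fraction of benign prompts the model needlessly refuses."""
    benign = [s for s in samples if s.prompt_type == "benign"]
    if not benign:
        return 0.0
    return sum(s.verdict == "refusal" for s in benign) / len(benign)

samples = [
    Sample("harmful", "refusal"),
    Sample("harmful", "unsafe"),
    Sample("benign", "safe"),
    Sample("benign", "refusal"),
]
print(safety_rate(samples))           # 0.5
print(oversensitivity_rate(samples))  # 0.5
```

Separating the two rates is what exposes the safety/helpfulness trade-off the paper reports: a model can score high on safety simply by refusing everything, which the oversensitivity rate then penalizes.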
Zhiyuan Ning
Westlake University
Graph Machine Learning, Knowledge Graphs, Large Language Models
Tianle Gu
Tsinghua University
(M)LLM Safety, PEFT
Jiaxin Song
University of Illinois, Urbana-Champaign
Algorithmic Game Theory, Programming Languages
Shixin Hong
Shanghai Artificial Intelligence Laboratory, Tsinghua University
Lingyu Li
Shanghai Jiao Tong University
Active Inference, Artificial Intelligence, Philosophy
Huacan Liu
Shanghai Artificial Intelligence Laboratory, Shanghai Jiao Tong University
Jie Li
Shanghai Artificial Intelligence Laboratory
Yixu Wang
Shanghai Artificial Intelligence Laboratory, Fudan University
Meng Lingyu
Shanghai Artificial Intelligence Laboratory
Yan Teng
Shanghai Artificial Intelligence Laboratory
Yingchun Wang
Shanghai Artificial Intelligence Laboratory