LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps

📅 2024-12-19

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

215K/year

🤖 AI Summary

Large language models (LLMs) exhibit inconsistent cross-lingual safety performance, yet no systematic framework exists to evaluate or diagnose such disparities. Method: The authors propose the first open-source multilingual safety benchmark, M-ALERT, comprising 75K high-quality prompts across English, French, German, Italian, and Spanish, generated and manually verified using the ALERT taxonomy. Contribution/Results: Empirical evaluation of 10 state-of-the-art models reveals substantial safety performance variance across languages—for instance, one model’s risk rate for “crime_tax” prompts reaches 82% in Italian versus only 3% in English. The study identifies high-risk cross-lingual categories—including drug-related and criminal content—and demonstrates critical blind spots in current multilingual safety mitigation strategies. These findings underscore the necessity of language-granular safety evaluation standards and localized alignment mechanisms to ensure equitable safety guarantees across languages.

Technology Category

Application Category

📝 Abstract

Building safe Large Language Models (LLMs) across multiple languages is essential in ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the category crime_tax for Italian but remains safe in other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.

Problem

Research questions and friction points this paper is trying to address.

Assessing LLM safety gaps across multiple languages

Evaluating inconsistent safety performance in different languages

Identifying consistently unsafe response categories in LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

M-ALERT benchmark evaluates multilingual LLM safety

15k prompts per language for safety analysis

Reveals language-specific safety inconsistencies in LLMs

🔎 Similar Papers

No similar papers found.