🤖 AI Summary
Large language models (LLMs) exhibit inconsistent cross-lingual safety performance, yet no systematic framework exists to evaluate or diagnose such disparities. Method: The authors propose M-ALERT, the first open-source multilingual safety benchmark, comprising 75K high-quality prompts (15K per language) across English, French, German, Italian, and Spanish, generated and manually verified following the ALERT taxonomy. Contribution/Results: Empirical evaluation of 10 state-of-the-art models reveals substantial variance in safety performance across languages: for instance, one model's risk rate on "crime_tax" prompts reaches 82% in Italian versus only 3% in English. The study identifies categories that are high-risk across languages, including drug-related and criminal content, and demonstrates critical blind spots in current multilingual safety mitigation strategies. These findings underscore the need for language-granular safety evaluation standards and localized alignment mechanisms to ensure equitable safety guarantees across languages.
📝 Abstract
Building safe Large Language Models (LLMs) across multiple languages is essential for ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 produces a high rate of unsafe responses in the crime_tax category for Italian prompts but remains safe in the other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
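The language-by-category analysis described above boils down to aggregating per-prompt safety judgments into unsafe-response rates keyed by (language, category). Below is a minimal, hypothetical sketch of that aggregation; the function name, record format, and toy data are assumptions for illustration, not the authors' code or the paper's actual numbers.

```python
# Hypothetical sketch: per-language, per-category unsafe-response rates
# for M-ALERT-style evaluation records. Not the authors' implementation.
from collections import defaultdict

def unsafe_rates(records):
    """records: iterable of (language, category, is_unsafe) tuples.
    Returns {(language, category): fraction of unsafe responses}."""
    counts = defaultdict(lambda: [0, 0])  # (lang, cat) -> [unsafe, total]
    for lang, cat, is_unsafe in records:
        counts[(lang, cat)][0] += int(is_unsafe)
        counts[(lang, cat)][1] += 1
    return {key: unsafe / total for key, (unsafe, total) in counts.items()}

# Toy data illustrating the kind of cross-lingual gap reported for
# crime_tax (Italian far less safe than English); values are made up.
records = (
    [("it", "crime_tax", True)] * 8 + [("it", "crime_tax", False)] * 2 +
    [("en", "crime_tax", True)] * 1 + [("en", "crime_tax", False)] * 9
)
rates = unsafe_rates(records)
print(rates[("it", "crime_tax")])  # 0.8
print(rates[("en", "crime_tax")])  # 0.1
```

Comparing the resulting rates across languages within one category is exactly what surfaces the inconsistencies the benchmark is designed to expose.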