🤖 AI Summary
Current AI safety evaluations rely heavily on English resources and machine translation, overlooking the scarcity of linguistic resources and the cultural specificity of Southeast Asia (SEA). As a result, they fail to capture regionally sensitive issues, such as culturally embedded political discourse and localized disinformation. Method: We introduce the first human-authored, localization-aware multilingual safety benchmark for SEA, covering eight languages and comprising three manually verified subsets: general safety, real-world scenarios, and content generation. Contribution/Results: This benchmark moves beyond the limitations of translated data by systematically characterizing cross-cultural safety risks for the first time. Empirical evaluation reveals that mainstream large language models and safety mitigations perform significantly worse on SEA-language tasks than on English, underscoring the critical challenge of cultural adaptation for AI safety alignment.
📄 Abstract
Safeguard models help large language models (LLMs) detect and block harmful content, but most evaluations remain English-centric and overlook linguistic and cultural diversity. Existing multilingual safety benchmarks often rely on machine-translated English data, which fails to capture nuances in low-resource languages. Southeast Asian (SEA) languages are underrepresented despite the region's linguistic diversity and unique safety concerns, from culturally sensitive political speech to region-specific misinformation. Addressing these gaps requires benchmarks that are natively authored to reflect local norms and harm scenarios. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA, covering eight languages with 21,640 samples across three subsets: general, in-the-wild, and content generation. Experimental results on our benchmark demonstrate that even state-of-the-art LLMs and guardrails are challenged by SEA cultural and harm scenarios and underperform relative to their results on English texts.