🤖 AI Summary
This work addresses the limitations of existing AI safety benchmarks, which are predominantly English-centric and fail to account for cultural misalignment and cross-lingual safety failures in low-resource African languages. To bridge this gap, the authors introduce UbuntuGuard—the first dynamic, culturally grounded safety benchmark developed in collaboration with 155 African domain experts. It comprises adversarial queries authored by these experts, along with context-sensitive safety policies and reference responses. Moving beyond static categorical constraints, UbuntuGuard enables runtime-executable, multilingual safety evaluations. Empirical assessments across 13 prominent language models reveal that English-centric benchmarks substantially overestimate multilingual safety, that cross-lingual transfer provides only partial coverage, and that even state-of-the-art dynamic models struggle to adequately localize safety mechanisms within African sociocultural contexts.
📝 Abstract
Current guardian models are predominantly Western-centric and optimized for high-resource languages, leaving low-resource African languages vulnerable to evolving harms, cross-lingual safety failures, and cultural misalignment. Moreover, most guardian models rely on rigid, predefined safety categories that fail to generalize across diverse linguistic and sociocultural contexts. Robust safety, therefore, requires flexible, runtime-enforceable policies and benchmarks that reflect local norms, harm scenarios, and cultural expectations. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts across sensitive fields, including healthcare. From these expert-crafted queries, we derive context-specific safety policies and reference responses that capture culturally grounded risk signals, enabling policy-aligned evaluation of guardian models. We evaluate 13 models, comprising six general-purpose LLMs and seven guardian models across three distinct variants: static, dynamic, and multilingual. Our findings reveal that existing English-centric benchmarks overestimate real-world multilingual safety, that cross-lingual transfer provides partial but insufficient coverage, and that dynamic models, while better equipped to leverage policies at inference time, still struggle to fully localize African-language contexts. These findings highlight the urgent need for multilingual, culturally grounded safety benchmarks to enable the development of reliable and equitable guardian models for low-resource languages. Our code can be found online.\footnote{Code repository available at https://github.com/hemhemoh/UbuntuGuard.}