AdaCultureSafe: Adaptive Cultural Safety Grounded by Cultural Knowledge in Large Language Models

📅 2026-03-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the prevalent disconnect in large language models between cultural safety and cultural knowledge, which often leads to responses that fail to respect specific cultural contexts. To bridge this gap, the authors propose the first unified modeling framework that jointly integrates both aspects, supported by AdaCultureSafe—a large-scale paired dataset comprising 4.8K fine-grained cultural descriptions and 48K human-validated queries—constructed through a pipeline combining authoritative knowledge integration, automated query generation, and rigorous human validation. They further introduce a knowledge-guided response generation method that explicitly incorporates cultural knowledge into the decoding process, informed by neuron activation analysis. Experimental results demonstrate significant improvements in cultural safety, while also revealing a lack of strong correlation between cultural safety and cultural knowledge capabilities in mainstream models, thereby offering a new direction for alignment training.

📝 Abstract
With the widespread adoption of Large Language Models (LLMs), respecting indigenous cultures becomes essential for models' cultural safety and responsible global applications. Existing studies consider cultural safety and cultural knowledge separately, neglecting that the former should be grounded in the latter. This severely prevents LLMs from yielding culture-specific, respectful responses, and adaptive cultural safety consequently remains a formidable task. In this work, we propose to jointly model cultural safety and knowledge. First and foremost, paired cultural-safety and knowledge data are the key prerequisite for this research. However, the cultural diversity across regions and the subtlety of cultural differences pose significant challenges to creating such paired evaluation data. To address this issue, we propose a novel framework that integrates the curation of authoritative cultural knowledge descriptions, LLM-automated query generation, and extensive manual verification. Accordingly, we obtain a dataset named AdaCultureSafe containing 4.8K manually decomposed fine-grained cultural descriptions and the corresponding 48K manually verified safety- and knowledge-oriented queries. On the constructed dataset, we evaluate three families of popular LLMs for their cultural safety and knowledge proficiency, and make a critical discovery: no significant correlation exists between the two. We then delve into the utility-related neuron activations within LLMs to investigate the potential cause of this missing correlation, which can be attributed to the difference between the objectives of pre-training and post-alignment. We finally present a knowledge-grounded method that significantly enhances cultural safety by enforcing the integration of knowledge into the LLM response generation process.
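The knowledge-grounded idea can be illustrated with a minimal sketch: retrieve the fine-grained cultural descriptions most relevant to a query and inject them into the prompt before generation. This is only an illustration of the general grounding pattern, not the paper's actual pipeline; the toy knowledge base, the word-overlap retrieval, and the prompt template are all assumptions made for the example.

```python
# Minimal sketch of knowledge-grounded prompting (illustrative only):
# rank cultural descriptions against the query, then prepend the best
# matches so the model's response is grounded in that knowledge.

def retrieve(query, knowledge_base, top_k=1):
    """Rank descriptions by naive word overlap with the query (a placeholder
    for whatever retriever a real system would use)."""
    q_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query, knowledge_base):
    """Prepend retrieved cultural knowledge to the user query."""
    facts = retrieve(query, knowledge_base)
    context = "\n".join(f"- {f}" for f in facts)
    return (
        "Relevant cultural knowledge:\n"
        f"{context}\n\n"
        f"User query: {query}\n"
        "Answer respectfully, consistent with the knowledge above."
    )

# Toy knowledge base standing in for the curated cultural descriptions.
knowledge_base = [
    "In Maori culture, the head is considered tapu (sacred).",
    "In many East Asian cultures, gifts are offered with both hands.",
]
prompt = build_grounded_prompt(
    "Is it okay to touch a child's head in Maori culture?", knowledge_base
)
print(prompt)
```

The grounded prompt would then be passed to the LLM in place of the raw query, so that safety-relevant responses are conditioned on the retrieved cultural facts rather than on parametric knowledge alone.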
Problem

Research questions and friction points this paper is trying to address.

cultural safety
cultural knowledge
large language models
adaptive cultural safety
indigenous cultures
Innovation

Methods, ideas, or system contributions that make the work stand out.

cultural safety
cultural knowledge
knowledge-grounded generation
large language models
AdaCultureSafe