🤖 AI Summary
Generative models exhibit significant performance degradation in multilingual and cross-cultural settings, primarily due to the scarcity of high-quality, culturally salient, and globally representative multilingual data resources.
Method: We propose a reproducible and scalable multi-path data construction framework that integrates multilingual web harvesting, automated cultural salience filtering, and community-driven data contribution. This framework systematically expands culturally aware training and evaluation datasets. Technically, we design novel cultural bias detection metrics and a multidimensional evaluation protocol—enabling, for the first time, quantitative assessment of generative models’ cross-cultural applicability.
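The paper does not specify the implementation of the pipeline, but the three collection paths and the salience-filtering step it names can be sketched roughly as below. All names (`Sample`, `filter_by_salience`, `merge_paths`) and the idea of a precomputed per-sample salience score are illustrative assumptions, not the authors' actual design:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    text: str
    language: str
    source: str     # collection path, e.g. "web" or "community"
    salience: float # hypothetical cultural-salience score in [0, 1]

def merge_paths(*paths):
    """Combine samples from multiple collection paths, deduplicating by text."""
    seen, merged = set(), []
    for path in paths:
        for s in path:
            if s.text not in seen:
                seen.add(s.text)
                merged.append(s)
    return merged

def filter_by_salience(samples, threshold=0.5):
    """Keep only samples whose cultural-salience score clears the threshold."""
    return [s for s in samples if s.salience >= threshold]

# Toy usage: two collection paths feeding one filtered dataset.
web = [Sample("Diwali sweets recipe", "hi", "web", 0.9),
       Sample("Generic small talk", "hi", "web", 0.1)]
community = [Sample("Hanbok etiquette notes", "ko", "community", 0.8)]
dataset = filter_by_salience(merge_paths(web, community))
```

In practice the salience score would come from an automated classifier (the "automated cultural salience filtering" the summary mentions); here it is simply attached to each sample to keep the sketch self-contained.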
Results: Experiments demonstrate that our framework effectively uncovers cultural biases across critical dimensions—including value expression, regional commonsense knowledge, and social norms. It establishes an extensible data infrastructure and standardized benchmark for fairness-aware model optimization and global deployment.
📝 Abstract
Generative models are known to exhibit reduced performance across global cultural contexts and languages. While continual data updates are commonly used to improve overall model performance, bolstering and evaluating the cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a repeatable, scalable, multi-pronged pipeline to collect and contribute culturally salient, multilingual data. We posit that such data can be used to assess the global applicability of our models and, in turn, help identify and narrow cross-cultural gaps.