Scaling Cultural Resources for Improving Generative Models

📅 2025-10-29
🤖 AI Summary
Problem: Generative models degrade significantly in multilingual and cross-cultural settings, primarily because high-quality, culturally salient, and globally representative multilingual data is scarce. Method: We propose a reproducible and scalable multi-path data construction framework that integrates multilingual web harvesting, automated cultural salience filtering, and community-driven data contribution. This framework systematically expands culturally aware training and evaluation datasets. Technically, we design novel cultural bias detection metrics and a multidimensional evaluation protocol that enable, for the first time, quantitative assessment of generative models' cross-cultural applicability. Results: Experiments demonstrate that our framework uncovers cultural biases across critical dimensions, including value expression, regional commonsense knowledge, and social norms. It establishes an extensible data infrastructure and a standardized benchmark for fairness-aware model optimization and global deployment.
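The multi-path idea in the summary above (several collection paths feeding one salience filter) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the `Record` type, the `"web"` and `"community"` source labels, the keyword-based `keyword_salience` scorer, and the threshold value are all hypothetical stand-ins for whatever harvesting and filtering the authors actually use.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable, Iterable


@dataclass(frozen=True)
class Record:
    text: str
    language: str  # language code, e.g. "en" (hypothetical field)
    source: str    # "web" or "community" (hypothetical labels)


def keyword_salience(record: Record, cultural_terms: dict[str, set[str]]) -> float:
    """Toy scorer: fraction of tokens found in a per-language term list.

    A stand-in for an automated cultural salience filter; the real
    filter is not specified in the summary.
    """
    terms = cultural_terms.get(record.language, set())
    tokens = record.text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in terms for tok in tokens) / len(tokens)


def build_dataset(
    sources: Iterable[Iterable[Record]],
    score: Callable[[Record], float],
    threshold: float,
) -> list[Record]:
    """Merge all collection paths, drop duplicates, keep salient records."""
    seen: set[tuple[str, str]] = set()
    kept: list[Record] = []
    for source in sources:
        for rec in source:
            key = (rec.language, rec.text.strip().lower())
            if key in seen:
                continue  # same text already seen via another path
            seen.add(key)
            if score(rec) >= threshold:
                kept.append(rec)
    return kept


# Illustrative inputs only; not data from the paper.
web = [
    Record("diwali rangoli lamps", "en", "web"),
    Record("buy cheap phones now", "en", "web"),
]
community = [
    Record("Diwali rangoli lamps", "en", "community"),
    Record("pongal kolam harvest festival", "en", "community"),
]
terms = {"en": {"diwali", "rangoli", "pongal", "kolam"}}

dataset = build_dataset(
    [web, community],
    score=partial(keyword_salience, cultural_terms=terms),
    threshold=0.5,
)
```

Passing the scorer in as a callable keeps the merge and deduplication logic independent of how salience is measured, so a learned classifier could replace the keyword heuristic without touching `build_dataset`.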

📝 Abstract
Generative models are known to perform worse across different global cultural contexts and languages. While continual data updates are commonly used to improve overall model performance, bolstering and evaluating the cross-cultural competence of generative AI models requires data resources to be intentionally expanded to include global contexts and languages. In this work, we construct a repeatable, scalable, multi-pronged pipeline to collect and contribute culturally salient, multilingual data. We posit that such data can be used to assess the global applicability of our models and, in turn, help identify and close cross-cultural gaps.

Problem

Research questions and friction points this paper is trying to address.

Improving generative models' performance across global cultures
Addressing reduced model efficacy in diverse cultural contexts
Developing scalable data pipelines for cross-cultural AI evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructed scalable pipeline for cultural data collection
Collected multilingual culturally salient data resources
Proposed using this data to assess models' global applicability and to identify and close cross-cultural gaps