WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models

📅 2025-05-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language model (LLM) training and alignment methodologies rely heavily on Western-centric paradigms, inducing cultural homogenization, while mainstream benchmarks lack rigorous evaluation of cultural inclusivity. Method: We introduce the first benchmark for Global Cultural Inclusivity (GCI), moving beyond the limitations of closed-ended evaluation. Our approach combines a free-generation assessment framework grounded in the Multiplex Worldview, a quantitative metric for the degree of cultural polarization (the Perspectives Distribution Score), and two pluralistic intervention paradigms: contextual embedding and multi-agent collaboration. Contribution/Results: Experiments demonstrate that our method increases the entropy of the viewpoint distribution from 13% at baseline to 94%, while achieving a 67.7% proportion of positive sentiment. These results substantiate significant improvements in cultural balance and representational capacity across diverse civilizations.

📝 Abstract
Large Language Models (LLMs) are predominantly trained and aligned in ways that reinforce Western-centric epistemologies and socio-cultural norms, leading to cultural homogenization and limiting their ability to reflect global civilizational plurality. Existing benchmarking frameworks fail to adequately capture this bias, as they rely on rigid, closed-form assessments that overlook the complexity of cultural inclusivity. To address this, we introduce WorldView-Bench, a benchmark designed to evaluate Global Cultural Inclusivity (GCI) in LLMs by analyzing their ability to accommodate diverse worldviews. Our approach is grounded in the Multiplex Worldview proposed by Senturk et al., which distinguishes between Uniplex models, reinforcing cultural homogenization, and Multiplex models, which integrate diverse perspectives. WorldView-Bench measures Cultural Polarization, the exclusion of alternative perspectives, through free-form generative evaluation rather than conventional categorical benchmarks. We implement applied multiplexity through two intervention strategies: (1) Contextually-Implemented Multiplex LLMs, where system prompts embed multiplexity principles, and (2) Multi-Agent System (MAS)-Implemented Multiplex LLMs, where multiple LLM agents representing distinct cultural perspectives collaboratively generate responses. Our results demonstrate a significant increase in Perspectives Distribution Score (PDS) entropy from 13% at baseline to 94% with MAS-Implemented Multiplex LLMs, alongside a shift toward positive sentiment (67.7%) and enhanced cultural balance. These findings highlight the potential of multiplex-aware AI evaluation in mitigating cultural bias in LLMs, paving the way for more inclusive and ethically aligned AI systems.
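The Perspectives Distribution Score (PDS) reported above is an entropy over the distribution of cultural perspectives in model outputs, expressed as a percentage where 94% indicates near-uniform coverage. A minimal sketch of such a metric, assuming PDS is normalized Shannon entropy over labeled perspectives (the paper's exact definition may differ):

```python
import math
from collections import Counter

def perspectives_distribution_score(labels):
    """Normalized Shannon entropy of perspective labels, in [0, 1].

    1.0 means all observed perspectives are evenly represented;
    0.0 means a single perspective dominates (cultural polarization).
    Illustrative reconstruction, not the paper's exact formula.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Normalize by the maximum entropy for this many perspectives.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return entropy / max_entropy
```

Under this reading, a baseline PDS of 13% means responses draw almost exclusively on one worldview, while 94% means viewpoints are spread nearly uniformly across the cultural perspectives considered.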
Problem

Research questions and friction points this paper is trying to address.

Evaluating cultural bias in Large Language Models (LLMs)
Assessing global cultural inclusivity in LLM outputs
Mitigating Western-centric homogenization in AI systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for Global Cultural Inclusivity evaluation
Multiplex Worldview framework distinguishing Uniplex (homogenizing) from Multiplex (pluralistic) models
Multi-Agent System boosts cultural perspective diversity
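The MAS intervention described in the abstract has multiple agents, each prompted with a distinct cultural perspective, draft answers that are then synthesized. A minimal sketch of that pattern, assuming a generic `llm` callable (prompt in, text out) and hypothetical perspective names; the paper's actual agent prompts and aggregation step may differ:

```python
def multiplex_answer(question, llm, perspectives):
    """Sketch of a MAS-Implemented Multiplex LLM pipeline.

    llm: any callable mapping a prompt string to a response string.
    perspectives: cultural perspectives, one steering each agent.
    """
    drafts = []
    for p in perspectives:
        # Each agent answers from one cultural standpoint.
        prompt = (f"Answer the question from the perspective of {p} "
                  f"thought, staying faithful to that tradition:\n"
                  f"{question}")
        drafts.append(f"[{p}] {llm(prompt)}")
    # A final call synthesizes the perspective-specific drafts.
    synthesis_prompt = ("Synthesize the following perspective-specific "
                        "answers into one balanced response:\n"
                        + "\n".join(drafts))
    return llm(synthesis_prompt)
```

In this shape, raising the number and diversity of `perspectives` is what drives the reported increase in distribution entropy, since the synthesis step is forced to draw on every agent's draft.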