Towards Cross-lingual Values Assessment: A Consensus-Pluralism Perspective

📅 2026-02-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two gaps in cross-lingual content safety evaluation: current large language models struggle to recognize deep-seated values, and no systematic assessment exists across diverse global value systems. To bridge this gap, the authors introduce X-Value, the first large-scale cross-lingual benchmark for value alignment evaluation, comprising over 5,000 question-answer pairs across 18 languages. Grounded in Schwartz's theory of basic human values, X-Value organizes value dimensions into seven core categories and features a novel two-stage human annotation framework that integrates both global consensus and cultural pluralism. Experimental results reveal that state-of-the-art models achieve less than 77% accuracy on X-Value, with cross-lingual performance gaps exceeding 20%, highlighting significant limitations in their capacity to understand values across cultures.

๐Ÿ“ Abstract
While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To bridge this gap, we introduce X-Value, a novel Cross-lingual Values Assessment Benchmark designed to evaluate LLMs' ability to assess deep-level values of content from a global perspective. X-Value consists of more than 5,000 QA pairs across 18 languages, systematically organized into 7 core domains grounded in Schwartz's Theory of Basic Human Values and categorized into easy and hard levels for discriminative evaluation. We further propose a unique two-stage annotation framework that first identifies whether an issue falls under global consensus (e.g., human rights) or pluralism (e.g., religion), and subsequently conducts a multi-party evaluation of the latent values embedded within the content. Systematic evaluations on X-Value reveal that current SOTA LLMs exhibit deficiencies in cross-lingual values assessment ($Acc < 77\%$), with significant performance disparities across different languages ($\Delta Acc > 20\%$). This work highlights the urgent need to improve the nuanced, values-aware content assessment capability of LLMs. Our X-Value is available at: https://huggingface.co/datasets/Whitolf/X-Value.
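The abstract reports two headline numbers: overall accuracy ($Acc < 77\%$) and the worst-case accuracy gap between languages ($\Delta Acc > 20\%$). The sketch below is not the authors' evaluation code; it only illustrates, under the natural reading of those metrics, how both can be computed from hypothetical per-item judgments (the function name `xvalue_metrics` and the toy data are assumptions for illustration).

```python
# Illustrative sketch only (not the paper's implementation): compute overall
# accuracy (Acc) and the cross-lingual accuracy gap (ΔAcc = max - min
# per-language accuracy) from hypothetical (language, is_correct) records.
from collections import defaultdict

def xvalue_metrics(results):
    """results: iterable of (language_code, is_correct) pairs for QA items."""
    per_lang = defaultdict(lambda: [0, 0])  # language -> [correct, total]
    for lang, correct in results:
        per_lang[lang][0] += int(correct)
        per_lang[lang][1] += 1
    acc_by_lang = {lang: c / t for lang, (c, t) in per_lang.items()}
    overall = (sum(c for c, _ in per_lang.values())
               / sum(t for _, t in per_lang.values()))
    gap = max(acc_by_lang.values()) - min(acc_by_lang.values())
    return overall, gap, acc_by_lang

# Toy example with made-up numbers: 9/10 correct in English, 6/10 in Swahili.
toy = ([("en", True)] * 9 + [("en", False)] * 1
       + [("sw", True)] * 6 + [("sw", False)] * 4)
overall, gap, by_lang = xvalue_metrics(toy)
# overall == 0.75, gap == 0.3 (i.e., a 30-point cross-lingual disparity)
```

Reporting the gap as max-minus-min per-language accuracy is one common convention; the paper may aggregate differently (e.g., gap to the best-resourced language).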
Problem

Research questions and friction points this paper is trying to address.

cross-lingual
values assessment
large language models
content safety
value dimensions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual Values Assessment
Consensus-Pluralism Framework
X-Value Benchmark
Latent Value Evaluation
Multilingual LLM Evaluation