🤖 AI Summary
This work addresses the limited ability of current large language models to recognize deep-seated values in cross-lingual content safety evaluation and their lack of systematic assessment across diverse global value systems. To bridge this gap, we introduce X-Value, the first large-scale cross-lingual benchmark for value alignment evaluation, comprising over 5,000 question-answer pairs across 18 languages. Grounded in Schwartz's theory of basic human values, X-Value organizes value dimensions into seven core categories and features a novel two-stage human annotation framework that integrates both global consensus and cultural pluralism. Experimental results reveal that state-of-the-art models achieve less than 77% accuracy on X-Value, with cross-lingual performance gaps exceeding 20%, highlighting significant limitations in their capacity to understand values across cultures.
📄 Abstract
While large language models (LLMs) have become pivotal to content safety, current evaluation paradigms primarily focus on detecting explicit harms (e.g., violence or hate speech), neglecting the subtler value dimensions conveyed in digital content. To bridge this gap, we introduce X-Value, a novel Cross-lingual Values Assessment Benchmark designed to evaluate LLMs' ability to assess deep-level values of content from a global perspective. X-Value consists of more than 5,000 QA pairs across 18 languages, systematically organized into 7 core domains grounded in Schwartz's Theory of Basic Human Values and categorized into easy and hard levels for discriminative evaluation. We further propose a unique two-stage annotation framework that first identifies whether an issue falls under global consensus (e.g., human rights) or pluralism (e.g., religion), and subsequently conducts a multi-party evaluation of the latent values embedded within the content. Systematic evaluations on X-Value reveal that current SOTA LLMs exhibit deficiencies in cross-lingual values assessment ($Acc < 77\%$), with significant performance disparities across different languages ($\Delta Acc > 20\%$). This work highlights the urgent need to improve the nuanced, values-aware content assessment capability of LLMs. Our X-Value is available at: https://huggingface.co/datasets/Whitolf/X-Value.