🤖 AI Summary
Current evaluations of cultural value alignment in large language models rely predominantly on discriminative, multiple-choice formats, which struggle to capture authentic value orientations in open-ended generation and often overlook subcultural diversity. To address these limitations, this work proposes DOVE, the first distributional, open-ended evaluation framework. DOVE employs rate-distortion variational optimization to construct a structured value codebook, mapping text into a semantically denoised value space, and leverages unbalanced optimal transport to measure alignment between human and model-generated value distributions. Experiments across twelve large language models demonstrate that DOVE achieves high reliability with only 500 samples per culture and a 31.56% correlation with downstream tasks, significantly outperforming existing methods.
📝 Abstract
As LLMs are deployed globally, aligning their cultural value orientations is critical for safety and user engagement. However, existing benchmarks face the Construct-Composition-Context ($C^3$) challenge: they rely on discriminative, multiple-choice formats that probe value knowledge rather than true orientations, overlook subcultural heterogeneity, and diverge from real-world open-ended generation. We introduce DOVE, a distributional evaluation framework that directly compares human-written text distributions with LLM-generated outputs. DOVE uses a rate-distortion variational optimization objective to construct a compact value codebook from 10K documents, mapping text into a structured value space that filters semantic noise. Alignment is measured with unbalanced optimal transport, capturing intra-cultural distributional structure and sub-group diversity. Experiments across 12 LLMs show that DOVE achieves superior predictive validity, attaining a 31.56% correlation with downstream tasks, while maintaining high reliability with as few as 500 samples per culture.
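To make the codebook step concrete, here is a minimal, hypothetical PyTorch sketch of a rate-distortion variational objective over a discrete value codebook: text embeddings are softly assigned to K codes via a Gumbel-softmax relaxation, distortion is the reconstruction error, and the rate term is the KL divergence between average code usage and a uniform prior. The class name `ValueCodebook`, the dimensions, and the relaxation choice are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a rate-distortion variational codebook;
# the paper's exact objective is not reproduced here.
import math
import torch
import torch.nn.functional as F

class ValueCodebook(torch.nn.Module):  # hypothetical name
    def __init__(self, dim=768, num_codes=64, beta=0.25):
        super().__init__()
        self.codes = torch.nn.Parameter(0.02 * torch.randn(num_codes, dim))
        self.beta = beta  # rate-distortion trade-off weight

    def forward(self, x, tau=1.0):
        # Soft assignment of each text embedding to the codebook entries.
        logits = -torch.cdist(x, self.codes)           # (B, K): closer code => higher logit
        q = F.gumbel_softmax(logits, tau=tau, dim=-1)  # relaxed one-hot code assignment
        x_hat = q @ self.codes                         # reconstruction from the codebook
        distortion = F.mse_loss(x_hat, x)
        # Rate: KL(average code usage || uniform prior over K codes).
        usage = q.mean(dim=0)
        rate = (usage * (usage.clamp_min(1e-9).log() + math.log(usage.numel()))).sum()
        return x_hat, distortion + self.beta * rate

# Usage: map document embeddings into the denoised value space.
emb = torch.randn(512, 768)  # stand-in for sentence embeddings of the documents
x_hat, loss = ValueCodebook()(emb)
loss.backward()
```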
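The alignment step can likewise be sketched with the POT library's entropic unbalanced Sinkhorn solver (`pip install pot`): code-usage histograms for a human corpus and an LLM are compared under a ground cost between code vectors, and the marginal relaxation `reg_m` lets sub-group mass go unmatched rather than being forced into exact correspondence. The histograms, costs, and hyperparameters below are synthetic stand-ins; the paper's exact UOT formulation may differ.

```python
# Sketch of measuring alignment with unbalanced optimal transport via POT;
# all inputs below are synthetic stand-ins.
import numpy as np
import ot  # Python Optimal Transport

K = 64
rng = np.random.default_rng(0)
codes = rng.standard_normal((K, 768))  # learned code vectors (stand-in)
human = rng.dirichlet(np.ones(K))      # human code-usage histogram
model = rng.dirichlet(np.ones(K))      # LLM code-usage histogram

M = ot.dist(codes, codes)              # squared-Euclidean ground cost between codes
M /= M.max()

# reg smooths the plan entropically; smaller reg_m tolerates more unmatched mass.
cost = ot.unbalanced.sinkhorn_unbalanced2(human, model, M, reg=0.05, reg_m=1.0)
print("UOT misalignment (lower = better aligned):", cost)
```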