🤖 AI Summary
Current evaluations of cultural value alignment in large language models rely predominantly on discriminative, multiple-choice formats, which struggle to capture authentic value orientations in open-ended generation and often overlook subcultural diversity. To address these limitations, this work proposes DOVE, the first distributional, open-ended evaluation framework. DOVE employs rate-distortion variational optimization to construct a structured value codebook, mapping text into a semantically denoised value space, and leverages unbalanced optimal transport to measure alignment between human and model-generated value distributions. Experiments across twelve large language models demonstrate that DOVE achieves high reliability with only 500 samples per culture and a 31.56% correlation with downstream tasks, significantly outperforming existing methods.
📝 Abstract
As LLMs are deployed globally, aligning their cultural value orientations is critical for safety and user engagement. However, existing benchmarks face the Construct-Composition-Context ($C^3$) challenge: they rely on discriminative, multiple-choice formats that probe value knowledge rather than true orientations, overlook subcultural heterogeneity, and diverge from real-world open-ended generation. We introduce DOVE, a distributional evaluation framework that directly compares human-written text distributions with LLM-generated outputs. DOVE uses a rate-distortion variational optimization objective to construct a compact value codebook from 10K documents, mapping text into a structured value space that filters semantic noise. Alignment is measured with unbalanced optimal transport, capturing intra-cultural distributional structure and sub-group diversity. Experiments across 12 LLMs show that DOVE achieves superior predictive validity, attaining a 31.56% correlation with downstream tasks, while maintaining high reliability with as few as 500 samples per culture.
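To make the codebook step concrete, here is a minimal, hypothetical PyTorch sketch of a rate-distortion variational objective over a discrete value codebook: text embeddings are softly assigned to K codes via a Gumbel-softmax relaxation, distortion is the reconstruction error, and the rate term is the KL divergence between average code usage and a uniform prior. The class name `ValueCodebook`, the dimensions, and the relaxation choice are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a rate-distortion variational codebook;
# the paper's exact objective is not reproduced here.
import math
import torch
import torch.nn.functional as F

class ValueCodebook(torch.nn.Module):  # hypothetical name
    def __init__(self, dim=768, num_codes=64, beta=0.25):
        super().__init__()
        self.codes = torch.nn.Parameter(0.02 * torch.randn(num_codes, dim))
        self.beta = beta  # rate-distortion trade-off weight

    def forward(self, x, tau=1.0):
        # Soft assignment of each text embedding to the codebook entries.
        logits = -torch.cdist(x, self.codes)           # (B, K): closer code => higher logit
        q = F.gumbel_softmax(logits, tau=tau, dim=-1)  # relaxed one-hot code assignment
        x_hat = q @ self.codes                         # reconstruction from the codebook
        distortion = F.mse_loss(x_hat, x)
        # Rate: KL(average code usage || uniform prior over K codes).
        usage = q.mean(dim=0)
        rate = (usage * (usage.clamp_min(1e-9).log() + math.log(usage.numel()))).sum()
        return x_hat, distortion + self.beta * rate

# Usage: map document embeddings into the denoised value space.
emb = torch.randn(512, 768)  # stand-in for sentence embeddings of the documents
x_hat, loss = ValueCodebook()(emb)
loss.backward()
```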
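The alignment step can likewise be sketched with the POT library's entropic unbalanced Sinkhorn solver (`pip install pot`): code-usage histograms for a human corpus and an LLM are compared under a ground cost between code vectors, and the marginal relaxation `reg_m` lets sub-group mass go unmatched rather than being forced into exact correspondence. The histograms, costs, and hyperparameters below are synthetic stand-ins; the paper's exact UOT formulation may differ.

```python
# Sketch of measuring alignment with unbalanced optimal transport via POT;
# all inputs below are synthetic stand-ins.
import numpy as np
import ot  # Python Optimal Transport

K = 64
rng = np.random.default_rng(0)
codes = rng.standard_normal((K, 768))  # learned code vectors (stand-in)
human = rng.dirichlet(np.ones(K))      # human code-usage histogram
model = rng.dirichlet(np.ones(K))      # LLM code-usage histogram

M = ot.dist(codes, codes)              # squared-Euclidean ground cost between codes
M /= M.max()

# reg smooths the plan entropically; smaller reg_m tolerates more unmatched mass.
cost = ot.unbalanced.sinkhorn_unbalanced2(human, model, M, reg=0.05, reg_m=1.0)
print("UOT misalignment (lower = better aligned):", cost)
```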