Training-Free Cultural Alignment of Large Language Models via Persona Disagreement

📅 2026-05-11

🤖 AI Summary
This work addresses the implicit cultural bias in moral judgments exhibited by large language models and the limitations of existing alignment methods, which rely on per-country preference data and fine-tuning or on internal model access, and thus cannot be applied to black-box API settings. The authors propose DISCA, a training-free, inference-time calibration method that uses within-country sociodemographic disagreement, rather than consensus, as its steering signal: each country is represented by a panel of persona agents grounded in the World Values Survey. DISCA converts the panel's disagreement into a bounded, loss-averse logit correction, achieving cultural alignment without modifying model weights. Evaluated across 20 countries and seven open-weight models (ranging from 2B to 70B parameters), DISCA reduces cultural misalignment on MultiTP by 10–24% for the six models of at least 3.8B parameters and by 2–7% in open-ended scenarios, using only publicly available data and black-box model access.
📝 Abstract
Large language models increasingly mediate decisions that turn on moral judgement, yet a growing body of evidence shows that their implicit preferences are not culturally neutral. Existing cultural alignment methods either require per-country preference data and fine-tuning budgets or assume white-box access to model internals that commercial APIs do not expose. In this work, we focus on this realistic black-box, public-data-only regime and observe that within-country sociodemographic disagreement, not consensus, is the primary steering signal. We introduce DISCA (Disagreement-Informed Steering for Cultural Alignment), an inference-time method that instantiates each country as a panel of World-Values-Survey-grounded persona agents and converts their disagreement into a bounded, loss-averse logit correction. Across 20 countries and 7 open-weight backbones (2B–70B), DISCA reduces cultural misalignment on MultiTP by 10–24% on the six backbones ≥3.8B, and 2–7% on open-ended scenarios, without changing any weights. Our results suggest that inference-time calibration is a scalable alternative to fine-tuning for serving the long tail of global moral preferences.
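The abstract does not give the exact form of the correction, so the following is only an illustrative sketch of what a "bounded, loss-averse logit correction" driven by persona disagreement could look like. Everything specific here is an assumption and not taken from the paper: the function names, the prospect-theory-style loss-aversion factor `lam`, the use of the panel's standard deviation as a gain, and the `tanh` bound. Each persona is assumed to report a logit gap (preference for option A over option B) for a binary dilemma.

```python
import math
from statistics import mean, pstdev


def loss_averse(x, lam=2.25):
    """Assumed prospect-theory-style value: negative shifts weigh lam times more."""
    return x if x >= 0 else lam * x


def bounded_loss_averse_correction(base_gap, persona_gaps, lam=2.25, bound=1.0):
    """Hypothetical correction to the base model's logit gap (A minus B).

    base_gap     -- logit gap produced by the unsteered model
    persona_gaps -- logit gaps produced under each persona prompt
    lam          -- assumed loss-aversion factor (not from the paper)
    bound        -- assumed maximum absolute correction (not from the paper)
    """
    # How far each persona moves away from the unsteered model.
    shifts = [g - base_gap for g in persona_gaps]
    # Loss-averse aggregation of those shifts.
    valued = mean(loss_averse(s, lam) for s in shifts)
    # Within-panel disagreement, used here as a gain: a panel in full
    # agreement (spread 0) produces no correction in this sketch.
    spread = pstdev(persona_gaps)
    # tanh keeps the correction strictly inside (-bound, +bound).
    return bound * math.tanh(spread * valued / bound)


# Usage: two personas that disagree symmetrically around the base model.
correction = bounded_loss_averse_correction(0.0, [1.0, -1.0])
adjusted_gap = 0.0 + correction
```

In this sketch the correction is added to the base logit gap at inference time, so no model weights are touched; the actual DISCA aggregation and bounding rules may differ.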
Problem

Research questions and friction points this paper is trying to address.

cultural alignment
large language models
black-box setting
moral preferences
cross-cultural disagreement
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free alignment
persona disagreement
inference-time calibration
cultural alignment
black-box LLM
Huynh Trung Kiet
Faculty of Information and Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Dao Sy Duy Minh
Faculty of Information and Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Tuan Nguyen
Department of Infrastructure Engineering, University of Melbourne
Computational Mechanics, Structural Engineering, Concrete Technology, Machine Learning
Chi-Nguyen Tran
Faculty of Information and Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Phu-Hoa Pham
Faculty of Information and Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
Nguyen Lam Phu Quy
Faculty of Information and Technology, University of Science, Vietnam National University, Ho Chi Minh City, Vietnam
The Anh Han
Professor of Computer Science, Teesside University
Evolutionary Game Theory, Artificial Intelligence, Evolution of Cooperation, Multi-agent Systems
Long Tran-Thanh
Professor in Computer Science, University of Warwick
Artificial Intelligence, AI for social good, game theory, human-agent learning, multi-armed bandits