Mitigating Cross-Lingual Cultural Inconsistencies in LLMs via Consensus-Driven Preference Optimisation

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem that multilingual large language models (MLLMs) often produce culturally inconsistent outputs when the prompt language changes, even under a fixed user persona. For example, given a British persona and an ambiguous question about literature, a model may answer Shakespeare in English but Cervantes in Spanish. To quantify this cross-lingual cultural inconsistency, the authors introduce Singleton Fleiss's κ_S, a metric designed to be resilient to hallucinations. To mitigate it, they propose C-3PO (Cross-lingual Cultural Consistent Preference Optimisation), a consensus-driven alignment framework. C-3PO raises κ_S by up to 0.10 points (absolute) over unaligned models and outperforms strong prompting and representation steering baselines. The inconsistency disproportionately affects lower-resource languages such as Indonesian and Persian. A layer-wise analysis that early-decodes intermediate-layer representations indicates that MLLMs implicitly personalise outputs towards the prompt language's stereotypical culture as forward-pass representations stabilise.
📝 Abstract
Despite their impressive capabilities, multilingual large language models (MLLMs) frequently exhibit inconsistent behaviour when the prompt's language changes. While such adaptation is generally desirable, it becomes a critical failure when a user's identity is explicitly defined. For instance, given a fixed British persona and an ambiguous everyday knowledge query about literature, the prompt's language frequently overwrites the system persona -- yielding Shakespeare in English but Cervantes in Spanish. To robustly quantify this Cross-lingual Cultural Inconsistency, we introduce Singleton Fleiss's $\kappa_S$, a metric mathematically resilient to hallucinations. For mitigation, we propose Cross-lingual Cultural Consistent Preference Optimisation (C-3PO), a consensus-driven alignment framework. C-3PO achieves up to a 0.10-point absolute increase in $\kappa_S$ over unaligned models, outperforming strong prompting and representation steering baselines. Empirical evaluations show this inconsistency disproportionately affects lower-resource languages like Indonesian and Persian. A layer-wise interpretability analysis reveals the underlying mechanism: by early-decoding intermediate layer representations, we find that MLLMs implicitly personalise outputs towards the prompt language's stereotypical culture as forward-pass representations stabilise.
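To make the agreement-based measurement concrete, the sketch below computes the standard Fleiss's κ, on which the name of the metric suggests κ_S builds. It is not the paper's Singleton variant, whose definition is not given here. The framing is an assumption made for illustration: each query is an item, each prompt language acts as a "rater", and each label is the culture reflected in the model's answer. The function name `fleiss_kappa` and the label strings are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Standard Fleiss's kappa (NOT the paper's Singleton variant).

    ratings: list of items; each item is a list of category labels,
    one per rater. Assumed here: one label per prompt language.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: for each item, the fraction of rater pairs that agree.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        # Only one category was ever used; kappa is undefined in this case.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical labels: culture reflected in the answer for 4 prompt languages.
consistent = [["UK", "UK", "UK", "UK"], ["ES", "ES", "ES", "ES"]]
split = [["UK", "UK", "ES", "ES"], ["UK", "UK", "ES", "ES"]]
print(fleiss_kappa(consistent), fleiss_kappa(split))
```

Under this framing, a higher κ means the answers agree across prompt languages more often than chance would predict, which is the direction in which the abstract reports the C-3PO improvement.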
Problem

Research questions and friction points this paper is trying to address.

Cross-lingual Cultural Inconsistency
Multilingual Large Language Models
User Identity Preservation
Cultural Bias
Language-dependent Behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual Cultural Inconsistency
Preference Optimisation
Consensus-driven Alignment
Singleton Fleiss's κ_S
Multilingual LLMs