🤖 AI Summary
This study addresses the critical governance challenge of cross-cultural value alignment in large language models (LLMs). We develop a multi-level auditing platform that integrates four methods: an ethical-dilemma corpus for testing temporal stability, a Diversity-Enhanced Framework (DEF) for quantifying cultural fidelity, first-token probability alignment for distributional accuracy, and the Multi-stAge Reasoning frameworK (MARK) for interpretable decision-making. With this platform, we systematically assess over 20 mainstream LLMs. Results reveal a nonlinear relationship between model scale and alignment quality; Chinese models exhibit stronger multilingual optimization, whereas Western models display pronounced U.S.-centric biases; Mistral-series models outperform LLaMA3-series overall; and full-parameter fine-tuning preserves cultural diversity more effectively than RLHF. Crucially, this work provides the first empirical evidence of shared bottlenecks in value stability, cultural fidelity, and interpretability, as well as region-specific divergences among globally deployed LLMs, thereby establishing both empirical foundations and methodological guidance for cross-cultural AI governance.
📝 Abstract
As Large Language Models (LLMs) increasingly influence high-stakes decision-making across global contexts, ensuring their alignment with diverse cultural values has become a critical governance challenge. This study presents a Multi-Layered Auditing Platform for Responsible AI that systematically evaluates cross-cultural value alignment in China-origin and Western-origin LLMs through four integrated methodologies: Ethical Dilemma Corpus for assessing temporal stability, Diversity-Enhanced Framework (DEF) for quantifying cultural fidelity, First-Token Probability Alignment for distributional accuracy, and Multi-stAge Reasoning frameworK (MARK) for interpretable decision-making. Our comparative analysis of 20+ leading models, such as Qwen, GPT-4o, Claude, LLaMA, and DeepSeek, reveals universal challenges (fundamental instability in value systems, systematic under-representation of younger demographics, and non-linear relationships between model scale and alignment quality) alongside divergent regional development trajectories. While China-origin models increasingly emphasize multilingual data integration for context-specific optimization, Western models demonstrate greater architectural experimentation but persistent U.S.-centric biases. Neither paradigm achieves robust cross-cultural generalization. We establish that Mistral-series architectures significantly outperform LLaMA3-series in cross-cultural alignment, and that Full-Parameter Fine-Tuning on diverse datasets surpasses Reinforcement Learning from Human Feedback in preserving cultural variation...
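To make the first-token idea concrete: the model's probability mass over the answer options of a survey-style question is compared against the human response distribution for the same question. The sketch below is purely illustrative and not the paper's implementation; the function and variable names (`first_token_alignment`, `option_token_logprobs`, `human_distribution`) are assumptions, and Jensen-Shannon distance is used here only as one plausible choice of distributional distance.

```python
# Illustrative sketch (assumed names, not the paper's code): score how closely a
# model's first-token probabilities over answer options match a human reference
# distribution for the same multiple-choice question.
import numpy as np
from scipy.spatial.distance import jensenshannon


def first_token_alignment(option_token_logprobs: dict[str, float],
                          human_distribution: dict[str, float]) -> float:
    """Return a 0-1 alignment score between the model's renormalized
    first-token probabilities over answer options (e.g. "A".."D") and the
    human survey response distribution for the same question."""
    options = sorted(human_distribution)
    # Renormalize the model's probability mass over just the answer options.
    model_p = np.array([np.exp(option_token_logprobs[o]) for o in options])
    model_p /= model_p.sum()
    human_p = np.array([human_distribution[o] for o in options])
    human_p /= human_p.sum()
    # Jensen-Shannon distance with base 2 is bounded in [0, 1]; invert to a score.
    return 1.0 - jensenshannon(model_p, human_p, base=2)


# Hypothetical example: one question with four options.
model_logprobs = {"A": -0.4, "B": -1.6, "C": -2.3, "D": -3.0}
human_dist = {"A": 0.35, "B": 0.30, "C": 0.20, "D": 0.15}
print(round(first_token_alignment(model_logprobs, human_dist), 3))
```

In practice such a score would be averaged over many questions and demographic groups; the choice of distance (Jensen-Shannon, total variation, or 1-Wasserstein) is a design decision the abstract does not specify.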