Cross-Lingual Transfer of Cultural Knowledge: An Asymmetric Phenomenon

📅 2025-06-02
🤖 AI Summary
This study identifies a systematic asymmetry in multilingual cultural knowledge transfer among large language models (LLMs): bidirectional transfer occurs between English and high-resource non-English languages, whereas low-resource languages primarily transfer knowledge toward English, with limited flow in the reverse direction. We formally define this asymmetry and propose a frequency-driven, interpretable hypothesis, positing that how often cultural knowledge appears in the pretraining corpora governs how easily it transfers across languages. To test it, we develop a transparent, controllable cultural transfer analysis framework grounded in training-data provenance. Using cross-lingual knowledge probing, corpus-level frequency statistics, and a multilingual benchmark covering four non-Anglophone cultures, we empirically validate the hypothesis. Quantitative evaluation across 12 language pairs reveals statistically significant asymmetry in seven pairs. Our work establishes a theoretical foundation for culturally equitable multilingual model design and delivers an interpretable diagnostic toolkit for analyzing cultural knowledge transfer.

📝 Abstract
Despite substantial research efforts evaluating how well large language models (LLMs) handle global cultural diversity, the mechanisms behind their cultural knowledge acquisition, particularly in multilingual settings, remain unclear. We study this question by investigating how cultural knowledge transfers across languages during language adaptation of LLMs. We introduce an interpretable framework for studying this transfer, ensuring training data transparency and controlling transfer effects. Through a study of four non-Anglophonic cultures, we observe bidirectional cultural transfer between English and other high-resource languages, while low-resource languages primarily transfer knowledge to English with limited reverse flow. To explain this asymmetric phenomenon, we propose a frequency-based hypothesis: cultural knowledge appearing more frequently in the pretraining data transfers more easily, which is supported by empirical analysis of the training corpora.
Problem

Research questions and friction points this paper is trying to address.

Mechanisms of cultural knowledge transfer in multilingual LLMs unclear
Asymmetric cultural transfer between high and low-resource languages
Frequency-based hypothesis explains uneven cultural knowledge distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interpretable framework for cultural transfer study
Bidirectional transfer in high-resource languages
Frequency-based hypothesis for asymmetric transfer
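
The frequency-based hypothesis can be illustrated with a small sketch. This is not the authors' analysis code: the toy corpus, the term list, and the per-term `transfer_gain` values are all hypothetical, and Spearman rank correlation is only one plausible way to check whether terms that occur more often in the training data also show larger transfer gains.

```python
import re
from collections import Counter


def term_frequencies(corpus, terms):
    """Count case-insensitive whole-word occurrences of each term."""
    counts = Counter({term: 0 for term in terms})
    for doc in corpus:
        text = doc.lower()
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[term] += len(re.findall(pattern, text))
    return counts


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data, for illustration only.
corpus = [
    "Kimchi is served with rice. Kimchi is fermented.",
    "Hanbok is worn at festivals.",
    "Kimchi and hanbok appear together; kimchi is common.",
]
terms = ["kimchi", "hanbok", "ondol"]

# Hypothetical accuracy gain on probing questions about each term
# after cross-lingual adaptation (made-up numbers).
transfer_gain = {"kimchi": 0.30, "hanbok": 0.10, "ondol": 0.02}

freqs = term_frequencies(corpus, terms)
rho = spearman(
    [freqs[t] for t in terms],
    [transfer_gain[t] for t in terms],
)
print(dict(freqs), rho)
```

A positive `rho` on real data would be consistent with the hypothesis, though a correlation on its own would not establish that frequency causes the transfer.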
Chen Zhang
Wangxuan Institute of Computer Technology, Peking University
Zhiyuan Liao
Wangxuan Institute of Computer Technology, Peking University
Yansong Feng
Peking University
Natural Language Processing, Pattern Recognition