LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs

📅 2025-11-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Pretraining contamination undermines the evaluation of cross-lingual knowledge transfer in multilingual large language models (LLMs), because a correct answer in the target language may reflect prior exposure rather than transfer. Method: We propose a time-sensitive, automated evaluation framework that mines entity facts dated after a model's knowledge cutoff, aligns cross-lingual documents, and automatically generates questions. The resulting multilingual benchmark enforces the knowledge cutoff to separate true cross-lingual transfer from pretraining exposure. Contribution/Results: The framework measures genuine cross-lingual knowledge transfer and reveals transfer asymmetry tied to linguistic distance, along with diminishing returns as model scale increases. Evaluated across five languages and several LLMs, it provides a reproducible, contamination-resistant benchmark for assessing multilingual knowledge transfer.

📝 Abstract

Evaluating cross-lingual knowledge transfer in large language models is challenging, as correct answers in a target language may arise either from genuine transfer or from prior exposure during pre-training. We present LiveCLKTBench, an automated generation pipeline specifically designed to isolate and measure cross-lingual knowledge transfer. Our pipeline identifies self-contained, time-sensitive knowledge entities from real-world domains, filters them based on temporal occurrence, and verifies them against the model's knowledge. The documents of these valid entities are then used to generate factual questions, which are translated into multiple languages to evaluate transferability across linguistic boundaries. Using LiveCLKTBench, we evaluate several LLMs across five languages and observe that cross-lingual transfer is strongly influenced by linguistic distance and often asymmetric across language directions. While larger models improve transfer, the gains diminish with scale and vary across domains. These findings provide new insights into multilingual transfer and demonstrate the value of LiveCLKTBench as a reliable benchmark for future research.
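The two filtering steps described in the abstract (temporal occurrence, then verification against the model's knowledge) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Entity` fields, the function names, and the `model_knows` callback are all hypothetical, and the sketch assumes each entity carries a single first-appearance date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for a time-sensitive knowledge entity."""
    name: str
    domain: str
    first_seen: date  # assumed: date the entity first appeared in real-world sources


def filter_after_cutoff(entities: Iterable[Entity], knowledge_cutoff: date) -> List[Entity]:
    """Keep only entities that first appeared strictly after the model's cutoff."""
    return [e for e in entities if e.first_seen > knowledge_cutoff]


def filter_unknown(entities: Iterable[Entity], model_knows: Callable[[Entity], bool]) -> List[Entity]:
    """Drop entities the model already appears to know.

    `model_knows` is a stand-in for probing the model directly; how that
    probe is performed is an assumption left to the caller.
    """
    return [e for e in entities if not model_knows(e)]


# Toy data, invented purely for illustration.
candidates = [
    Entity("old-event", "sports", date(2023, 5, 1)),
    Entity("new-film", "movies", date(2025, 3, 10)),
    Entity("new-album", "music", date(2025, 6, 2)),
]
cutoff = date(2024, 12, 31)
recent = filter_after_cutoff(candidates, cutoff)
valid = filter_unknown(recent, model_knows=lambda e: e.name == "new-album")
```

Applying both filters in sequence is a conservative choice: the date filter removes entities the model could have seen, and the knowledge probe catches cases where the recorded date or the assumed cutoff is inaccurate.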

Problem

Research questions and friction points this paper is trying to address.

Isolating genuine cross-lingual knowledge transfer from prior exposure in LLMs

Evaluating transferability across languages using time-sensitive factual questions

Analyzing how linguistic distance and model scale affect multilingual transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline isolates cross-lingual knowledge transfer

Generates factual questions from time-sensitive real-world entities

Evaluates transferability across multiple languages using translations
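To make the notion of asymmetric transfer concrete, here is a small sketch of one way to score it. The metric is an assumption for illustration, not the paper's definition: it treats transfer accuracy for a direction as the fraction of questions answered correctly in the target language when the knowledge was introduced in the source language, and asymmetry as the difference between the two directions. The function names and the result-tuple format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Result = Tuple[str, str, bool]  # (source_lang, target_lang, answered_correctly)


def transfer_accuracy(results: Iterable[Result]) -> Dict[Tuple[str, str], float]:
    """Per-direction accuracy over (source_lang, target_lang) pairs."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    correct: Dict[Tuple[str, str], int] = defaultdict(int)
    for src, tgt, ok in results:
        totals[(src, tgt)] += 1
        correct[(src, tgt)] += int(ok)
    return {pair: correct[pair] / totals[pair] for pair in totals}


def asymmetry(acc: Dict[Tuple[str, str], float], a: str, b: str) -> float:
    """Positive when transfer a -> b is stronger than b -> a."""
    return acc[(a, b)] - acc[(b, a)]


# Toy results, invented purely for illustration.
toy = [
    ("en", "zh", True), ("en", "zh", True), ("en", "zh", True), ("en", "zh", False),
    ("zh", "en", True), ("zh", "en", False),
]
acc = transfer_accuracy(toy)
gap = asymmetry(acc, "en", "zh")
```

A signed difference keeps the direction of the gap visible, which matters when comparing language pairs at different linguistic distances.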

🔎 Similar Papers

Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models