Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From

📅 2025-04-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work systematically investigates the performance, origins, and formation mechanisms of the cross-lingual context retrieval ability of large language models (LLMs). Using cross-lingual machine reading comprehension (xMRC) as the benchmark scenario, the authors evaluate over 40 models across 12 languages and uncover a two-phase process: pretraining establishes cross-lingual question encoding, while post-training, especially multilingual supervised fine-tuning, forms answer retrieval, which is the critical bottleneck for performance gains; scaling pretraining alone proves ineffective. Through layer-wise interpretability analysis, oracle estimation, and ablation studies, they localize the xMRC bottleneck to the final-layer representations in the second phase. Notably, several compact open-source models match GPT-4o's performance, offering both theoretical insight and empirical grounding for efficient multilingual model design.

📝 Abstract
The ability of cross-lingual context retrieval is a fundamental aspect of cross-lingual alignment of large language models (LLMs), where the model extracts context information in one language based on requests in another language. Despite its importance in real-life applications, this ability has not been adequately investigated for state-of-the-art models. In this paper, we evaluate the cross-lingual context retrieval ability of over 40 LLMs across 12 languages to understand the source of this ability, using cross-lingual machine reading comprehension (xMRC) as a representative scenario. Our results show that several small, post-trained open LLMs show strong cross-lingual context retrieval ability, comparable to closed-source LLMs such as GPT-4o, and their estimated oracle performances greatly improve after post-training. Our interpretability analysis shows that the cross-lingual context retrieval process can be divided into two main phases: question encoding and answer retrieval, which are formed in pre-training and post-training, respectively. The phasing stability correlates with xMRC performance, and the xMRC bottleneck lies at the last model layers in the second phase, where the effect of post-training can be evidently observed. Our results also indicate that larger-scale pretraining cannot improve the xMRC performance. Instead, larger LLMs need further multilingual post-training to fully unlock their cross-lingual context retrieval potential. Our code is available at https://github.com/NJUNLP/Cross-Lingual-Context-Retrieval
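To make the xMRC evaluation setting concrete: the model reads a passage in one language, receives a question in another, and its extracted answer is scored against the gold span. A minimal sketch of the standard MRC scoring metrics (exact match and token-level F1) is below; this is generic MRC evaluation, not the paper's released code, and the function names are our own.

```python
from collections import Counter

def exact_match(prediction: str, gold: str) -> bool:
    """Whitespace-normalized, case-insensitive exact match."""
    return prediction.strip().lower() == gold.strip().lower()

def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1, the usual span-overlap metric for reading comprehension."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Count tokens shared between prediction and gold (multiset overlap).
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction of "the Eiffel Tower" against gold "Eiffel Tower" shares two of three predicted tokens, giving F1 = 0.8; in the cross-lingual setting the answer language typically follows the passage, so the same string-level scoring applies.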
Problem

Research questions and friction points this paper is trying to address.

Evaluates cross-lingual context retrieval in 40+ LLMs across 12 languages
Identifies question encoding and answer retrieval as key phases
Shows multilingual post-training, not larger-scale pretraining, unlocks cross-lingual retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates 40+ LLMs using the xMRC scenario
Identifies a two-phase retrieval process: question encoding, then answer retrieval
Shows post-training matters more than pretraining scale for this ability
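The two-phase finding comes from layer-wise interpretability analysis: inspecting each layer's hidden states to see at what depth the answer representation emerges. Below is a hedged sketch of that style of probe using synthetic hidden states in place of a real model's; the cosine-similarity probe and names like `locate_retrieval_layer` are our own illustrative stand-ins, not the paper's exact method.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def locate_retrieval_layer(hidden_states, answer_direction, threshold=0.5):
    """Return the first layer whose hidden state at the final position
    aligns with the answer representation above `threshold`, i.e. where
    answer retrieval plausibly 'kicks in'."""
    for layer, h in enumerate(hidden_states):
        if cosine(h, answer_direction) >= threshold:
            return layer
    return None

# Synthetic stand-in for a real model: 8 layers of 16-dim states that
# gradually rotate toward the answer direction, mimicking the finding
# that retrieval forms in the later (second-phase) layers.
rng = np.random.default_rng(0)
answer = rng.normal(size=16)
noise = rng.normal(size=16)
states = [(1 - t) * noise + t * answer for t in np.linspace(0.0, 1.0, 8)]
print(locate_retrieval_layer(states, answer))
```

With real models one would instead collect per-layer hidden states from a forward pass and compare them against the embedding of the gold answer; the point of the sketch is only the shape of the probe, which localizes the bottleneck to late layers.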