Semantic Pivots Enable Cross-Lingual Transfer in Large Language Models

📅 2025-05-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the intrinsic mechanisms underlying large language models’ (LLMs) cross-lingual capabilities. To this end, we propose a word-level cross-lingual translation task and employ intermediate-layer activation tracking to identify, for the first time, two distinct cross-lingual reasoning patterns: “co-occurrence-driven” versus “semantic-hub-driven” inference. Leveraging pretraining data, we mine semantic hubs—word pairs exhibiting both high cross-lingual co-occurrence frequency and semantic stability—and design a semantic-hub-aware pretraining corpus reconstruction method. Our approach significantly improves cross-lingual generalization, yielding an average +2.3 percentage point gain on multilingual understanding benchmarks including XNLI and XCOPA. Empirical results confirm semantic hubs serve as a critical mediating mechanism for cross-lingual transfer. The core contributions are: (1) uncovering the structural role of semantic hubs in shaping LLMs’ cross-lingual competence; and (2) introducing a scalable, data-driven corpus reconstruction paradigm grounded in linguistic regularities.

Technology Category

Application Category

📝 Abstract
Large language models (LLMs) demonstrate remarkable ability in cross-lingual tasks. Understanding how LLMs acquire this ability is crucial for their interpretability. To quantify the cross-lingual ability of LLMs accurately, we propose a Word-Level Cross-Lingual Translation Task. To find how LLMs learn cross-lingual ability, we trace the outputs of LLMs' intermediate layers in the word translation task. We identify and distinguish two distinct behaviors in the forward pass of LLMs: co-occurrence behavior and semantic pivot behavior. We attribute LLMs' two distinct behaviors to the co-occurrence frequency of words and find the semantic pivot from the pre-training dataset. Finally, to apply our findings to improve the cross-lingual ability of LLMs, we reconstruct a semantic pivot-aware pre-training dataset using documents with a high proportion of semantic pivots. Our experiments validate the effectiveness of our approach in enhancing cross-lingual ability. Our research contributes insights into the interpretability of LLMs and offers a method for improving LLMs' cross-lingual ability.
Problem

Research questions and friction points this paper is trying to address.

Quantify LLMs' cross-lingual ability via translation task
Identify co-occurrence and semantic pivot behaviors in LLMs
Enhance cross-lingual ability using semantic pivot-aware datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Propose Word-Level Cross-Lingual Translation Task
Identify co-occurrence and semantic pivot behaviors
Reconstruct semantic pivot-aware pre-training dataset
🔎 Similar Papers
No similar papers found.
Kaiyu He
Kaiyu He
University of Texas at Dallas
AIMachine LearningStatisticsCognitive Science
T
Tong Zhou
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Yubo Chen
Yubo Chen
Institute of Automation, Chinese Academy of Sciences
Natural Language ProcessingInformation ExtractionEvent ExtractionLarge Language Model
D
Delai Qiu
Unisound Al Technology Co,Ltd
S
Shengping Liu
Unisound Al Technology Co,Ltd
K
Kang Liu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Shanghai Artificial Intelligence Laboratory
J
Jun Zhao
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences