Bridging Latent Reasoning and Target-Language Generation via Retrieval-Transition Heads

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the attention mechanisms underpinning target-language generation and reasoning in multilingual large language models. Through analysis of attention heads in models such as Qwen-2.5 and Llama-3.1, the work identifies a novel class of Retrieval-Transition Heads (RTHs) that specialize in transitioning from cross-lingual context to target-language output and play a pivotal role in multilingual chain-of-thought reasoning. Leveraging attention visualization, cross-lingual ablation experiments, and comprehensive evaluation on four benchmarks (MMLU-ProX, MGSM, MLQA, and XQuAD), the study demonstrates that RTHs are functionally distinct from conventional retrieval heads: ablating RTHs results in significantly greater performance degradation, underscoring their critical contribution to multilingual reasoning capabilities.

📝 Abstract
Recent work has identified a subset of attention heads in Transformers as retrieval heads, which are responsible for retrieving information from the context. In this work, we first investigate retrieval heads in multilingual contexts. In multilingual language models, we find that retrieval heads are often shared across multiple languages. Expanding the study to the cross-lingual setting, we identify Retrieval-Transition Heads (RTHs), which govern the transition to output in a specific target language. Our experiments reveal that RTHs are distinct from retrieval heads and more vital for Chain-of-Thought reasoning in multilingual LLMs. Across four multilingual benchmarks (MMLU-ProX, MGSM, MLQA, and XQuAD) and two model families (Qwen-2.5 and Llama-3.1), we demonstrate that masking RTHs induces a larger performance drop than masking retrieval heads (RHs). Our work advances the understanding of multilingual LMs by isolating the attention heads responsible for mapping to target languages.
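The head-masking ablation described in the abstract (zero out a specific attention head's contribution, then measure the benchmark score drop) can be sketched with a toy multi-head attention. This is an illustrative assumption, not the paper's code: the weights, shapes, and the `head_mask` convention below are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, head_mask):
    """Toy multi-head self-attention.

    head_mask[h] = 0.0 ablates head h by zeroing its output contribution,
    mirroring the masking-based ablation of RTHs vs. retrieval heads.
    """
    n_heads, _, d_head = Wq.shape
    per_head = []
    for h in range(n_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(d_head))   # (T, T) attention pattern
        per_head.append(head_mask[h] * (attn @ v))  # zeroed when head is masked
    return np.concatenate(per_head, axis=-1)        # (T, n_heads * d_head)

# Hypothetical example: ablate head 1 of a 2-head layer.
rng = np.random.default_rng(0)
T, d_model, n_heads, d_head = 4, 8, 2, 3
x = rng.normal(size=(T, d_model))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
masked = multi_head_attention(x, Wq, Wk, Wv, head_mask=np.array([1.0, 0.0]))
```

In the paper's actual setting, the mask would be applied inside a full LLM (for example via forward hooks or a `head_mask` argument) to the candidate RTHs or RHs, and the resulting benchmark accuracy compared against the unmasked model.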
Problem

Research questions and friction points this paper is trying to address.

multilingual language models
attention heads
target-language generation
Chain-of-Thought reasoning
cross-lingual transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Transition Heads
multilingual language models
attention heads
Chain-of-Thought reasoning
cross-lingual generation