LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of language control failure in large language models during multilingual tasks, which often manifests as either correct answers in the wrong language or correct-language outputs with incorrect content. The authors propose a four-scenario evaluation protocol that integrates an extended Logit Lens with cross-lingual semantic similarity analysis. This approach enables the first precise localization of language control functionality to the final few layers of the model. Leveraging this insight, they achieve highly efficient multilingual adaptation by fine-tuning only 3–5% of the model parameters. Evaluated on Qwen-3-32B and Bloom-7.1B, the method attains over 98% language consistency without compromising task accuracy, matching the performance of full-parameter fine-tuning while substantially reducing computational overhead.
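The extended logit lens described above can be sketched in a few lines: decode each layer's hidden state directly through the unembedding matrix, then sum the resulting probability mass over per-language token sets to get a layer-by-layer language trajectory. This is a minimal toy illustration, not the paper's implementation; the dimensions, random states, and the two-way `lang_tokens` partition are all hypothetical stand-ins for a real model's hidden states and a language-identified vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 16, 100, 8

# Hypothetical per-language token-id partition (a real setup would map
# vocabulary entries to languages with a language identifier).
lang_tokens = {"en": np.arange(0, 50), "zh": np.arange(50, 100)}

# Toy unembedding matrix and per-layer hidden states at one position.
W_U = rng.normal(size=(d_model, vocab))
hidden_per_layer = [rng.normal(size=d_model) for _ in range(n_layers)]

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def language_probs(h):
    """Logit lens: project an intermediate hidden state through the
    unembedding, softmax, and aggregate probability mass per language."""
    p = softmax(h @ W_U)
    return {lang: float(p[ids].sum()) for lang, ids in lang_tokens.items()}

# Language-probability trajectory across layers: per the paper's finding,
# the target language's mass should rise sharply in the final few layers.
trajectory = [language_probs(h) for h in hidden_per_layer]
```

With real models the same loop runs over the residual stream captured at each block, but the aggregation step is unchanged.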

📝 Abstract
Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly in language control, the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). To systematically surface these issues, we design a four-scenario evaluation protocol spanning MMLU, MGSM, and XQuAD benchmarks. To probe these issues with interpretability, we extend logit lens analysis to track language probabilities layer by layer and compute cross-lingual semantic similarity of hidden states. The results reveal a three-phase internal structure: early layers align inputs into a shared semantic space, middle layers perform task reasoning, and late layers drive language-specific generation. Guided by these insights, we introduce selective fine-tuning of only the final layers responsible for language control. On Qwen-3-32B and Bloom-7.1B, this method achieves over 98 percent language consistency across six languages while fine-tuning only 3-5 percent of parameters, without sacrificing task accuracy. Importantly, this result is nearly identical to that of full-scope fine-tuning (for example, above 98 percent language consistency for both methods across all prompt scenarios) but uses a fraction of the computational resources. To the best of our knowledge, this is the first approach to leverage layer-localization of language control for efficient multilingual adaptation.
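The abstract's second probe, cross-lingual semantic similarity of hidden states, reduces to a per-layer cosine similarity between representations of the same prompt in two languages. The sketch below uses random toy vectors in place of real hidden states (a real analysis would mean-pool the model's per-layer activations for translated prompt pairs), so only the mechanics, not the numbers, reflect the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, n_layers = 16, 8

# Toy per-layer hidden states for one prompt in English and German;
# the correlated construction just makes the pair non-trivially similar.
h_en = rng.normal(size=(n_layers, d_model))
h_de = 0.8 * h_en + 0.2 * rng.normal(size=(n_layers, d_model))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Layer-by-layer cross-lingual similarity: in the paper's three-phase
# picture this rises in early layers (shared semantic space), stays high
# through mid-layer reasoning, and drops late, where language-specific
# generation takes over.
similarity = [cosine(h_en[l], h_de[l]) for l in range(n_layers)]
```

Plotting `similarity` against layer index is what lets the authors separate the semantic-alignment phase from the language-generation phase.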
Problem

Research questions and friction points this paper is trying to address.

language control
multilingual LLMs
language consistency
multilingual transfer
non-English tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

language control
selective fine-tuning
logit lens
multilingual LLMs
layer localization
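Given the layer localization above, the selective fine-tuning step amounts to freezing everything except the final few transformer layers and checking that the trainable fraction lands in the reported 3-5% range. The registry below is a pure-Python stand-in for a model's named parameters; the layer count, parameter sizes, and the "last 2 layers" choice are illustrative assumptions, not the paper's exact configuration.

```python
# Toy parameter registry standing in for a transformer's named parameters.
n_layers, per_layer, embed = 32, 1_000_000, 4_000_000
params = {"embed": embed}
for i in range(n_layers):
    params[f"layers.{i}"] = per_layer
params["lm_head"] = embed

def select_trainable(params, last_k):
    """Keep only the final `last_k` transformer layers trainable,
    mirroring layer-localized fine-tuning of language control."""
    cutoff = n_layers - last_k
    return {
        name for name in params
        if name.startswith("layers.") and int(name.split(".")[1]) >= cutoff
    }

trainable = select_trainable(params, last_k=2)
fraction = sum(params[n] for n in trainable) / sum(params.values())  # 0.05
```

In a deep-learning framework the same selection would set `requires_grad = False` on every parameter outside `trainable` before building the optimizer.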