🤖 AI Summary
This work addresses the challenge of language control failure in large language models on multilingual tasks, which typically manifests as either a correct answer in the wrong language or a correct-language output with incorrect content. The authors propose a four-scenario evaluation protocol that combines an extended Logit Lens with cross-lingual semantic similarity analysis of hidden states. This analysis localizes language control to the final few layers of the model for the first time. Leveraging this insight, they achieve efficient multilingual adaptation by fine-tuning only 3–5% of the model parameters. Evaluated on Qwen-3-32B and Bloom-7.1B, the method attains over 98% language consistency without compromising task accuracy, matching the performance of full-parameter fine-tuning at a fraction of the computational cost.
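As a back-of-the-envelope illustration of the parameter savings from tuning only the final layers, the sketch below computes the trainable fraction when all but the last few transformer blocks are frozen. The layer count and the uniform per-layer parameter sizes are hypothetical stand-ins, not the actual architectures of Qwen-3-32B or Bloom-7.1B.

```python
def trainable_fraction(layer_params, num_final_layers):
    """Fraction of parameters left trainable when only the last
    `num_final_layers` blocks are fine-tuned and the rest are frozen."""
    total = sum(layer_params)
    trainable = sum(layer_params[-num_final_layers:])
    return trainable / total

# Hypothetical example: 64 equal-sized transformer blocks,
# fine-tuning only the last 3 leaves ~4.7% of parameters trainable.
layers = [1.0] * 64
print(f"{trainable_fraction(layers, 3):.3%}")  # → 4.688%
```

In a deep-learning framework this corresponds to disabling gradients for every block except the final few before training.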
📝 Abstract
Despite multilingual pretraining, large language models often struggle with non-English tasks, particularly in language control: the ability to respond in the intended language. We identify and characterize two key failure modes: the multilingual transfer bottleneck (correct language, incorrect task response) and the language consistency bottleneck (correct task response, wrong language). To systematically surface these issues, we design a four-scenario evaluation protocol spanning the MMLU, MGSM, and XQuAD benchmarks. To probe them with interpretability methods, we extend logit lens analysis to track language probabilities layer by layer and compute the cross-lingual semantic similarity of hidden states. The results reveal a three-phase internal structure: early layers align inputs into a shared semantic space, middle layers perform task reasoning, and late layers drive language-specific generation. Guided by these insights, we introduce selective fine-tuning of only the final layers responsible for language control. On Qwen-3-32B and Bloom-7.1B, this method achieves over 98 percent language consistency across six languages while fine-tuning only 3–5 percent of parameters, without sacrificing task accuracy. Importantly, this result is nearly identical to that of full-parameter fine-tuning (above 98 percent language consistency for both methods across all prompt scenarios) while using a fraction of the computational resources. To the best of our knowledge, this is the first approach to leverage the layer-localization of language control for efficient multilingual adaptation.
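The layer-by-layer language-probability tracking at the core of the logit lens analysis can be sketched as follows. This is a toy reconstruction, not the authors' implementation: the two-dimensional "hidden states", the `unembed` matrix, and the token-to-language mapping are fabricated for illustration, whereas the paper's probe runs on real transformer activations and vocabularies.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def language_prob_by_layer(hidden_states, unembed, token_lang):
    """Logit-lens probe: project each layer's hidden state through the
    unembedding matrix, softmax to a vocabulary distribution, and sum
    the probability mass falling on each language's tokens."""
    per_layer = []
    for h in hidden_states:
        logits = [sum(hi * wi for hi, wi in zip(h, w)) for w in unembed]
        probs = softmax(logits)
        mass = {}
        for p, lang in zip(probs, token_lang):
            mass[lang] = mass.get(lang, 0.0) + p
        per_layer.append(mass)
    return per_layer

# Toy setup: 2-dim hidden states, 4-token vocab (2 English, 2 German rows).
unembed = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.5]]
token_lang = ["en", "en", "de", "de"]
# An early, language-neutral state and a final, "German"-directed state:
# the de mass is exactly 0.50 at layer 0 and rises to ~0.92 at layer 1.
hidden_states = [[0.1, 0.1], [0.0, 3.0]]
for i, mass in enumerate(language_prob_by_layer(hidden_states, unembed, token_lang)):
    print(f"layer {i}: " + ", ".join(f"{l}={p:.2f}" for l, p in sorted(mass.items())))
```

The paper's three-phase picture would appear here as roughly uniform language mass in early layers and a sharp concentration on the target language only near the final layers.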