🤖 AI Summary
This work identifies a root cause of inconsistent multilingual question-answering outputs in large language models (LLMs): their hidden representations fail to form a shared cross-lingual semantic space and instead degenerate into language-specific subspaces, a bias that intensifies with model scale. We empirically validate this phenomenon for the first time using logit-lens interpretability analysis. To address it, we propose latent-space directional steering, a novel intervention that explicitly promotes utilization of the shared cross-lingual semantics. Evaluated on multilingual multiple-choice reasoning benchmarks, our method significantly improves both cross-lingual output consistency and reasoning accuracy, in particular enabling more effective transfer of English knowledge to low-resource languages. Our core contributions are: (1) the discovery of "scale-induced representation divergence," a previously unreported degradation pattern in multilingual LLMs; and (2) the first interpretable, latent-space solution that enhances multilingual consistency without fine-tuning or architectural modification.
📝 Abstract
Large language models (LLMs) are demonstrably capable of cross-lingual transfer, but can produce inconsistent outputs when prompted with the same query written in different languages. To understand how language models generalize knowledge from one language to others, we apply the logit lens to interpret the implicit steps taken by LLMs to solve multilingual multiple-choice reasoning questions. We find that LLMs predict inconsistently and are less accurate because they rely on subspaces of individual languages, rather than working in a shared semantic space. While larger models are more multilingual, we show their hidden states are more likely to dissociate from the shared representation than those of smaller models, yet they are nevertheless more capable of retrieving knowledge embedded across different languages. Finally, we demonstrate that knowledge sharing can be modulated by steering the models' latent processing towards the shared semantic space. We find that reinforcing utilization of the shared space improves the models' multilingual reasoning performance, as a result of more knowledge transfer from, and better output consistency with, English.
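The logit lens mentioned above decodes each layer's intermediate hidden state through the model's final unembedding, revealing what token the model "currently predicts" at that depth. The paper does not publish its implementation here, so the following is a minimal toy sketch with random weights and made-up dimensions (`d_model`, `vocab`, the per-layer `hiddens`) purely to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 16, 50                    # toy dimensions, not from the paper
W_U = rng.normal(size=(d_model, vocab))    # stand-in unembedding matrix

def layer_norm(x, eps=1e-5):
    """Final-layer normalization applied before unembedding."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def logit_lens(hidden_states, W_U):
    """Project each layer's residual-stream state onto vocabulary logits."""
    return [layer_norm(h) @ W_U for h in hidden_states]

# fake hidden states for one token position across 4 layers
hiddens = [rng.normal(size=d_model) for _ in range(4)]
per_layer_logits = logit_lens(hiddens, W_U)
top_tokens = [int(np.argmax(l)) for l in per_layer_logits]
```

Inspecting `top_tokens` layer by layer (with a real model and tokenizer) is how one can observe whether intermediate predictions pass through a shared, often English-like, semantic space before surfacing in the prompt language.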
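The steering intervention can likewise be sketched in miniature. One common recipe (an assumption here, not necessarily the paper's exact procedure) is to estimate a direction as the mean difference between hidden states from English prompts, taken as a proxy for the shared space, and those from another language, then add a scaled copy of that direction to the residual stream. All names (`shared_space_direction`, `steer`) and the toy data below are illustrative:

```python
import numpy as np

def shared_space_direction(shared_states, lang_states):
    """Unit steering direction: mean English-prompt state minus mean
    other-language state (a difference-of-means estimate)."""
    d = np.mean(shared_states, axis=0) - np.mean(lang_states, axis=0)
    return d / np.linalg.norm(d)

def steer(hidden, direction, alpha=2.0):
    """Shift a hidden state along the shared-space direction."""
    return hidden + alpha * direction

rng = np.random.default_rng(1)
en = rng.normal(loc=1.0, size=(8, 16))    # toy English-prompt hidden states
xx = rng.normal(loc=-1.0, size=(8, 16))   # toy other-language hidden states
d = shared_space_direction(en, xx)
h_steered = steer(xx[0], d)
```

The strength `alpha` trades off between pushing activations toward the shared space and preserving the original, language-specific signal; in practice it would be tuned on held-out multilingual data.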