🤖 AI Summary
This work addresses the substantial performance gap of multilingual large language models in non-English in-context learning, particularly when models are prompted with English demonstrations but evaluated on non-English inputs. Assuming a shared semantic space within the model, the authors propose language vectors, a training-free method that steers intermediate-layer activations toward target-language directions during inference, achieving parameter-free cross-lingual alignment. The approach consistently improves performance across three datasets, 19 languages, and three mainstream models. Moreover, the derived language vectors exhibit clustering structure that closely aligns with established linguistic phylogenies, and they generalize well across tasks.
📝 Abstract
While multilingual large language models have gained widespread adoption, their performance on non-English languages remains substantially inferior to English. This disparity is particularly evident in in-context learning, where providing demonstrations in English but testing on non-English inputs leads to significant performance degradation. In this paper, we hypothesize that LLMs develop a universal semantic space for understanding languages, in which different languages are encoded as distinct directions. Based on this hypothesis, we propose language vectors -- a training-free language steering approach that leverages activation differences between source and target languages to guide model behavior. We steer generation by adding the vector to intermediate model activations during inference, shifting the model's internal representations toward the target language space without any parameter updates. We evaluate our method on three datasets, covering 19 languages in total, across three different models. Our results show consistent improvements over multilingual in-context learning baselines on all tasks and languages tested. Beyond performance gains, hierarchical clustering of the steering vectors reveals meaningful linguistic structure aligned with language families, and the vectors transfer successfully across tasks, demonstrating that these representations are task-agnostic.
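The core mechanism described in the abstract -- deriving a vector from activation differences between languages and adding it to intermediate activations -- can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the mean-difference construction, the scaling factor `alpha`, and the function names are illustrative assumptions, and a real system would apply the vector inside an LLM's forward pass (e.g. via hooks) rather than to standalone arrays.

```python
import numpy as np

def language_vector(src_acts: np.ndarray, tgt_acts: np.ndarray) -> np.ndarray:
    """Illustrative steering vector: difference of mean intermediate-layer
    activations between target- and source-language inputs.

    src_acts, tgt_acts: (n_examples, hidden_dim) activations collected from
    the same layer on source- and target-language prompts.
    """
    return tgt_acts.mean(axis=0) - src_acts.mean(axis=0)

def steer(hidden: np.ndarray, vec: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the scaled vector to hidden states, shifting them toward the
    target-language direction without touching any model parameters."""
    return hidden + alpha * vec

# Toy demonstration with synthetic activations (hidden_dim = 4): the
# "target language" activations are offset from the "source" ones.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(8, 4))
tgt = rng.normal(2.0, 1.0, size=(8, 4))
vec = language_vector(src, tgt)
steered = steer(src, vec)
# After steering, the source activations' mean matches the target mean.
```

In an actual LLM, `steer` would run inside a forward hook on a chosen intermediate layer during generation, applying the same additive shift to every token position.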