🤖 AI Summary
This study investigates how cross-lingual generalization, and translation capability in particular, emerges early in multilingual pretraining. The authors pretrain a 1.7B-parameter model on nine diverse languages, save checkpoints at fine granularity, and combine behavioral analysis, model-component analysis, and parameter-based ablations with a newly constructed word-level translation dataset to track how translation ability evolves over training. The model quickly acquires basic linguistic capabilities in parallel with token-level copying, while translation develops in two phases: an initial phase dominated by copying and surface-level similarities, followed by a phase in which more general translation mechanisms develop while copying is refined. These findings give a fine-grained empirical view of how cross-lingual generalization develops during multilingual pretraining.
📝 Abstract
Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges, particularly in the early phases of learning. To study the early trajectory of linguistic and translation capabilities, we pretrain a multilingual 1.7B model on nine diverse languages, capturing checkpoints at a much finer granularity than prior work. We further introduce a novel word-level translation dataset and trace how translation develops over training through behavioral analyses, model-component analysis, and parameter-based ablations. We find that the model quickly acquires basic linguistic capabilities in parallel with token-level copying, while translation develops in two distinct phases: an initial phase dominated by copying and surface-level similarities, and a second phase in which more general translation mechanisms develop while copying is refined. Together, these findings provide a fine-grained view of how cross-lingual generalization develops during multilingual pretraining.