Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining

📅 2026-04-19

🤖 AI Summary
This study investigates how cross-lingual generalization, particularly translation ability, emerges early in multilingual pretraining. The authors pretrain a 1.7B-parameter model on nine languages, save checkpoints at fine granularity, and combine behavioral analysis, model-component analysis, and parameter-based ablations with a newly constructed word-level translation dataset to track how translation ability evolves over training. Basic linguistic capabilities and token-level copying emerge quickly and in parallel. Translation then develops in two phases: an initial phase dominated by copying and surface-level similarities, followed by a phase in which more general translation mechanisms form while copying is refined.

📝 Abstract
Large language models exhibit impressive cross-lingual capabilities. However, prior work analyzes this phenomenon through isolated factors and at sparse points during training, limiting our understanding of how cross-lingual generalization emerges, particularly in the early phases of learning. To study the early trajectory of linguistic and translation capabilities, we pretrain a multilingual 1.7B model on nine diverse languages, capturing checkpoints at a much finer granularity. We further introduce a novel word-level translation dataset and trace how translation develops over training through behavioral analyses, model-component analysis, and parameter-based ablations. We find that the model quickly acquires basic linguistic capabilities in parallel with token-level copying, while translation develops in two distinct phases: an initial phase dominated by copying and surface-level similarities, and a second phase in which more generalizing translation mechanisms are developed while copying is refined. Together, these findings provide a fine-grained view of how cross-lingual generalization develops during multilingual pretraining.
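
To make the copying-versus-translation distinction concrete, the following is a minimal sketch of how word-level predictions might be scored at each checkpoint. It is not the authors' evaluation code: the function names `classify_prediction` and `copy_translation_rates`, the exact-match criterion, and the toy English-German triples are all assumptions made for illustration, and the numbers it prints are not results from the paper.

```python
from collections import Counter

LABELS = ("translation", "copy", "ambiguous", "other")


def classify_prediction(source: str, target: str, prediction: str) -> str:
    """Label one word-level prediction (hypothetical exact-match scoring)."""
    src = source.strip().casefold()
    tgt = target.strip().casefold()
    pred = prediction.strip().casefold()
    if src == tgt and pred == src:
        # Source and target are spelled identically, so a correct answer
        # cannot be attributed to copying or to translating.
        return "ambiguous"
    if pred == tgt:
        return "translation"
    if pred == src:
        return "copy"
    return "other"


def copy_translation_rates(examples):
    """Return the fraction of each label over (source, target, prediction) triples."""
    counts = Counter(classify_prediction(s, t, p) for s, t, p in examples)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Toy predictions keyed by training step (illustrative values only).
toy_checkpoints = {
    1000: [("dog", "Hund", "dog"), ("house", "Haus", "house"), ("water", "Wasser", "Wasser")],
    50000: [("dog", "Hund", "Hund"), ("house", "Haus", "Haus"), ("water", "Wasser", "water")],
}

for step, examples in sorted(toy_checkpoints.items()):
    print(step, copy_translation_rates(examples))
```

Tracking these rates across densely sampled checkpoints is one way a copy-dominated phase and a later translation phase could show up as separate curves. Pairs whose source and target are spelled identically are reported separately because exact matching cannot tell the two behaviors apart for them.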
Problem

Research questions and friction points this paper is trying to address.

cross-lingual generalization
multilingual pretraining
translation dynamics
early training phase
language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual pretraining
translation dynamics
cross-lingual generalization
fine-grained analysis
copying mechanism