🤖 AI Summary
In multilingual translation, fine-tuning foundation models often induces catastrophic forgetting—degrading performance on unseen languages—yet the precise triggering conditions remain unclear. This work investigates the impact of model architecture, training data scale, and fine-tuning methodology on forgetting via controlled experiments on machine translation benchmarks. Key findings are: (1) the relative scale between model capacity and target-language data volume is the primary determinant of forgetting; (2) instruction-following capability exerts greater influence than architectural choices; and (3) cross-lingual alignment substantially mitigates forgetting and enables positive transfer. Experiments span full-parameter fine-tuning and multiple parameter-efficient fine-tuning (PEFT) methods, revealing no universal advantage of PEFT over standard fine-tuning. To our knowledge, this is the first study to empirically delineate the boundary conditions of forgetting in multilingual fine-tuning, providing reproducible evidence and actionable strategies for preserving and enhancing multilingual generalization.
📝 Abstract
Fine-tuning multilingual foundation models on specific languages often induces catastrophic forgetting, degrading performance on languages unseen during fine-tuning. While this phenomenon is widely documented, the literature presents fragmented results about when forgetting occurs. To address this ambiguity, we conduct a systematic empirical study using machine translation as a testbed to identify the conditions that trigger catastrophic forgetting in multilingual fine-tuning. Through controlled experiments across different model architectures, data scales, and fine-tuning approaches, we reveal that the relative scale between model capacity and data size is a primary determinant of forgetting. Moreover, we demonstrate that a model's instruction-following ability is more critical for retaining multilingual knowledge than its architecture. Contrary to common assumptions, parameter-efficient fine-tuning offers no clear advantage over full fine-tuning in mitigating forgetting. Lastly, we show that cross-lingual alignment can mitigate forgetting while also facilitating positive transfer to unseen target languages.
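The abstract frames forgetting as performance degradation on languages unseen during fine-tuning, and positive transfer as the opposite. The paper's exact metric is not given here, but a common way to quantify this is the relative drop in a translation score (e.g. BLEU) per held-out language; a minimal sketch with hypothetical scores:

```python
def relative_forgetting(baseline: dict, finetuned: dict) -> dict:
    """Relative score drop per language after fine-tuning.

    Positive values indicate catastrophic forgetting; negative values
    indicate positive transfer (the fine-tuned model improved)."""
    return {lang: (baseline[lang] - finetuned[lang]) / baseline[lang]
            for lang in baseline}

# Hypothetical BLEU scores on languages *unseen* during fine-tuning.
baseline  = {"de": 30.0, "fr": 35.0, "zh": 20.0}
finetuned = {"de": 24.0, "fr": 28.0, "zh": 22.0}

drops = relative_forgetting(baseline, finetuned)
# de and fr degrade by 20% (forgetting); zh improves by 10% (transfer).
```

The language codes and scores are illustrative only; the same ratio works with any quality metric (BLEU, chrF, COMET) reported on a fixed held-out test set.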