On the Limits of Model Merging for Multilinguality in Pre-Training

📅 2026-05-25
🤖 AI Summary
This study tests whether robust multilingual performance can be obtained by merging monolingually pre-trained models. In a controlled comparison of mixed multilingual pre-training, independent monolingual pre-training, and merged models, it finds that monolingual models perform strongly in their own languages, but merging any combination of them causes performance collapse due to interference. The analysis suggests that representational similarity between the models is a prerequisite for successful merging. The flexibility that merging offers in fine-tuning therefore does not extend trivially to language-specific pre-training, which serves as a practical warning for the design of multilingual models.
📝 Abstract
Consistent multilingual performance can be achieved by mixing pre-training data or through post-training approaches such as language-specific model merging. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study on the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training results in strong in-language performance, merging any combination of monolingual models leads to performance collapse due to interference. Our analysis suggests representational similarity is a prerequisite for model merging. We therefore conclude that the flexibility of merging in fine-tuning does not extend trivially to language-specific pre-training.
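To make the two central notions concrete, the sketch below shows one simple form of model merging and one common measure of representational similarity. The abstract does not say which merging strategies or similarity metric the paper uses, so both choices here are assumptions: uniform parameter averaging stands in for "merging", and linear CKA stands in for "representational similarity". The function names `average_merge` and `linear_cka` are hypothetical, and the models are represented as plain dictionaries of NumPy arrays.

```python
import numpy as np


def average_merge(state_dicts, weights=None):
    """Merge models by (weighted) averaging of same-named parameters.

    Assumption: all models share one architecture, so every dict has the
    same keys and array shapes. This is a stand-in for the merging
    strategies studied in the paper, not a reproduction of them.
    """
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n  # uniform average by default
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
    return merged


def linear_cka(x, y):
    """Linear CKA between activation matrices of shape (n_samples, dim).

    Returns a value in [0, 1]; values near 1 indicate that the two
    representations match up to rotation and isotropic scaling.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

In a setup like this, one would collect hidden activations from two monolingual models on the same inputs, compute `linear_cka` per layer, and check whether low similarity coincides with poor performance of the `average_merge` result. That reading of the paper's analysis is an interpretation of the abstract rather than a description of its actual procedure.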
Problem

Research questions and friction points this paper is trying to address.

model merging
multilinguality
pre-training
monolingual models
performance collapse
Innovation

Methods, ideas, or system contributions that make the work stand out.

model merging
multilinguality
pre-training
monolingual models
representational similarity