Adapting Multilingual Models to Code-Mixed Tasks via Model Merging

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
For code-mixed text understanding in low-resource language pairs (e.g., English–Hindi, English–Spanish), this work proposes an efficient adaptation framework based on model merging. The method first performs continued pre-training of XLM-R and Llama-series models on unlabeled code-mixed corpora, then merges the adapted and base model parameters via merging strategies (task vectors, TV, and TIES), and finally fine-tunes on downstream sentence classification tasks (sentiment analysis, hate speech detection). Its key contribution is the first application of model merging to code-mixed NLP, jointly exploiting multilingual knowledge and the code-mixed data distribution without additional annotation. Experiments show consistent improvements: gains of 2–5 F1 points over full fine-tuning on English–Hindi and English–Spanish benchmarks, and up to 0.68 F1 in cross-pair transfer (English–Tamil, English–Malayalam), outperforming both standard fine-tuning and standalone continued pre-training. The approach is both effective and scalable.

📝 Abstract
We study model merging as a practical alternative to conventional adaptation strategies for code-mixed NLP. Starting from a multilingual base model, we: (i) perform continued pre-training (CPT) on unlabeled code-mixed text to obtain an adapted checkpoint, (ii) merge the adapted checkpoint with the base model, and (iii) fine-tune (FT) on the downstream task data. We evaluate our approach on sentence classification (sentiment and hate speech) tasks in English-Hindi (En-Hi) and English-Spanish (En-Es) using XLM-R and Llama-3.2-1B models. Our results show that merged models consistently outperform full fine-tuning and CPT->FT. We observe gains of 2--5 points in F1 over full fine-tuning and ~1--2 points over CPT->FT, indicating that unlabeled data is leveraged more effectively via merging than via CPT alone. Zero-/few-shot prompting with larger LLMs (e.g., Llama-3.3-70B) lags behind fine-tuned and merged checkpoints, underscoring the limits of in-context learning for code-mixed inputs. We further test cross-pair transfer by training on En-Hi and evaluating on En-Ta and En-Ml: merged checkpoints transfer more strongly than monolingual-English baselines (e.g., TV/TIES variants reaching 0.65--0.68 F1 vs 0.61--0.63 for full fine-tuning), suggesting that code-mixed knowledge is a more reliable substrate for low-resource pairs. We conclude with adaptation recipes matched to common data regimes (labeled only; labeled+unlabeled; transfer-only) and discuss limitations and scaling considerations for broader tasks and larger models.
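The merging step (ii) can be illustrated with simple task-vector (TV) arithmetic: subtract the base weights from the CPT checkpoint to obtain a task vector, then add a scaled copy of that vector back onto the base. A minimal pure-Python sketch (plain float dicts stand in for real model state dicts, and the interpolation weight `alpha` is an illustrative assumption, not a value reported in the paper):

```python
def task_vector_merge(base, adapted, alpha=0.5):
    """Task-vector (TV) merge: merged = base + alpha * (adapted - base).

    `base` and `adapted` map parameter names to weights. In practice these
    would be per-parameter tensors from model state dicts; floats suffice
    here to show the arithmetic.
    """
    merged = {}
    for name, base_w in base.items():
        tau = adapted[name] - base_w          # task vector from CPT
        merged[name] = base_w + alpha * tau   # interpolate toward the checkpoint
    return merged
```

With `alpha=0` the merge returns the base model and with `alpha=1` the CPT checkpoint, so the scaling factor controls how much of the code-mixed adaptation is retained before downstream fine-tuning.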
Problem

Research questions and friction points this paper is trying to address.

Adapt multilingual models to code-mixed NLP tasks
Improve performance over conventional fine-tuning methods
Enhance cross-language transfer for low-resource pairs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model merging adapts multilingual models to code-mixed tasks
Combines continued pre-training with merging and fine-tuning
Improves performance over full fine-tuning and CPT->FT
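The TIES strategy named above extends plain task-vector addition with interference resolution when combining task vectors: each vector is trimmed to its largest-magnitude entries, a sign is elected per parameter, and only sign-agreeing entries are averaged into the base. A hedged pure-Python sketch over flat weight lists (the `density` and `lam` knobs are illustrative defaults, not settings from this paper):

```python
def ties_merge(base, task_vectors, density=0.2, lam=1.0):
    """TIES-style merge sketch: trim, elect sign, disjoint-mean.

    `base` is a flat list of weights; `task_vectors` is a list of
    same-length delta lists (checkpoint minus base).
    """
    # 1) Trim: keep the top `density` fraction of entries by magnitude.
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(len(tv) * density))
        thresh = sorted((abs(v) for v in tv), reverse=True)[k - 1]
        trimmed.append([v if abs(v) >= thresh else 0.0 for v in tv])

    merged = []
    for i, base_w in enumerate(base):
        vals = [tv[i] for tv in trimmed]
        # 2) Elect sign: keep the direction with the larger total magnitude.
        pos = sum(v for v in vals if v > 0)
        neg = -sum(v for v in vals if v < 0)
        sign = 1.0 if pos >= neg else -1.0
        # 3) Disjoint mean: average only entries agreeing with the sign.
        agree = [v for v in vals if v * sign > 0]
        delta = sum(agree) / len(agree) if agree else 0.0
        merged.append(base_w + lam * delta)
    return merged
```

With a single CPT-derived task vector, as in this paper's pipeline, the sign election is trivial and TIES mainly acts as magnitude-based trimming of the update; the full procedure matters when multiple adapted checkpoints are merged.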