MERLIN: Multi-Stage Curriculum Alignment for Multilingual Encoder and LLM Fusion

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit substantially weaker complex reasoning capabilities on low-resource languages (LRLs) compared to English. To address this gap, we propose a two-stage model stacking framework that integrates a multilingual encoder with an LLM, coupled with a multi-phase curriculum-based alignment mechanism: first aligning semantic spaces via bilingual parallel texts, then progressively transferring to task-specific data (e.g., mathematical reasoning). Our method achieves efficient cross-lingual knowledge transfer by fine-tuning only a small number of DoRA low-rank parameters. On AfriMGSM, it improves accuracy by 12.9 percentage points, outperforming MindMerger and GPT-4o-mini; consistent gains are also observed on MGSM and MSVAMP. The core contributions are (i) curriculum-driven progressive representation alignment and (ii) a lightweight cross-lingual adaptation strategy, which together narrow the reasoning performance gap between LRLs and high-resource languages.
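The "small number of DoRA low-rank parameters" mentioned above can be illustrated with a minimal numpy sketch (hypothetical shapes and names, not the paper's code): DoRA decomposes a frozen weight into a per-column magnitude and a direction, and applies a LoRA-style low-rank update to the direction only, so that `W' = m * (W0 + B A) / ||W0 + B A||_col`.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2          # hypothetical layer sizes and adapter rank

W0 = rng.normal(size=(d_out, d_in))            # frozen pretrained weight
m = np.linalg.norm(W0, axis=0, keepdims=True)  # trainable per-column magnitude
A = rng.normal(size=(r, d_in)) * 0.01          # trainable low-rank factor
B = np.zeros((d_out, r))                       # B starts at zero, so W' == W0

def dora_weight(W0, m, A, B):
    """Recompose the adapted weight from magnitude and updated direction."""
    V = W0 + B @ A                                       # low-rank-updated direction
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)

W = dora_weight(W0, m, A, B)
assert np.allclose(W, W0)  # at initialization the adapted weight equals W0
```

Only `m`, `A`, and `B` would be trained (here `r * (d_in + d_out) + d_in` scalars versus `d_out * d_in` for full fine-tuning), which is what makes the cross-lingual adaptation lightweight.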

📝 Abstract
Large language models excel in English but still struggle with complex reasoning in many low-resource languages (LRLs). Existing encoder-plus-decoder methods such as LangBridge and MindMerger raise accuracy on mid- and high-resource languages, yet they leave a large gap on LRLs. We present MERLIN, a two-stage model-stacking framework that applies a curriculum learning strategy, moving from general bilingual bitext to task-specific data, and adapts only a small set of DoRA weights. On the AfriMGSM benchmark, MERLIN improves exact-match accuracy by +12.9 pp over MindMerger and outperforms GPT-4o-mini. It also yields consistent gains on MGSM and MSVAMP (+0.9 and +2.8 pp), demonstrating effectiveness across both low- and high-resource settings.
Problem

Research questions and friction points this paper is trying to address.

Improving complex reasoning in low-resource languages using LLMs
Bridging performance gap between high and low-resource languages
Enhancing multilingual accuracy without extensive parameter retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage model stacking with curriculum learning
Adapts only a small set of DoRA weights
Fuses a multilingual encoder with an LLM
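The encoder-LLM fusion listed above can be sketched in the LangBridge/MindMerger style the abstract alludes to (hypothetical dimensions and variable names, not the released code): frozen multilingual encoder states are passed through a small trainable projection into the LLM's embedding space and prepended to the token embeddings as a soft prefix, so only the projection (plus the DoRA adapters) needs training.

```python
import numpy as np

rng = np.random.default_rng(0)
d_enc, d_llm = 16, 20            # hypothetical encoder / LLM hidden sizes
src_len, tgt_len = 5, 7          # source-sentence and prompt lengths

enc_states = rng.normal(size=(src_len, d_enc))   # frozen multilingual encoder output
tok_embeds = rng.normal(size=(tgt_len, d_llm))   # frozen LLM token embeddings
W_proj = rng.normal(size=(d_enc, d_llm)) * 0.02  # the only trainable dense map here

prefix = enc_states @ W_proj                       # map encoder space -> LLM space
llm_input = np.concatenate([prefix, tok_embeds], axis=0)
print(llm_input.shape)  # (12, 20): soft prefix plus prompt, fed to the frozen LLM
```

Under this stacking, the curriculum would first train `W_proj` on bilingual bitext to align the two semantic spaces, then continue on task-specific reasoning data.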