🤖 AI Summary
This work addresses the challenge of adapting large language models to low-resource languages, where scarce task-specific data and limited computational resources hinder performance. Existing logit-level approaches such as Proxy Tuning often fail in this setting because the large model’s weak low-resource language competence can overwhelm the knowledge of a specialized smaller model. To overcome this limitation, the authors propose TriMix, a test-time framework that dynamically fuses logits from three sources: a continually pretrained small model that supplies low-resource language competence, high-resource language instruction tuning that supplies task competence, and a large model that supplies scaling benefits. TriMix departs from the prevailing large-model-dominant assumption by prioritizing the small model’s logits, and it requires no task annotations in the target low-resource language. Experiments across four model families and eight low-resource languages show that TriMix consistently outperforms both single-model baselines and Proxy Tuning.
📝 Abstract
Adapting large language models (LLMs) to low-resource languages (LRLs) is constrained by the scarcity of task data and computational resources. Although Proxy Tuning offers a logit-level strategy for introducing scaling effects, it often fails in LRL settings because the large model's weak LRL competence might overwhelm the knowledge of specialized smaller models. We thus propose TriMix, a test-time logit fusion framework that dynamically balances capabilities from three different sources: LRL competence from a continually pretrained small model, task competence from high-resource language instruction tuning, and the scaling benefits of large models. It is data- and compute-efficient, requiring no LRL task annotations and only continual pretraining of a small model. Experiments across four model families and eight LRLs show that TriMix consistently outperforms single-model baselines and Proxy Tuning. Our analysis reveals that prioritizing the small LRL-specialized model's logits is crucial for success, challenging the prevalent large-model-dominant assumption.
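As a rough illustration of test-time logit fusion, the sketch below combines next-token logits from three sources with a weighted sum and then picks the most likely token. The abstract does not give TriMix's actual fusion rule, so the weighted sum, the fixed weights, the function names (`fuse_logits`, `softmax`), and the toy logit values are all assumptions made for illustration. In particular, fixed weights stand in for whatever dynamic balancing the method performs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_logits(sources, weights):
    """Weighted sum of next-token logits from models that share one vocabulary.

    Hypothetical fusion rule: the source text does not specify the real one.
    """
    if len(sources) != len(weights):
        raise ValueError("need exactly one weight per logit source")
    vocab_size = len(sources[0])
    if any(len(s) != vocab_size for s in sources):
        raise ValueError("all sources must cover the same vocabulary")
    return [
        sum(w * s[i] for w, s in zip(weights, sources))
        for i in range(vocab_size)
    ]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Toy next-token logits over a 4-token vocabulary (made-up numbers).
small_lrl = [2.0, 0.5, -1.0, 0.0]    # small model, continually pretrained on the LRL
large_task = [0.0, 1.5, 0.5, -0.5]   # source carrying task competence
large_base = [0.5, 1.0, 0.0, 0.0]    # source carrying scaling benefits
sources = [small_lrl, large_task, large_base]

# Assumed weights that prioritize the small LRL-specialized model.
small_first = fuse_logits(sources, [0.6, 0.25, 0.15])
# Equal weights, for comparison.
uniform = fuse_logits(sources, [1 / 3, 1 / 3, 1 / 3])

probs = softmax(small_first)
next_token = argmax(probs)        # token 0, the small model's preferred token
uniform_token = argmax(uniform)   # token 1, the large-model sources win
```

With these toy numbers, weighting the small model most heavily selects the token it prefers, while equal weighting lets the other two sources override it. This mirrors, in miniature, the abstract's observation that prioritizing the small LRL-specialized model's logits matters.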