🤖 AI Summary
This study investigates the cross-lingual generalization capability and practical bottlenecks of large language models (LLMs) in machine translation for low-resource languages (LRLs). Addressing challenges of data scarcity, privacy sensitivity, and computational constraints, we propose a lightweight collaborative optimization paradigm: integrating heterogeneous surrogate data—including news corpora and bilingual dictionaries—with knowledge distillation and progressive parameter-efficient fine-tuning. To our knowledge, this is the first systematic evaluation of LLM-based translation across 200 LRLs on the FLORES-200 benchmark, empirically uncovering critical limitations in zero-shot and few-shot cross-lingual transfer. Experimental results demonstrate substantial improvements in translation quality for small-scale LLMs on extremely low-resource languages, achieving an average BLEU gain of +12.4 across 37 languages. The findings validate that performance gains need not rely solely on model scaling, offering an effective, resource-efficient alternative for LRL translation.
📝 Abstract
Low-Resource Languages (LRLs) present significant challenges in natural language processing due to their limited linguistic resources and underrepresentation in standard datasets. While recent advancements in Large Language Models (LLMs) and Neural Machine Translation (NMT) have substantially improved translation capabilities for high-resource languages, performance disparities persist for LRLs, particularly impacting privacy-sensitive and resource-constrained scenarios. This paper systematically evaluates the limitations of current LLMs across 200 languages using benchmarks such as FLORES-200. We also explore alternative data sources, including news articles and bilingual dictionaries, and demonstrate how knowledge distillation from large pre-trained models can significantly improve smaller LRL translations. Additionally, we investigate various fine-tuning strategies, revealing that incremental enhancements markedly reduce performance gaps on smaller LLMs.