🤖 AI Summary
To address the performance bottlenecks of Chinese-centric multilingual machine translation (MMT) in low-resource and zero-shot settings, this paper proposes a translation model built on a sparse large language model. Training follows a two-stage paradigm: Chinese-centric large-scale pre-training, followed by multilingual progressive fine-tuning. The approach combines Mixture-of-Experts (MoE) sparsity with curriculum learning to improve parameter efficiency and cross-lingual transfer. The model supports 65 languages and improves both robustness on low-resource language pairs and zero-shot generalization to unseen language directions. Experiments show an average gain of +4.2 BLEU on low-resource translation, and zero-shot translation reaches 78% of supervised baseline performance; the model outperforms state-of-the-art LLMs and dedicated MT models. The core contribution is a Chinese-centric, sparse, and highly generalizable multilingual translation framework, presented as the first of its kind, that bridges the gap between monolingual LLM scalability and specialized MMT efficacy.
📝 Abstract
In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We train FuxiMT in two stages: we first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset covering 65 languages. FuxiMT incorporates Mixture-of-Experts (MoE) and employs a curriculum learning strategy to achieve robust performance across resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly in low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
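To make the sparsity idea concrete, here is a minimal sketch of how a Mixture-of-Experts layer can route one token to a few experts. The abstract does not describe FuxiMT's routing scheme, so everything here is an assumption for illustration: top-2 softmax gating, and the hypothetical names `top_k_gate`, `moe_forward`, and the toy `experts`.

```python
import math

def top_k_gate(logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This gating rule is an assumption, not the paper's confirmed method.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, router_logits, experts, k=2):
    """Combine the outputs of the k selected experts for one token vector x."""
    gate = top_k_gate(router_logits, k)
    out = [0.0] * len(x)
    for i, w in gate.items():
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + w * v for o, v in zip(out, y)]
    return out

# Hypothetical toy experts: each scales the input by a fixed factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
gate = top_k_gate([0.1, 2.0, 2.0, -1.0], k=2)
y = moe_forward([1.0, -1.0], [0.1, 2.0, 2.0, -1.0], experts, k=2)
```

Because only `k` of the experts run per token, compute per token stays roughly constant as more experts are added, which is the usual motivation for this kind of sparsity.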