FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the performance bottlenecks of Chinese-centric multilingual machine translation (MMT) in low-resource and zero-shot settings, this paper proposes a sparse large language model architecture. It trains the model in two stages, large-scale Chinese pre-training followed by progressive multilingual fine-tuning, and combines Mixture-of-Experts (MoE) sparsity with curriculum learning to improve parameter efficiency and cross-lingual transfer. The model supports 65 languages and is more robust on low-resource language pairs and on zero-shot translation for unseen language directions. Experiments are reported to show an average gain of +4.2 BLEU on low-resource translation and zero-shot performance reaching 78% of supervised baselines, outperforming state-of-the-art LLMs and dedicated MT models. The core contribution is presented as the first Chinese-centric, sparse, and highly generalizable multilingual translation framework, narrowing the gap between the scalability of monolingual LLMs and the effectiveness of specialized MMT systems.

📝 Abstract

In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
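
The abstract says that FuxiMT incorporates Mixture-of-Experts layers but does not describe the routing scheme. As a rough illustration of how sparse expert routing generally works, the sketch below uses top-k softmax gating, which is an assumption here and not necessarily what FuxiMT does. The names `top_k_route`, `moe_forward`, and the toy experts are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights.

    Assumption: top-k softmax gating, a common sparse MoE choice.
    The source does not state which gating FuxiMT uses.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(token, experts, router_logits, k=2):
    """Combine the outputs of the selected experts only (the sparse part)."""
    output = [0.0] * len(token)
    for index, weight in top_k_route(router_logits, k):
        expert_out = experts[index](token)
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output


# Hypothetical toy experts: each one just scales its input.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]

routes = top_k_route(logits, k=2)
output = moe_forward([1.0, 1.0], experts, logits, k=2)
```

Only two of the four toy experts run for this token, which is the point of the sparsity: the parameter count grows with the number of experts while the per-token compute stays roughly constant.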

Problem

Research questions and friction points this paper is trying to address.

Sparsifying large language models for Chinese-centric multilingual translation

Improving low-resource translation via Mixture-of-Experts and curriculum learning

Enabling zero-shot translation for unseen language pairs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparsified large language model for translation

Two-stage training with Chinese pre-training

Mixture-of-Experts and curriculum learning
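
The summary does not say how the curriculum is ordered. A minimal sketch of the general idea, assuming difficulty is approximated by source sentence length and that training data is released in growing stages, might look like the following. The function `curriculum_stages`, the length-based difficulty score, and the toy sentence pairs are all illustrative assumptions rather than the authors' method.

```python
def curriculum_stages(examples, difficulty, num_stages=3):
    """Sort examples from easy to hard and return cumulative training stages.

    Stage 1 holds the easiest slice; the final stage holds every example.
    """
    ordered = sorted(examples, key=difficulty)
    count = len(ordered)
    stages = []
    for stage in range(1, num_stages + 1):
        # Integer ceiling of count * stage / num_stages.
        cutoff = (count * stage + num_stages - 1) // num_stages
        stages.append(ordered[:cutoff])
    return stages


def source_length(pair):
    """Assumed difficulty proxy: number of characters in the source side."""
    return len(pair[0])


# Hypothetical Chinese-to-English toy pairs, deliberately out of order.
toy_pairs = [
    ("我喜欢读书", "I like reading"),
    ("好", "Good"),
    ("机器翻译很有用", "Machine translation is useful"),
    ("你好", "Hello"),
    ("今天天气", "Today's weather"),
    ("谢谢你", "Thank you"),
]

stages = curriculum_stages(toy_pairs, source_length, num_stages=3)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {len(stage)} examples")
```

Other difficulty signals, such as the resource level of the language pair, would plug in by swapping the `difficulty` function.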
