Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation

📅 2025-01-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multilingual neural machine translation (MNMT) still lags behind large language models (LLMs) in cross-lingual generalization and practical performance. To address this, the authors propose the *Register Mechanism*: target-language-specific artificial tokens (registers) inserted into a decoder-only architecture, combined with a customized attention mask, to explicitly decouple source-language understanding from target-language generation. Pretrained on 9.3 billion sentence pairs spanning 24 languages collected from public datasets, the resulting MITRE-913M model is the first decoder-only MNMT model to outperform NLLB-3.3B on the EC-40 benchmark, matches the performance of commercial LLMs, and shows strong adaptability in fine-tuning. The models, training code, and pretrained weights are publicly released on GitHub.

📝 Abstract
Multilingual neural machine translation (MNMT) enables arbitrary translations across multiple languages by training a model with limited parameters using parallel data only. However, the performance of such MNMT models still lags behind that of large language models (LLMs), limiting their practicality. In this work, we address this limitation by introducing registering to achieve the new state-of-the-art of decoder-only MNMT models. Specifically, we insert a set of artificial tokens specifying the target language, called registers, into the input sequence between the source and target tokens. By modifying the attention mask, the target token generation only pays attention to the activation of registers, representing the source tokens in the target language space. Experiments on EC-40, a large-scale benchmark, show that our method outperforms related methods driven by optimizing multilingual representations. We further scale up and collect 9.3 billion sentence pairs across 24 languages from public datasets to pre-train two models, namely MITRE (multilingual translation with registers). One of them, MITRE-913M, outperforms NLLB-3.3B, achieves comparable performance with commercial LLMs, and shows strong adaptability in fine-tuning. Finally, we open-source our models to facilitate further research and development in MNMT: https://github.com/zhiqu22/mitre.
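The register idea above hinges on the modified attention mask over the layout [source | registers | target]. The following is a minimal illustrative sketch of such a mask, not the paper's released implementation: it assumes the source/register prefix attends bidirectionally within itself, while each target position attends causally to earlier target positions and to the registers only, never directly to the source tokens.

```python
import numpy as np


def register_attention_mask(n_src: int, n_reg: int, n_tgt: int) -> np.ndarray:
    """Build a boolean attention mask for [source | registers | target].

    mask[i, j] = True means position i may attend to position j.
    Assumptions (hypothetical, for illustration): the source+register
    prefix is fully bidirectional; target tokens see the registers and
    earlier target tokens (causal), but never the source tokens, so the
    registers must carry the source representation into the target
    language space.
    """
    n = n_src + n_reg + n_tgt
    prefix = n_src + n_reg
    mask = np.zeros((n, n), dtype=bool)
    # Source and register positions attend over the whole prefix.
    mask[:prefix, :prefix] = True
    # Target positions attend to all register positions...
    mask[prefix:, n_src:prefix] = True
    # ...and causally to themselves and earlier target positions.
    for i in range(prefix, n):
        mask[i, prefix:i + 1] = True
    return mask


mask = register_attention_mask(n_src=3, n_reg=2, n_tgt=2)
# Target rows never attend to source columns.
assert not mask[5:, :3].any()
# Target rows attend to every register column.
assert mask[5:, 3:5].all()
```

Such a boolean matrix can be passed (after conversion to the framework's expected dtype) as the additive or boolean attention mask of a standard decoder-only Transformer layer.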
Problem

Research questions and friction points this paper is trying to address.

Multilingual Neural Machine Translation
Large Language Models
Translation Performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Register Mechanism
Multilingual Neural Machine Translation
Large-scale Model Training
Zhi Qu
Nara Institute of Science and Technology
Machine Translation, Natural Language Processing
Yiran Wang
National Institute of Information and Communications Technology, Japan.
Jiannan Mao
Gifu University, Japan.
Chenchen Ding
Nara Institute of Science and Technology, Japan; National Institute of Information and Communications Technology, Japan.
Hideki Tanaka
National Institute of Information and Communications Technology, Japan.
Masao Utiyama
NICT
Machine Translation
Taro Watanabe
Nara Institute of Science and Technology
Machine Translation, Machine Learning