Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation

πŸ“… 2025-02-06
πŸ›οΈ International Joint Conference on Natural Language Processing
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing non-autoregressive multilingual neural machine translation (MNMT) heavily relies on computationally expensive knowledge distillation (KD) to achieve competitive performance, hindering efficiency and scalability. This paper proposes M-DAT, the first KD-free framework for non-autoregressive multilingual translation. Built upon the Directed Acyclic Transformer (DAT) architecture, M-DAT integrates multilingual joint training with a novel pivot back-translation (PivotBT) strategy to explicitly model latent cross-lingual alignments, thereby substantially improving zero-shot generalization to unseen language directions. Evaluated on standard multilingual benchmarks, M-DAT achieves state-of-the-art performance among non-autoregressive models: it attains a 3.2Γ— speedup over autoregressive baselines while incurring only a marginal BLEU degradation of 0.4–0.8 points. Thus, M-DAT bridges the longstanding trade-off between inference efficiency and translation accuracy in multilingual NMT, enabling scalable, high-fidelity non-autoregressive translation without KD.

πŸ“ Abstract
Multilingual neural machine translation (MNMT) aims at using one single model for multiple translation directions. Recent work applies non-autoregressive Transformers to improve the efficiency of MNMT, but requires expensive knowledge distillation (KD) processes. To this end, we propose an M-DAT approach to non-autoregressive multilingual machine translation. Our system leverages the recent advance of the directed acyclic Transformer (DAT), which does not require KD. We further propose a pivot back-translation (PivotBT) approach to improve the generalization to unseen translation directions. Experiments show that our M-DAT achieves state-of-the-art performance in non-autoregressive MNMT.
Problem

Research questions and friction points this paper is trying to address.

Eliminate knowledge distillation in multilingual machine translation.
Enhance efficiency using non-autoregressive Transformers.
Improve generalization to unseen translation directions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-autoregressive Transformers improve MNMT efficiency
Directed acyclic Transformer avoids knowledge distillation
Pivot back-translation enhances generalization capabilities
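The pivot back-translation idea can be illustrated with a minimal sketch. This is not the authors' implementation: the `model.translate` interface, the language codes, and the choice of pivot set are all assumptions made for illustration. The core idea is that for an unseen direction, a pseudo source sentence is synthesized by routing the target sentence through a pivot language whose directions were seen in training.

```python
import random

def pivot_back_translation(model, target_sentence, src_lang, tgt_lang, pivot_langs):
    """Hedged sketch of pivot back-translation (PivotBT).

    For a translation direction (src_lang -> tgt_lang) unseen in training,
    synthesize a pseudo source sentence by translating the target sentence
    into a randomly sampled pivot language (covered by supervised data),
    then from the pivot into src_lang. The resulting
    (pseudo_source, target_sentence) pair serves as extra training data.
    `model.translate` is a hypothetical interface, not the paper's API.
    """
    pivot = random.choice(pivot_langs)
    # tgt -> pivot: a direction seen in supervised training
    pivot_sentence = model.translate(target_sentence, src=tgt_lang, tgt=pivot)
    # pivot -> src: another supervised direction
    pseudo_source = model.translate(pivot_sentence, src=pivot, tgt=src_lang)
    return pseudo_source, target_sentence
```

In a multilingual setting the pivot is typically a high-resource language (often English) paired with both endpoints in the training data, which is what makes the two hops supervised even when the direct pair is not.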
Chenyang Huang
Ph.D. Student, University of Alberta
ML, DL, NLP, CV
Fei Huang
Damo Academy, Alibaba
Zaixiang Zheng
ByteDance Seed
ML, NLP, AI for Science
Osmar R. ZaΓ―ane
Dept. of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta
Hao Zhou
Institute for AI Industry Research (AIR), Tsinghua University
Lili Mou
University of Alberta
Natural Language Processing, Machine Learning