🤖 AI Summary
Zero-shot neural machine translation (NMT) suffers from substantial noise and suboptimal performance under both direct translation and English-centric pivot strategies. To address this, we propose a bi-level beam search ensemble framework: at the lower level, multiple multilingual models decode in parallel and independently; at the upper level, their candidate hypotheses are synchronized through path-level soft voting. We further distill the ensemble's collective output back into the original multilingual model, preserving inference efficiency while enhancing translation quality. Empirical evaluation on the OPUS-100 and Tatoeba benchmarks shows that the method consistently outperforms direct translation, pivot-based approaches, and existing ensemble baselines. Notably, the distilled model achieves faster inference without BLEU degradation; in fact, BLEU scores improve.
📝 Abstract
The ability of zero-shot translation emerges when we train a multilingual model with certain translation directions; the model can then directly translate in unseen directions. Alternatively, zero-shot translation can be accomplished by pivoting through a third language (e.g., English). In our work, we observe that both direct and pivot translations are noisy and achieve less satisfactory performance. We propose EBBS, an ensemble method with a novel bi-level beam search algorithm, where each ensemble component explores its own prediction step by step at the lower level but they are synchronized by a "soft voting" mechanism at the upper level. Results on two popular multilingual translation datasets show that EBBS consistently outperforms direct and pivot translations as well as existing ensemble techniques. Further, we can distill the ensemble's knowledge back to the multilingual model to improve inference efficiency; notably, our EBBS-based distillation does not sacrifice, or even improves, the translation quality.
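As a rough illustration of the bi-level search described above, the sketch below uses two toy next-token distributions in place of real multilingual NMT models (all names, vocabularies, and probabilities here are hypothetical, not taken from the paper). At the lower level each model scores expansions of the current hypotheses on its own; at the upper level the candidates are merged by averaging the models' log-probabilities, one simple form of soft voting, before pruning back to a shared beam.

```python
import math

VOCAB = ["a", "b", "</s>"]

# Toy stand-ins for ensemble components; a real setup would query
# multilingual NMT models conditioned on the prefix (hypothetical here).
def model1(prefix):
    return {"a": 0.6, "b": 0.3, "</s>": 0.1}

def model2(prefix):
    return {"a": 0.4, "b": 0.4, "</s>": 0.2}

def ebbs_sketch(models, beam_size=2, max_len=3):
    """Bi-level beam search sketch: the lower level lets every model
    score expansions independently; the upper level soft-votes by
    averaging the models' log-probabilities before pruning."""
    beams = [((), 0.0)]  # shared hypotheses: (token tuple, score)
    for _ in range(max_len):
        candidates = {}
        for prefix, score in beams:
            if prefix and prefix[-1] == "</s>":  # finished hypothesis
                candidates[prefix] = max(candidates.get(prefix, -math.inf), score)
                continue
            for tok in VOCAB:
                # Soft voting: average each model's log-probability.
                avg_logp = sum(math.log(m(prefix)[tok]) for m in models) / len(models)
                cand = prefix + (tok,)
                candidates[cand] = max(candidates.get(cand, -math.inf), score + avg_logp)
        # Upper level prunes all models to one shared beam.
        beams = sorted(candidates.items(), key=lambda kv: -kv[1])[:beam_size]
    return beams

print(ebbs_sketch([model1, model2])[0][0])  # → ('a', 'a', 'a')
```

Because the vote happens per decoding step, the ensemble components stay synchronized on the same partial hypotheses rather than drifting toward incompatible beams and merging only at the end.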