🤖 AI Summary
This work investigates the root cause of encoder representation transfer failure in zero-shot multilingual neural machine translation (MNMT), revealing that source-language representations are mapped into target-language subspaces rather than into the desired language-invariant space. To address this, we first propose using identity pairs (a sentence translated to itself) as a reference for quantifying intra-lingual representation consistency, a novel measure. We then design a low-rank language-specific embedding module to disentangle language-specific from cross-lingual information in the encoder, and introduce a language-level contrastive learning mechanism to strengthen the decoder's ability to discriminate language identities. Experiments on Europarl-15, TED-19, and OPUS-100 demonstrate substantial improvements in zero-shot translation performance without degrading supervised translation quality, validating both the effectiveness and the generalizability of our approach in enhancing representation transferability.
📝 Abstract
Understanding representation transfer in multilingual neural machine translation (MNMT) can reveal the reason for the zero-shot translation deficiency. In this work, we systematically analyze the representational issues of MNMT models. We first introduce the identity pair, translating a sentence to itself, to address the lack of a base measure in multilingual investigations, as the identity pair can reflect the representation of a language within the model. We then demonstrate that the encoder transfers the source language to the representational subspace of the target language rather than to a language-agnostic state. The zero-shot translation deficiency thus arises because the representation of a translation is entangled with other languages and is not transferred to the target language effectively. Based on these findings, we propose two methods: 1) low-rank language-specific embedding at the encoder, and 2) language-specific contrastive learning of the representation at the decoder. Experimental results on the Europarl-15, TED-19, and OPUS-100 datasets show that our methods substantially enhance zero-shot translation performance without sacrificing quality in supervised directions by improving language transfer capacity, thereby providing practical evidence for our conclusions. Code is available at https://github.com/zhiqu22/ZeroTrans.
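The low-rank language-specific embedding at the encoder could be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the additive form, the rank `r`, and the per-language factor names `U`/`V` are assumptions made for clarity. The idea is that each language gets a language-specific transform factored as a product of two thin matrices, so the full `d_model × d_model` matrix is never stored.

```python
import numpy as np

d_model, rank, n_langs = 16, 2, 3
rng = np.random.default_rng(0)

# Per-language low-rank factors: 2 * d_model * rank parameters per
# language instead of d_model ** 2 for a full matrix.
U = rng.normal(scale=0.1, size=(n_langs, d_model, rank))
V = rng.normal(scale=0.1, size=(n_langs, rank, d_model))

def add_language_embedding(h, lang_id):
    """Add a low-rank, language-specific offset to encoder states
    h of shape (seq_len, d_model)."""
    return h + (h @ U[lang_id]) @ V[lang_id]

h = rng.normal(size=(5, d_model))     # dummy encoder states
out = add_language_embedding(h, lang_id=1)
print(out.shape)                      # (5, 16)
```

Keeping the language-specific part low-rank limits how much capacity each language can claim for itself, which is consistent with the stated goal of disentangling language-specific information without letting it dominate the shared encoder representation.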