The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation

📅 2026-05-05
📈 Citations: 0
✨ Influential: 0

πŸ“ Abstract
Knowledge transfer, especially across related languages, has been found beneficial for multilingual neural machine translation (MNMT), but some aspects remain under-explored. A joint vocabulary is most often used to form a uniform word embedding space, but because the impact of a disjoint vocabulary on model performance is far less studied, there is no consensus on how much of the knowledge transfer is due to vocabulary overlap. In this paper, we present systematic experiments with joint and disjoint vocabularies, and with auxiliary languages both related and unrelated to the source language. We design these experiments in an out-of-domain setup in order to emphasize transfer and the impact of the auxiliary language. As expected, we obtain better results with the more extensive vocabulary overlaps typical of related languages, but our experiments also show that domain match and language relatedness are more important than a joint vocabulary.
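
To make "vocabulary overlap" concrete, here is a minimal sketch of one way it could be quantified. Everything in it is an assumption for illustration: the Jaccard measure, the function names `vocabulary_overlap` and `make_disjoint`, the language-tag prefix used to force a disjoint vocabulary, and the toy token sets. The abstract does not state how the paper measures overlap or how it builds its disjoint vocabularies.

```python
def vocabulary_overlap(vocab_a, vocab_b):
    """Jaccard overlap |A & B| / |A | B| of two token vocabularies (assumed measure)."""
    a, b = set(vocab_a), set(vocab_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def make_disjoint(vocab, lang_tag):
    """Prefix each token with a language tag so that no token is shared
    across languages (a hypothetical way to simulate a disjoint vocabulary)."""
    return {f"{lang_tag}@{token}" for token in vocab}


# Toy subword vocabularies, invented for illustration only.
source = {"in", "er", "hotel", "taxi"}
related_aux = {"in", "er", "hotel", "auto"}
unrelated_aux = {"ka", "mi", "taxi", "su"}

# Joint vocabulary: identical surface tokens share one embedding.
joint_related = vocabulary_overlap(source, related_aux)
joint_unrelated = vocabulary_overlap(source, unrelated_aux)

# Disjoint vocabulary: tagging removes all sharing, so the overlap is zero.
disjoint_related = vocabulary_overlap(
    make_disjoint(source, "src"), make_disjoint(related_aux, "aux")
)

print(joint_related, joint_unrelated, disjoint_related)
```

In this toy setup the related auxiliary language shares more tokens with the source than the unrelated one, and the tagged variant shares none, which is the kind of contrast the joint versus disjoint comparison relies on.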
Problem

Research questions and friction points this paper is trying to address.

vocabulary overlap
knowledge transfer
multilingual machine translation
joint vocabulary
language relatedness