📝 Abstract
Most pre-trained Vision-Language (VL) models and the training data for downstream tasks are available only in English. Multilingual VL tasks are therefore typically solved via cross-lingual transfer, either by fine-tuning a multilingual pre-trained model or by transferring the text encoder using parallel data. We study the latter approach: transferring an already trained text encoder using parallel data. We investigate two properties of the parallel data that previous work has largely overlooked: its domain and the number of languages it covers. Our results show that while machine-translated task data perform best on average, authentic caption-like parallel data outperform them for some languages. Further, we show that most languages benefit from multilingual training.
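The text-encoder transfer described above can be sketched as distillation on parallel sentence pairs: a frozen teacher (the original English text encoder of the VL model) produces target embeddings for the English side, and a trainable student is pulled towards those targets on the translated side. The sketch below is a toy illustration, not the paper's implementation: a linear map over bag-of-bytes features stands in for a real pretrained encoder, and the MSE objective, learning rate, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, VOCAB = 16, 256  # toy embedding and "vocabulary" sizes (assumptions)

def featurize(sentence: str) -> np.ndarray:
    """Unit-normalized bag-of-bytes vector (toy stand-in for tokenization)."""
    v = np.zeros(VOCAB)
    for b in sentence.encode("utf-8"):
        v[b] += 1.0
    return v / max(1.0, np.linalg.norm(v))

# Frozen teacher projection (stands in for the English text encoder of a VL model).
W_teacher = rng.normal(size=(DIM, VOCAB))
# Trainable student, initialized near the teacher and fine-tuned on parallel data.
W_student = W_teacher + 0.5 * rng.normal(size=(DIM, VOCAB))

def distill_step(pairs, W, lr=0.5):
    """One MSE-distillation step: pull the student's embeddings of the
    translations towards the teacher's embeddings of the English sources."""
    grad = np.zeros_like(W)
    loss = 0.0
    for en, xx in pairs:
        target = W_teacher @ featurize(en)  # frozen teacher embedding
        pred = W @ featurize(xx)            # student embedding of the translation
        diff = pred - target
        loss += float(diff @ diff)
        grad += np.outer(diff, featurize(xx))
    W -= lr * grad / len(pairs)             # in-place gradient step
    return loss / len(pairs)

# Hypothetical English-German parallel pairs (illustrative only).
pairs = [("a dog runs", "ein Hund rennt"), ("a red car", "ein rotes Auto")]
losses = [distill_step(pairs, W_student) for _ in range(200)]
print(f"distillation loss: {losses[0]:.4f} -> {losses[-1]:.2e}")
```

Because the image encoder and the teacher stay frozen, the student can be swapped in after training and the model's image-text alignment carries over to the new languages; the domain and language coverage of `pairs` are exactly the variables the paper studies.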