XITE: Cross-lingual Interpolation for Transfer using Embeddings

📅 2026-04-26

🤖 AI Summary
This work addresses the poor cross-lingual transfer performance of multilingual models on low-resource languages by proposing XITE, a method that first matches unlabeled text in low-resource languages with labeled examples from a high-resource language (e.g., English) based on embedding similarity. It then generates synthetic training data via source–target embedding interpolation for fine-tuning and incorporates Linear Discriminant Analysis (LDA) to map target-language representations into a semantically richer subspace. XITE is the first approach to jointly leverage embedding interpolation and LDA for cross-lingual data augmentation, mitigating catastrophic forgetting while substantially improving transfer performance. With XLM-R, it achieves gains of up to 35.91% on sentiment analysis and up to 81.16% on natural language inference for target languages including Korean, Arabic, Urdu, and Hindi, without compromising performance on the high-resource language.

📝 Abstract
Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA), prior to interpolation, further boosts performance. Our cross-lingual embedding-based augmentation technique XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference, using XLM-R, for a diverse set of target languages including Korean, Arabic, Urdu and Hindi. Apart from boosting cross-lingual transfer, adaptation using XITE also safeguards against forgetting and maintains task performance on the high-resource language.
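
The matching and interpolation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several details here are assumptions that the abstract does not specify: cosine similarity as the matching score, one fixed-size vector per sentence, a mixing weight `lam` of 0.5, and all function names. The LDA projection step is omitted because the abstract does not say how it is fitted.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def match_and_adopt_labels(target_emb, source_emb, source_labels):
    """For each unlabeled target vector, find the most similar labeled
    source vector (cosine similarity, an assumption) and copy its label."""
    sims = l2_normalize(target_emb) @ l2_normalize(source_emb).T
    nearest = sims.argmax(axis=1)
    return nearest, source_labels[nearest]


def interpolate(target_emb, matched_source_emb, lam=0.5):
    """Linear interpolation of matched source and target embeddings.
    The weight `lam` is a hypothetical parameter, not taken from the paper."""
    return lam * matched_source_emb + (1.0 - lam) * target_emb


# Toy data: two labeled source (English) vectors, three unlabeled target vectors.
source_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
source_labels = np.array([1, 0])
target_emb = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])

nearest, pseudo_labels = match_and_adopt_labels(target_emb, source_emb, source_labels)
synthetic = interpolate(target_emb, source_emb[nearest], lam=0.5)
```

In this sketch, each row of `synthetic` paired with the corresponding entry of `pseudo_labels` would act as one synthetic training example for task-specific fine-tuning.
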
Problem

Research questions and friction points this paper is trying to address.

cross-lingual transfer
low-resource languages
multilingual language models
data augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual transfer
embedding interpolation
data augmentation
linear discriminant analysis
multilingual language models