CLEAR: Cross-Lingual Enhancement in Retrieval via Reverse-training

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of multilingual embedding models in cross-lingual retrieval, particularly for low-resource languages, which often stems from imbalanced language resources and insufficient alignment—sometimes even compromising the performance of high-resource languages like English. To mitigate this, the authors propose CLEAR, a contrastive learning–based loss function that introduces a novel reverse training mechanism leveraging English passages as semantic anchors to strengthen bidirectional alignment between target languages and English. This approach significantly enhances cross-lingual retrieval effectiveness for low-resource languages by up to 15% across multiple tasks, without sacrificing English performance, and remains effective in joint multilingual training settings.
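The reverse-training idea described above — a contrastive objective that uses the English passage as a shared anchor and adds an alignment term in the reverse direction — can be sketched as a bidirectional InfoNCE-style loss. The function names, temperature, and reverse-term weight below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def info_nce(sim, temperature=0.05):
    """InfoNCE loss over a similarity matrix whose diagonal holds
    the positive pairs (standard in-batch negatives setup)."""
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def clear_style_loss(q_tgt, p_en, alpha=1.0):
    """Hypothetical sketch of a CLEAR-like objective:
    forward term (target-language query -> English passage) plus a
    reverse term (English passage -> target-language query), with the
    English passage acting as the semantic anchor for both directions.
    `alpha` (assumed) weights the reverse term."""
    # L2-normalize so dot products are cosine similarities
    q = q_tgt / np.linalg.norm(q_tgt, axis=1, keepdims=True)
    p = p_en / np.linalg.norm(p_en, axis=1, keepdims=True)
    sim = q @ p.T                 # [batch, batch] cosine similarities
    forward = info_nce(sim)       # target query retrieves English passage
    reverse = info_nce(sim.T)     # English passage retrieves target query
    return forward + alpha * reverse
```

In this sketch, well-aligned query/passage embeddings drive both terms toward zero, while misaligned pairs are penalized in both retrieval directions.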
📝 Abstract
Existing multilingual embedding models often encounter challenges in cross-lingual scenarios due to imbalanced linguistic resources and less consideration of cross-lingual alignment during training. Although standardized contrastive learning approaches for cross-lingual adaptation are widely adopted, they may struggle to capture fundamental alignment between languages and degrade performance in well-aligned languages such as English. To address these challenges, we propose Cross-Lingual Enhancement in Retrieval via Reverse-training (CLEAR), a novel loss function utilizing a reverse training scheme to improve retrieval performance across diverse cross-lingual retrieval scenarios. CLEAR leverages an English passage as a bridge to strengthen alignments between the target language and English, ensuring robust performance in the cross-lingual retrieval task. Our extensive experiments demonstrate that CLEAR achieves notable improvements in cross-lingual scenarios, with gains up to 15%, particularly in low-resource languages, while minimizing performance degradation in English. Furthermore, our findings highlight that CLEAR offers promising effectiveness even in multilingual training, suggesting its potential for broad application and scalability. We release the code at https://github.com/dltmddbs100/CLEAR.
Problem

Research questions and friction points this paper is trying to address.

cross-lingual alignment, multilingual embedding, low-resource languages, cross-lingual retrieval, language imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual retrieval, reverse training, alignment enhancement, multilingual embedding, low-resource languages