CLEAR: Cross-Lingual Enhancement in Retrieval via Reverse-training

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of multilingual embedding models in cross-lingual retrieval, particularly for low-resource languages, which often stems from imbalanced language resources and insufficient alignment—sometimes even compromising the performance of high-resource languages like English. To mitigate this, the authors propose CLEAR, a contrastive learning–based loss function that introduces a novel reverse training mechanism leveraging English passages as semantic anchors to strengthen bidirectional alignment between target languages and English. This approach significantly enhances cross-lingual retrieval effectiveness for low-resource languages by up to 15% across multiple tasks, without sacrificing English performance, and remains effective in joint multilingual training settings.
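The reverse-training idea described above — a contrastive objective that uses the English passage as a shared anchor and adds an alignment term in the reverse direction — can be sketched as a bidirectional InfoNCE-style loss. The function names, temperature, and reverse-term weight below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def info_nce(sim, temperature=0.05):
    """InfoNCE loss over a similarity matrix whose diagonal holds
    the positive pairs (standard in-batch negatives setup)."""
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def clear_style_loss(q_tgt, p_en, alpha=1.0):
    """Hypothetical sketch of a CLEAR-like objective:
    forward term (target-language query -> English passage) plus a
    reverse term (English passage -> target-language query), with the
    English passage acting as the semantic anchor for both directions.
    `alpha` (assumed) weights the reverse term."""
    # L2-normalize so dot products are cosine similarities
    q = q_tgt / np.linalg.norm(q_tgt, axis=1, keepdims=True)
    p = p_en / np.linalg.norm(p_en, axis=1, keepdims=True)
    sim = q @ p.T                 # [batch, batch] cosine similarities
    forward = info_nce(sim)       # target query retrieves English passage
    reverse = info_nce(sim.T)     # English passage retrieves target query
    return forward + alpha * reverse
```

In this sketch, well-aligned query/passage embeddings drive both terms toward zero, while misaligned pairs are penalized in both retrieval directions.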
📝 Abstract
Existing multilingual embedding models often encounter challenges in cross-lingual scenarios due to imbalanced linguistic resources and less consideration of cross-lingual alignment during training. Although standardized contrastive learning approaches for cross-lingual adaptation are widely adopted, they may struggle to capture fundamental alignment between languages and degrade performance in well-aligned languages such as English. To address these challenges, we propose Cross-Lingual Enhancement in Retrieval via Reverse-training (CLEAR), a novel loss function utilizing a reverse training scheme to improve retrieval performance across diverse cross-lingual retrieval scenarios. CLEAR leverages an English passage as a bridge to strengthen alignments between the target language and English, ensuring robust performance in the cross-lingual retrieval task. Our extensive experiments demonstrate that CLEAR achieves notable improvements in cross-lingual scenarios, with gains up to 15%, particularly in low-resource languages, while minimizing performance degradation in English. Furthermore, our findings highlight that CLEAR offers promising effectiveness even in multilingual training, suggesting its potential for broad application and scalability. We release the code at https://github.com/dltmddbs100/CLEAR.
Problem

Research questions and friction points this paper is trying to address.

cross-lingual alignment, multilingual embedding, low-resource languages, cross-lingual retrieval, language imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual retrieval, reverse training, alignment enhancement, multilingual embedding, low-resource languages