🤖 AI Summary
This study investigates the paradoxical phenomenon where improved cross-lingual alignment fails to enhance downstream task performance. Through representational analyses—including embedding distance, gradient similarity, and gradient magnitude—conducted on XLM-R encoders across part-of-speech tagging and sentence classification tasks, the authors reveal that the gradients of alignment objectives are approximately orthogonal to those of task-specific objectives, thereby hindering effective knowledge transfer. The findings challenge the common assumption that “better alignment implies better transfer,” demonstrating that embedding distance alone is an unreliable predictor of downstream performance. To address this, the work proposes practical guidelines for jointly optimizing alignment and fine-tuning, offering a new perspective on improving multilingual model effectiveness.
📝 Abstract
Better cross-lingual alignment is often assumed to yield better cross-lingual transfer. However, explicit alignment techniques, despite increasing embedding similarity, frequently fail to improve token-level downstream performance. In this work, we show that this mismatch arises because alignment and downstream task objectives are largely orthogonal, and because the downstream benefits from alignment vary substantially across languages and task types. We analyze four XLM-R encoder models aligned on different language pairs and fine-tuned for either POS Tagging or Sentence Classification. Using representational analyses, including embedding distances, gradient similarities, and gradient magnitudes for both task and alignment losses, we find that: (1) embedding distances alone are unreliable predictors of improvements (or degradations) in task performance and (2) alignment and task gradients are often close to orthogonal, indicating that optimizing one objective may contribute little to optimizing the other. Taken together, our findings explain why “better” alignment often fails to translate into “better” cross-lingual transfer. Based on these insights, we provide practical guidelines for combining cross-lingual alignment with task-specific fine-tuning, highlighting the importance of careful loss selection.
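The gradient-similarity analysis described above boils down to measuring the cosine similarity between the gradients of the alignment loss and the task loss with respect to shared parameters. The paper does not publish its analysis code, so the following is a minimal, hypothetical sketch assuming the gradients have already been flattened into vectors; values near zero indicate the near-orthogonality the authors report.

```python
import numpy as np

def gradient_cosine(grad_task: np.ndarray, grad_align: np.ndarray) -> float:
    """Cosine similarity between two flattened gradient vectors.

    A value near 0 means the objectives are nearly orthogonal: a step
    that reduces one loss contributes little to reducing the other.
    """
    denom = np.linalg.norm(grad_task) * np.linalg.norm(grad_align)
    if denom == 0.0:
        return 0.0  # degenerate case: at least one zero gradient
    return float(np.dot(grad_task, grad_align) / denom)

# Toy illustration with hypothetical gradient directions:
g_task = np.array([1.0, 0.0, 0.0])   # task-loss gradient
g_align = np.array([0.0, 1.0, 0.0])  # alignment-loss gradient
print(gradient_cosine(g_task, g_align))  # 0.0 -> fully orthogonal objectives
```

In practice one would extract `grad_task` and `grad_align` per batch (e.g. by flattening and concatenating per-parameter gradients after separate backward passes) and track the cosine over training.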