🤖 AI Summary
This study investigates the paradoxical phenomenon where improved cross-lingual alignment fails to enhance downstream task performance. Through representational analyses—including embedding distance, gradient similarity, and gradient magnitude—conducted on XLM-R encoders across part-of-speech tagging and sentence classification tasks, the authors reveal that the gradients of alignment objectives are approximately orthogonal to those of task-specific objectives, thereby hindering effective knowledge transfer. The findings challenge the common assumption that “better alignment implies better transfer,” demonstrating that embedding distance alone is an unreliable predictor of downstream performance. To address this, the work proposes practical guidelines for jointly optimizing alignment and fine-tuning, offering a new perspective on improving multilingual model effectiveness.
📝 Abstract
Better cross-lingual alignment is often assumed to yield better cross-lingual transfer. However, explicit alignment techniques, despite increasing embedding similarity, frequently fail to improve token-level downstream performance. In this work, we show that this mismatch arises because alignment and downstream task objectives are largely orthogonal, and because the downstream benefits from alignment vary substantially across languages and task types. We analyze four XLM-R encoder models aligned on different language pairs and fine-tuned for either POS Tagging or Sentence Classification. Using representational analyses, including embedding distances, gradient similarities, and gradient magnitudes for both task and alignment losses, we find that: (1) embedding distances alone are unreliable predictors of improvements (or degradations) in task performance and (2) alignment and task gradients are often close to orthogonal, indicating that optimizing one objective may contribute little to optimizing the other. Taken together, our findings explain why “better” alignment often fails to translate into “better” cross-lingual transfer. Based on these insights, we provide practical guidelines for combining cross-lingual alignment with task-specific fine-tuning, highlighting the importance of careful loss selection.
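The gradient-similarity analysis described above boils down to measuring the cosine similarity between the gradients of the alignment loss and the task loss with respect to shared parameters. The paper does not publish its analysis code, so the following is a minimal, hypothetical sketch assuming the gradients have already been flattened into vectors; values near zero indicate the near-orthogonality the authors report.

```python
import numpy as np

def gradient_cosine(grad_task: np.ndarray, grad_align: np.ndarray) -> float:
    """Cosine similarity between two flattened gradient vectors.

    A value near 0 means the objectives are nearly orthogonal: a step
    that reduces one loss contributes little to reducing the other.
    """
    denom = np.linalg.norm(grad_task) * np.linalg.norm(grad_align)
    if denom == 0.0:
        return 0.0  # degenerate case: at least one zero gradient
    return float(np.dot(grad_task, grad_align) / denom)

# Toy illustration with hypothetical gradient directions:
g_task = np.array([1.0, 0.0, 0.0])   # task-loss gradient
g_align = np.array([0.0, 1.0, 0.0])  # alignment-loss gradient
print(gradient_cosine(g_task, g_align))  # 0.0 -> fully orthogonal objectives
```

In practice one would extract `grad_task` and `grad_align` per batch (e.g. by flattening and concatenating per-parameter gradients after separate backward passes) and track the cosine over training.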