Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs

📅 2025-02-20
🤖 AI Summary
This work addresses the performance bottleneck that arises when fine-tuned large language models (LLMs) are transferred to low-resource languages. We propose a lightweight cross-lingual transfer method based on middle-layer representation alignment. Through multilingual representation analysis, we first show systematically that middle layers exhibit the strongest cross-lingual alignment capability. Building on this insight, we design a decoupled training objective, a middle-layer alignment loss, that enables plug-and-play, modular integration without full-model retraining. Crucially, the approach is robust to the choice of alignment languages and generalizes to languages unseen during alignment. We validate the method on cross-lingual slot filling, machine translation, and structured text generation, achieving significant performance gains for low-resource languages. The implementation is publicly available.

📝 Abstract
While large language models demonstrate remarkable capabilities at task-specific applications through fine-tuning, extending these benefits across diverse languages is essential for broad accessibility. However, effective cross-lingual transfer is hindered by LLM performance gaps across languages and the scarcity of fine-tuning data in many languages. Through analysis of LLM internal representations from over 1,000 language pairs, we discover that middle layers exhibit the strongest potential for cross-lingual alignment. Building on this finding, we propose a middle-layer alignment objective integrated into task-specific training. Our experiments on slot filling, machine translation, and structured text generation show consistent improvements in cross-lingual transfer, especially to lower-resource languages. The method is robust to the choice of alignment languages and generalizes to languages unseen during alignment. Furthermore, we show that separately trained alignment modules can be merged with existing task-specific modules, improving cross-lingual capabilities without full re-training. Our code is publicly available (https://github.com/dannigt/mid-align).
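The combined objective the abstract describes (task-specific training plus a middle-layer alignment term) can be sketched roughly as below. The mean pooling, cosine-distance loss, and weighting factor are illustrative assumptions, not the paper's exact formulation; in practice the hidden states would come from a chosen middle layer of the LLM (e.g. via `output_hidden_states=True` in Hugging Face Transformers) for a parallel source/target sentence pair.

```python
import torch
import torch.nn.functional as F

def middle_layer_alignment_loss(hidden_src, hidden_tgt, mask_src, mask_tgt):
    """Alignment loss between mean-pooled middle-layer hidden states of a
    parallel sentence pair (cosine distance; illustrative choice)."""
    def mean_pool(h, mask):
        # h: (batch, seq, dim), mask: (batch, seq) with 1 for real tokens
        m = mask.unsqueeze(-1).float()
        return (h * m).sum(dim=1) / m.sum(dim=1).clamp(min=1.0)
    src = mean_pool(hidden_src, mask_src)
    tgt = mean_pool(hidden_tgt, mask_tgt)
    return (1.0 - F.cosine_similarity(src, tgt, dim=-1)).mean()

# toy demo: batch of 2 sentence pairs, seq len 4, hidden dim 8
torch.manual_seed(0)
h_src = torch.randn(2, 4, 8)
mask = torch.ones(2, 4, dtype=torch.bool)

# identical representations align perfectly -> loss near zero
loss_same = middle_layer_alignment_loss(h_src, h_src, mask, mask)
# unrelated representations -> positive loss
loss_diff = middle_layer_alignment_loss(h_src, torch.randn(2, 4, 8), mask, mask)

# combined training objective: task loss plus weighted alignment term
task_loss = torch.tensor(2.0)   # stand-in for the usual fine-tuning loss
lambda_align = 0.5              # alignment weight (hypothetical value)
total_loss = task_loss + lambda_align * loss_diff
```

Because the alignment term is computed independently of the task loss, it can be trained as a separate module (e.g. a LoRA adapter) and later merged with a task-specific module, matching the decoupled, plug-and-play integration the abstract highlights.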
Problem

Research questions and friction points this paper is trying to address.

Cross-lingual transfer in fine-tuned LLMs
Middle-layer representation alignment
Improving lower-resource language performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Middle-layer representation alignment
Cross-lingual transfer enhancement
Modular integration without re-training