ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework

📅 2024-10-25

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

To address the significant performance gap between English and non-dominant (especially low-resource) languages in multilingual large language models (LLMs), this paper proposes ShifCon, a framework that shifts language representations between subspaces. ShifCon combines a subspace distance metric, used to select the layer range for shifting, with multilingual contrastive learning to align non-dominant-language representations with those of the dominant language (English). The alignment acts on the model's internal forward process: representations are shifted into the dominant-language subspace and shifted back into their original subspace before generation. Evaluated on mainstream multilingual models, including XGLM and mT5, ShifCon is reported to yield average improvements of 12.7% in BLEU (for generation) and accuracy (for understanding) across 12 non-dominant languages. It narrows the performance disparity between languages and offers a scalable, broadly compatible approach to enhancing low-resource language capabilities in pretrained multilingual LLMs.

📝 Abstract

Although fine-tuning Large Language Models (LLMs) with multilingual data can rapidly enhance their multilingual capabilities, these models still exhibit a performance gap between the dominant language (e.g., English) and non-dominant ones due to the imbalance of training data across languages. To further enhance the performance of non-dominant languages, we propose ShifCon, a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one. Specifically, it shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters. The enriched representations are then shifted back into their original language subspace before generation. Moreover, we introduce a subspace distance metric to pinpoint the optimal layer area for shifting representations and employ multilingual contrastive learning to further enhance the alignment of representations within this area. Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages, particularly for low-resource ones. Further analysis offers extra insights to verify the effectiveness of ShifCon and propel future research.
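The shift-and-shift-back idea described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each language subspace is summarized by the mean of its hidden representations and that shifting means swapping one mean for the other; the abstract does not spell out the exact operation, and the names `language_mean` and `shift` as well as the toy vectors are hypothetical.

```python
from statistics import fmean

def language_mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]

def shift(h, src_mean, tgt_mean):
    """Assumed shift: remove the source-language mean, add the target one."""
    return [x - s + t for x, s, t in zip(h, src_mean, tgt_mean)]

# Toy 2-dimensional hidden states (made-up values, not real model outputs).
dominant_reps = [[1.0, 2.0], [3.0, 4.0]]        # e.g. English sentences
non_dominant_reps = [[-1.0, 0.0], [-3.0, 2.0]]  # e.g. a low-resource language

mu_dom = language_mean(dominant_reps)
mu_non = language_mean(non_dominant_reps)

h = non_dominant_reps[0]
# Shift toward the dominant subspace before the selected layers...
h_shifted = shift(h, mu_non, mu_dom)
# ...and back to the original subspace before generation.
h_back = shift(h_shifted, mu_dom, mu_non)
```

In this sketch the two shifts are exact inverses, so `h_back` equals `h`; in a real model the layers between the two shifts would transform the representation, which is where the enrichment the abstract mentions would come from.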

Problem

Research questions and friction points this paper is trying to address.

Reducing performance gap between dominant and non-dominant languages in LLMs

Aligning non-dominant language representations to dominant language subspace

Enhancing low-resource language performance via contrastive learning and shifting

Innovation

Methods, ideas, or system contributions that make the work stand out.

Shift-based Contrastive framework aligns language representations

Shifts non-dominant language representations to dominant subspace

Uses subspace distance metric and contrastive learning
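The contrastive component can be sketched as follows. This is an assumption-laden illustration, not the paper's loss: it uses an InfoNCE-style objective with cosine similarity and in-batch negatives, a common choice for multilingual contrastive learning, and the function names, temperature value, and toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is treated as the translation of anchors[i]; every other
    positives[j] acts as an in-batch negative for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy sentence representations (made-up values).
english = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(english, [[0.9, 0.1], [0.1, 0.9]])     # translations nearby
misaligned = info_nce(english, [[0.1, 0.9], [0.9, 0.1]])  # translations swapped
```

The loss is small when each sentence lies closest to its own translation and large otherwise, which is the property a training procedure would exploit to pull representations of parallel sentences together within the selected layers.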

🔎 Similar Papers

Lens: Rethinking Multilingual Enhancement for Large Language Models