🤖 AI Summary
Weak cross-lingual transfer, particularly to low-resource languages, remains a key limitation of decoder-only large language models (LLMs). To address this, we propose a pretraining strategy built on an active forgetting mechanism. This work is the first to introduce active forgetting regularization into decoder-only multilingual pretraining, combining multilingual mixed-data training with representation-learning analysis. The approach substantially improves zero-shot cross-lingual generalization to unseen languages. Experiments show consistent gains over same-scale baselines across multilingual downstream tasks, including XNLI and XQuAD, with especially pronounced improvements for low-resource languages. Notably, the resulting decoder-only models match the cross-lingual transfer performance of strong encoder-based multilingual models such as XLM-RoBERTa. The method advances the viability of decoder-only architectures for inclusive, resource-agnostic multilingual language understanding.
📝 Abstract
Large Language Models (LLMs) demonstrate exceptional capabilities across a multitude of NLP tasks. However, their efficacy on languages other than English is often limited. Prior work has shown that encoder-only models such as BERT and XLM-RoBERTa exhibit impressive cross-lingual transfer of their capabilities from English to other languages. In this work, we propose a pretraining strategy that uses active forgetting to achieve similar cross-lingual transfer in decoder-only LLMs. We show that LLMs pretrained with active forgetting adapt highly effectively to new and unseen languages. Through extensive experimentation, we find that LLMs pretrained with active forgetting learn better multilingual representations, which translates into better performance on many downstream tasks.
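As commonly described in prior work on active forgetting, the core idea is to periodically re-initialize the token-embedding layer during pretraining while the transformer body keeps its learned weights, forcing the body to learn language-agnostic representations. The minimal sketch below illustrates only this reset schedule; the function name, the placeholder "body" parameter, and the toy update rule are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np

def active_forgetting_pretrain(num_steps, reset_every, embed_shape, rng_seed=0):
    """Sketch of active-forgetting pretraining: the token-embedding matrix is
    re-initialized every `reset_every` steps, while the rest of the model
    (here a placeholder `body` vector) trains without interruption."""
    rng = np.random.default_rng(rng_seed)
    embeddings = rng.normal(scale=0.02, size=embed_shape)  # token embeddings
    body = np.zeros(embed_shape[1])  # stand-in for transformer-body weights
    resets = 0
    for step in range(1, num_steps + 1):
        # placeholder "gradient updates": both parts train between resets
        body += 1e-3
        embeddings += 1e-3
        if step % reset_every == 0:
            # active forgetting: wipe the embeddings, keep the body intact
            embeddings = rng.normal(scale=0.02, size=embed_shape)
            resets += 1
    return resets, body

resets, body = active_forgetting_pretrain(num_steps=1000, reset_every=100,
                                          embed_shape=(32, 8))
print(resets)  # → 10 resets over 1000 steps
```

In an actual pretraining run the reset would apply to the embedding parameters of the LLM (and typically their optimizer state), with the reset frequency treated as a hyperparameter.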