Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies

📅 2023-09-12
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses catastrophic forgetting in cross-lingual transfer—specifically, the degradation of source-language knowledge during target-language fine-tuning. We propose Cross-Lingual Validation (CLV), a novel paradigm that, for the first time within a unified framework, quantifies forgetting magnitude across multilingual models. Systematically comparing full-parameter fine-tuning versus adapter-based tuning, and intermediate-task training (IT) versus CLV, we analyze their trade-offs in preserving source-language (English) performance versus optimizing target-language accuracy. Experiments employ large language models on multilingual hate speech detection and product review classification datasets under zero-shot and full-shot settings. Results show CLV significantly outperforms IT in retaining source-language knowledge, reducing catastrophic forgetting by 23.6% in average F1—challenging the prevailing assumption of IT’s superiority. Although IT yields marginally higher target-language performance, CLV achieves superior cross-lingual stability and transferability.
📝 Abstract
Cross-lingual transfer is a promising technique for solving tasks in less-resourced languages. In this empirical study, we compare two fine-tuning approaches combined with zero-shot and full-shot learning for large language models in a cross-lingual setting. As fine-tuning strategies, we compare parameter-efficient adapter methods with fine-tuning of all parameters. As cross-lingual transfer strategies, we compare intermediate training (IT), which uses each language sequentially, and cross-lingual validation (CLV), which uses the target language already in the validation phase of fine-tuning. We assess the success of transfer and the extent of catastrophic forgetting in the source language due to cross-lingual transfer, i.e., how much previously acquired knowledge is lost when we learn new information in a different language. The results on two classification problems, hate speech detection and product reviews, each containing datasets in several languages, show that the IT cross-lingual strategy outperforms CLV on the target language. Our findings indicate that, in the majority of cases, the CLV strategy retains knowledge in the base language (English) better than the IT strategy when catastrophic forgetting is evaluated over multiple cross-lingual transfers.
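The forgetting measure described above can be sketched as the drop in source-language score after target-language tuning. The function name and the scores below are illustrative placeholders, not values or code from the paper:

```python
# Minimal sketch: catastrophic forgetting quantified as the drop in
# source-language F1 after cross-lingual fine-tuning.

def forgetting(f1_source_before: float, f1_source_after: float) -> float:
    """Absolute drop in source-language F1 caused by target-language tuning."""
    return f1_source_before - f1_source_after

# Hypothetical example: English F1 before vs. after fine-tuning on a target language.
drop = forgetting(0.85, 0.71)
print(f"Catastrophic forgetting: {drop:.2f} F1")
```

A positive value indicates knowledge lost on the source language; comparing this quantity across IT and CLV runs is what the paper's retention analysis amounts to.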
Problem

Research questions and friction points this paper is trying to address.

Measure catastrophic forgetting in cross-lingual transfer
Compare fine-tuning strategies for language models
Evaluate knowledge retention across multiple languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-efficient adapter methods
Intermediate-training cross-lingual strategy
Cross-lingual validation for retention
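The two transfer strategies listed above differ mainly in where the target language enters the loop. A toy sketch of that difference, with stubbed train/evaluate steps (the function names, the dictionary-based "model", and the language codes are hypothetical, not the paper's implementation):

```python
# Illustrative contrast between the two cross-lingual transfer strategies.
# All training/evaluation logic is stubbed out.

def train_step(model, language):
    # Stub: fine-tune the model on one language's training data.
    model.setdefault("trained_on", []).append(language)
    return model

def evaluate(model, language):
    # Stub: return a dummy validation score for `language`.
    return 1.0 if language in model.get("trained_on", []) else 0.0

def intermediate_training(model, source, target):
    """IT: fine-tune on each language sequentially,
    validating in the same language that is being trained."""
    for lang in (source, target):
        model = train_step(model, lang)
        _ = evaluate(model, lang)
    return model

def cross_lingual_validation(model, source, target):
    """CLV: fine-tune on the source language, but use the target
    language already in the validation phase, so model selection
    accounts for target-language performance from the start."""
    model = train_step(model, source)
    _ = evaluate(model, target)
    return model

model_it = intermediate_training({}, "en", "de")
model_clv = cross_lingual_validation({}, "en", "de")
```

The design difference this illustrates: IT exposes the model's weights to the target language (risking forgetting of the source), while CLV touches the target only through validation, which is consistent with the reported result that CLV retains English performance better.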