🤖 AI Summary
This work investigates the impact of cross-lingual transfer on euphemism detection across five languages: English, Spanish, Chinese, Turkish, and Yoruba, with an emphasis on the low-resource cases. To address the modeling challenges posed by cultural variation and semantic ambiguity in euphemisms, we propose a sequential fine-tuning strategy that transfers knowledge from a high-resource language (e.g., English) to low-resource ones. Using XLM-R and mBERT, we systematically compare monolingual fine-tuning, simultaneous multilingual fine-tuning, and sequential fine-tuning, analyzing the effects of language typology, pretraining coverage, and transfer paths. Results show that sequential fine-tuning significantly improves performance on low-resource languages, especially Yoruba, and reveals pretraining-data disparity as a key bottleneck. While XLM-R yields larger gains, it is more susceptible to catastrophic forgetting; mBERT is more robust. This study establishes an interpretable, reproducible transfer paradigm for implicit semantic understanding in low-resource settings.
📝 Abstract
Euphemisms are culturally variable and often ambiguous, posing challenges for language models, especially in low-resource settings. This paper investigates how cross-lingual transfer via sequential fine-tuning affects euphemism detection across five languages: English, Spanish, Chinese, Turkish, and Yoruba. We compare sequential fine-tuning with monolingual and simultaneous fine-tuning using XLM-R and mBERT, analyzing how performance is shaped by language pairings, typological features, and pretraining coverage. Results show that sequential fine-tuning with a high-resource L1 improves L2 performance, especially for low-resource languages like Yoruba and Turkish. XLM-R achieves larger gains but is more sensitive to pretraining gaps and catastrophic forgetting, while mBERT yields more stable, though lower, results. These findings highlight sequential fine-tuning as a simple yet effective strategy for improving euphemism detection in multilingual models, particularly when low-resource languages are involved.
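The three fine-tuning regimes compared above differ only in the order and mixing of the training data. A minimal sketch of that difference follows; the function names and the toy "model" (a record of what data was seen) are illustrative placeholders, not the authors' implementation — in practice the model would be XLM-R or mBERT updated by gradient descent at each step.

```python
# Toy sketch of the three fine-tuning regimes. The "model" is just a list
# of the training data it has seen, standing in for XLM-R/mBERT weights.

def fine_tune(model, language_data):
    """Placeholder for one fine-tuning pass: record the data the model saw."""
    return model + [language_data]

def monolingual(l2_data):
    # Fine-tune only on the target (L2) language.
    return fine_tune([], l2_data)

def simultaneous(l1_data, l2_data):
    # Fine-tune once on the mixed L1+L2 data.
    return fine_tune([], l1_data + "+" + l2_data)

def sequential(l1_data, l2_data):
    # First fine-tune on high-resource L1, then continue on low-resource L2;
    # the second pass starts from the L1-adapted model, which is where the
    # transfer gains (and the risk of catastrophic forgetting) come from.
    model = fine_tune([], l1_data)
    return fine_tune(model, l2_data)

print(monolingual("yoruba"))              # ['yoruba']
print(simultaneous("english", "yoruba"))  # ['english+yoruba']
print(sequential("english", "yoruba"))    # ['english', 'yoruba']
```

The sketch makes the paper's central comparison explicit: simultaneous fine-tuning sees one mixed dataset, while sequential fine-tuning chains two passes so that L2 training inherits representations shaped by the high-resource L1.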