When Does Language Transfer Help? Sequential Fine-Tuning for Cross-Lingual Euphemism Detection

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how cross-lingual transfer affects euphemism detection across five languages (English, Spanish, Chinese, Turkish, and Yoruba), with particular attention to low-resource settings. To address the modeling challenges posed by cultural variation and semantic ambiguity in euphemisms, the authors propose a sequential fine-tuning strategy: transferring knowledge from a high-resource language (e.g., English) to low-resource ones. Using XLM-R and mBERT, they systematically compare monolingual fine-tuning, simultaneous multilingual fine-tuning, and sequential fine-tuning, analyzing the effects of language typology, pretraining coverage, and transfer paths. Results show that sequential fine-tuning significantly improves performance on low-resource languages, especially Yoruba, revealing pretraining data disparity as a key bottleneck. While XLM-R yields larger gains, it is more susceptible to catastrophic forgetting; mBERT is more robust but scores lower. The study establishes an interpretable, reproducible transfer paradigm for implicit semantic understanding in low-resource settings.

📝 Abstract
Euphemisms are culturally variable and often ambiguous, posing challenges for language models, especially in low-resource settings. This paper investigates how cross-lingual transfer via sequential fine-tuning affects euphemism detection across five languages: English, Spanish, Chinese, Turkish, and Yoruba. We compare sequential fine-tuning with monolingual and simultaneous fine-tuning using XLM-R and mBERT, analyzing how performance is shaped by language pairings, typological features, and pretraining coverage. Results show that sequential fine-tuning with a high-resource L1 improves L2 performance, especially for low-resource languages like Yoruba and Turkish. XLM-R achieves larger gains but is more sensitive to pretraining gaps and catastrophic forgetting, while mBERT yields more stable, though lower, results. These findings highlight sequential fine-tuning as a simple yet effective strategy for improving euphemism detection in multilingual models, particularly when low-resource languages are involved.
Problem

Research questions and friction points this paper is trying to address.

Cross-lingual euphemism detection across five languages
Evaluating sequential fine-tuning versus monolingual approaches
Analyzing performance impact of language resources and typology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sequential fine-tuning for cross-lingual transfer
Comparative analysis of XLM-R and mBERT
Finding that fine-tuning on a high-resource L1 first improves low-resource L2 performance
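The sequential fine-tuning recipe above (train on a high-resource L1, then continue from those weights on a low-resource L2, rather than training on L2 alone) can be sketched with a toy numpy logistic-regression classifier. The synthetic data, dimensions, and hyperparameters are illustrative assumptions only; the paper itself fine-tunes XLM-R and mBERT on euphemism corpora.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, w_true, rng):
    # Synthetic "language" data sharing one underlying decision rule,
    # standing in for a task (euphemism detection) shared across languages.
    X = rng.normal(size=(n, w_true.size))
    y = (X @ w_true + rng.normal(scale=0.5, size=n) > 0).astype(float)
    return X, y

def train(X, y, w=None, lr=0.1, epochs=200):
    # Logistic regression via gradient descent; passing `w` continues
    # from an earlier stage's weights (the "sequential" step).
    w = np.zeros(X.shape[1]) if w is None else w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float((((X @ w) > 0).astype(float) == y).mean())

d = 8
w_true = rng.normal(size=d)
X_l1, y_l1 = make_data(500, w_true, rng)   # high-resource L1 (e.g. English)
X_l2, y_l2 = make_data(20, w_true, rng)    # low-resource L2 (e.g. Yoruba)
X_test, y_test = make_data(500, w_true, rng)

w_mono = train(X_l2, y_l2)                        # monolingual: L2 only
w_seq = train(X_l2, y_l2, w=train(X_l1, y_l1))    # sequential: L1 -> L2

print(f"monolingual L2 accuracy:   {accuracy(w_mono, X_test, y_test):.3f}")
print(f"sequential L1->L2 accuracy: {accuracy(w_seq, X_test, y_test):.3f}")
```

With only 20 L2 examples, initializing from L1-trained weights typically recovers the shared decision rule more reliably than training on L2 alone, mirroring the paper's finding that a high-resource L1 stage helps low-resource L2 performance.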