Culture Matters in Toxic Language Detection in Persian

📅 2025-06-03

🤖 AI Summary

This study investigates how cultural background affects Persian toxic language detection, emphasizing the role of cultural similarity and dissimilarity in cross-lingual transfer. It systematically evaluates fine-tuning, zero-shot/few-shot prompting, cross-lingual pretraining (mBERT, XLM-R), and culture-aware corpus filtering on a Persian toxic speech dataset. The experiments reveal, for the first time, that transfer performance correlates inversely with the cultural distance between source and target languages: culturally proximate Arabic yields a 12.4% F1 improvement over culturally distant English. These findings indicate that cultural factors are a fundamental, previously underappreciated variable in NLP-based toxicity detection. The work establishes a culture-sensitive paradigm for cross-lingual toxicity modeling, advancing both theoretical understanding and practical deployment in linguistically and culturally diverse settings.

📝 Abstract

Toxic language detection is crucial for creating safer online environments and limiting the spread of harmful content. Because the task remains under-explored in Persian, this work compares several methods for it, including fine-tuning, data enrichment, zero-shot and few-shot learning, and cross-lingual transfer learning. Especially notable is the impact of cultural context on transfer learning: the language of a country with cultural similarities to Persian yields better transfer results, whereas the improvement is smaller when the source language comes from a culturally distinct country. Warning: This paper contains examples of toxic language that may disturb some readers. These examples are included for the purpose of research on toxic language detection.

Problem

Research questions and friction points this paper is trying to address.

Exploring toxic language detection methods in Persian

Investigating cultural context impact on transfer learning

Comparing cross-lingual approaches for harmful content detection
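
Comparing approaches as in the last item requires scoring every transfer setting with the same metric. The sketch below computes macro-averaged F1 in plain Python. The metric choice, the label names, the `source_a`/`source_b` setting names, and the toy predictions are all assumptions made for illustration; they are not the paper's evaluation protocol or results.

```python
def macro_f1(gold, pred, labels=("toxic", "non_toxic")):
    """Macro-averaged F1: the unweighted mean of per-label F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data only (hypothetical): gold labels for four Persian test items and
# the predictions of two models, each transferred from a different source language.
gold = ["toxic", "toxic", "non_toxic", "non_toxic"]
predictions = {
    "source_a": ["toxic", "toxic", "non_toxic", "toxic"],
    "source_b": ["toxic", "non_toxic", "non_toxic", "toxic"],
}

results = {name: macro_f1(gold, pred) for name, pred in predictions.items()}
for name, score in sorted(results.items()):
    print(f"{name}: macro-F1 = {score:.3f}")
```

Macro averaging weights the toxic and non-toxic classes equally, which matters when toxic examples are a minority of the test set.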

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning and data enrichment for Persian toxicity detection

Zero-shot and few-shot learning for low-resource languages

Cross-lingual transfer learning with cultural context consideration
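
The zero-shot and few-shot settings listed above differ only in whether labeled demonstrations precede the query. The following minimal sketch shows that difference as a prompt builder. The instruction wording, the `Text:`/`Label:` layout, and the function name `build_prompt` are hypothetical choices made here, not the prompts used in the paper, and the example texts are placeholders rather than real data.

```python
def build_prompt(text, examples=()):
    """Build a toxicity-classification prompt.

    An empty `examples` sequence gives the zero-shot prompt; passing
    (text, label) pairs gives a few-shot prompt with those demonstrations.
    """
    lines = ["Classify the following Persian text as 'toxic' or 'non-toxic'.", ""]
    for ex_text, ex_label in examples:
        lines.append(f"Text: {ex_text}")
        lines.append(f"Label: {ex_label}")
        lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Label:")  # left open for the model to complete
    return "\n".join(lines)


# Placeholder demonstrations (hypothetical, not drawn from any dataset).
demos = [
    ("<benign example sentence>", "non-toxic"),
    ("<toxic example sentence>", "toxic"),
]

zero_shot = build_prompt("<input sentence>")
few_shot = build_prompt("<input sentence>", demos)
print(few_shot)
```

Keeping both settings behind one function makes it easy to hold the instruction fixed and vary only the number of demonstrations.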
