Cross-lingual Offensive Language Detection: A Systematic Review of Datasets, Transfer Approaches and Challenges

📅 2024-01-17
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
🤖 AI Summary
This paper addresses the challenge of detecting offensive language across languages on social media. We systematically review 67 studies, offering the first comprehensive review focused specifically on cross-lingual transfer learning (CLTL) methods for this task. We propose a holistic classification framework tailored to CLTL that groups approaches by what is transferred into three paradigms: instance-level, feature-level, and parameter-level transfer. We also construct and publicly release two structured resource tables cataloging the multilingual datasets and CLTL methods used in the reviewed literature, including multilingual pre-trained models, dictionary-based alignment techniques, zero-/few-shot transfer strategies, and adversarial training methods. Key challenges, including linguistic imbalance, annotation scarcity, and missing cultural context, are distilled and analyzed. The work yields a reusable research roadmap and an open resource repository, providing both theoretical foundations and practical tools for governing harmful content across languages.

📝 Abstract
The growing prevalence and rapid evolution of offensive language in social media amplify the complexities of detection, particularly highlighting the challenges in identifying such content across diverse languages. This survey presents a systematic and comprehensive exploration of Cross-Lingual Transfer Learning (CLTL) techniques in offensive language detection in social media. Our study stands as the first holistic overview to focus exclusively on the cross-lingual scenario in this domain. We analyse 67 relevant papers and categorise these studies across various dimensions, including the characteristics of multilingual datasets used, the cross-lingual resources employed, and the specific CLTL strategies implemented. According to "what to transfer", we also summarise three main CLTL transfer approaches: instance, feature, and parameter transfer. Additionally, we shed light on the current challenges and future research opportunities in this field. Furthermore, we have made our survey resources available online, including two comprehensive tables that provide accessible references to the multilingual datasets and CLTL methods used in the reviewed literature.
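The three "what to transfer" categories can be contrasted in a small sketch. This is our own illustration, not code from the survey or any reviewed paper. The English/Spanish toy sentences, the bilingual dictionary `EN_TO_ES`, the perceptron classifier, and every function name are hypothetical. The concept-id mapping in `shared_features` is a crude stand-in for aligned cross-lingual embeddings or a multilingual encoder. Instance transfer is shown as translate-train (moving labelled examples), feature transfer as training once in a shared feature space, and parameter transfer as warm-starting from source-trained weights and fine-tuning on a few target labels.

```python
from collections import defaultdict

# Hypothetical toy data; label 1 = offensive, 0 = not offensive.
SOURCE_EN = [
    ("you are an idiot", 1),
    ("shut up loser", 1),
    ("have a nice day", 0),
    ("thanks for the help", 0),
]
TARGET_ES_SHOTS = [("pareces tonto", 1), ("buen trabajo", 0)]  # few labelled target examples
TARGET_ES_TEST = ["eres un idiota", "gracias por la ayuda", "que tonto"]

# Hypothetical word-level bilingual dictionary (the cross-lingual resource).
EN_TO_ES = {
    "you": "tu", "are": "eres", "an": "un", "idiot": "idiota",
    "shut": "calla", "up": "ya", "loser": "perdedor", "have": "ten",
    "a": "una", "nice": "buen", "day": "dia", "thanks": "gracias",
    "for": "por", "the": "la", "help": "ayuda",
}
ES_TO_EN = {es: en for en, es in EN_TO_ES.items()}


def predict(weights, tokens):
    # Linear bag-of-words score; a positive score means "offensive".
    return 1 if sum(weights.get(t, 0.0) for t in tokens) > 0 else 0


def train_perceptron(examples, weights=None, epochs=10):
    # Perceptron updates on (tokens, label) pairs, optionally warm-started.
    w = defaultdict(float, weights or {})
    for _ in range(epochs):
        for tokens, label in examples:
            if predict(w, tokens) != label:
                step = 1.0 if label == 1 else -1.0
                for t in tokens:
                    w[t] += step
    return dict(w)


def instance_transfer(source, target_texts):
    # Instance transfer: translate labelled source data word by word into the
    # target language (translate-train), then train a target-language model.
    translated = [([EN_TO_ES.get(t, t) for t in x.split()], y) for x, y in source]
    w = train_perceptron(translated)
    return [predict(w, x.split()) for x in target_texts]


def shared_features(text, lang):
    # Map tokens of either language to language-independent concept ids.
    # Spanish words missing from the dictionary become their own concepts.
    lookup = ES_TO_EN if lang == "es" else {}
    return ["C:" + lookup.get(t, t) for t in text.split()]


def feature_transfer(source, target_texts):
    # Feature transfer: train once in the shared space, apply to the target.
    w = train_perceptron([(shared_features(x, "en"), y) for x, y in source])
    return [predict(w, shared_features(x, "es")) for x in target_texts]


def parameter_transfer(source, target_shots, target_texts):
    # Parameter transfer: reuse source-trained weights as the initialisation,
    # then fine-tune them on a few labelled target examples.
    w_src = train_perceptron([(shared_features(x, "en"), y) for x, y in source])
    w_tgt = train_perceptron(
        [(shared_features(x, "es"), y) for x, y in target_shots],
        weights=w_src,
        epochs=3,
    )
    return [predict(w_tgt, shared_features(x, "es")) for x in target_texts]


if __name__ == "__main__":
    print(instance_transfer(SOURCE_EN, TARGET_ES_TEST))   # -> [1, 0, 0]
    print(feature_transfer(SOURCE_EN, TARGET_ES_TEST))    # -> [1, 0, 0]
    print(parameter_transfer(SOURCE_EN, TARGET_ES_SHOTS, TARGET_ES_TEST))  # -> [1, 0, 1]
```

With these toy inputs, instance and feature transfer both handle the dictionary-covered sentences but miss `que tonto`, whose words are absent from the dictionary. The parameter-transfer variant recovers it because fine-tuning on the labelled target example `pareces tonto` gives `tonto` a positive weight. This mirrors, in miniature, why few-shot target data can complement purely resource-based transfer.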
Problem

Research questions and friction points this paper is trying to address.

Detecting offensive language across diverse languages
Systematically reviewing cross-lingual transfer learning techniques
Addressing challenges in multilingual offensive content identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

First survey focused exclusively on cross-lingual transfer learning for offensive language detection
Systematic analysis of 67 relevant papers
Categorization of CLTL approaches into instance, feature, and parameter transfer