Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation

📅 2024-11-29
🏛️ arXiv.org
📈 Citations: 1 (influential: 0)
🤖 AI Summary
Existing social media sensitive content detection tools suffer from limited customisability, narrow category coverage (particularly lacking long-tail classes such as drug-related and self-harm content), high privacy risks, and the absence of a unified evaluation benchmark. To address these issues, this work introduces the first high-quality, uniformly annotated dataset covering six sensitive content categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam, together with standardised protocols for data collection and human annotation. Leveraging this dataset, the authors perform supervised fine-tuning of open-source large language models (e.g., LLaMA) and design a comprehensive, multi-dimensional evaluation benchmark. Experimental results show that the fine-tuned models consistently outperform both the LLaMA baselines and the OpenAI API across all six detection tasks, with average improvements of 10-15%. Gains are especially pronounced for scarce categories (e.g., drug-related and self-harm content), supporting the effectiveness and deployability of open-source LLM fine-tuning for fine-grained sensitive content identification.

📝 Abstract
The detection of sensitive content in large datasets is crucial for ensuring that shared and analysed data is free from harmful material. However, current moderation tools, such as external APIs, suffer from limitations in customisation, accuracy across diverse sensitive categories, and privacy concerns. Additionally, existing datasets and open-source models focus predominantly on toxic language, leaving gaps in detecting other sensitive categories such as substance abuse or self-harm. In this paper, we put forward a unified dataset tailored for social media content moderation across six sensitive categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. By collecting and annotating data with consistent retrieval strategies and guidelines, we address the shortcomings of previous focalised research. Our analysis demonstrates that fine-tuning large language models (LLMs) on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models such as LLaMA, and even proprietary OpenAI models, which underperform by 10-15% overall. This limitation is even more pronounced on popular moderation APIs, which cannot be easily tailored to specific sensitive content categories, among others.
Problem

Research questions and friction points this paper is trying to address.

Detecting diverse sensitive content in social media data
Addressing limitations in current moderation tools and datasets
Improving accuracy across six key sensitive content categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified dataset for six sensitive content categories
Fine-tuning LLMs improves detection performance significantly
Consistent retrieval strategies address previous research gaps
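As a rough sketch of how such a fine-tuned moderation model might be invoked, the snippet below formats a classification prompt over the paper's six categories and maps a free-text model response back to a canonical label. The prompt template and the `build_prompt`/`parse_label` helpers are illustrative assumptions for this summary, not the authors' released code.

```python
# Illustrative sketch: prompting an instruction-tuned LLM to classify a post
# into one of the paper's six sensitive-content categories. The template and
# parsing logic here are assumptions, not the authors' implementation.

CATEGORIES = [
    "conflictual language",
    "profanity",
    "sexually explicit material",
    "drug-related content",
    "self-harm",
    "spam",
]

def build_prompt(post: str) -> str:
    """Format a moderation prompt listing all six candidate labels."""
    labels = ", ".join(CATEGORIES)
    return (
        f"Classify the following social media post into one of these "
        f"sensitive content categories, or 'none': {labels}.\n"
        f"Post: {post}\nCategory:"
    )

def parse_label(response: str) -> str:
    """Map a free-text model response to a canonical category label."""
    text = response.strip().lower()
    for label in CATEGORIES:
        if label in text:
            return label
    return "none"
```

In a deployment the prompt would be sent to the fine-tuned open-source model and the raw completion passed through `parse_label`; keeping the label set in one place makes it straightforward to extend the taxonomy, which is the customisability gap the paper identifies in external moderation APIs.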
👥 Authors

Dimosthenis Antypas (Cardiff NLP, Cardiff University, United Kingdom)
Indira Sen (University of Mannheim; computational social science, social computing, natural language processing)
Carla Pérez-Almendros (Cardiff NLP, Cardiff University, United Kingdom)
J. Camacho-Collados (Cardiff NLP, Cardiff University, United Kingdom)
Francesco Barbieri (Meta; GenAI, NLP)