🤖 AI Summary
Existing sensitive content detection tools for social media suffer from limited customisability, narrow category coverage (particularly lacking long-tail classes such as drug-related and self-harm content), high privacy risks, and the absence of a unified evaluation benchmark. To address these issues, this work introduces the first high-quality, uniformly annotated dataset covering six sensitive content categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. We establish standardised protocols for data collection and human annotation. Leveraging this dataset, we perform supervised fine-tuning of open-source large language models (e.g., LLaMA) and design a comprehensive, multi-dimensional evaluation benchmark. Experimental results demonstrate that our approach consistently outperforms both the LLaMA baseline and the OpenAI API across all six detection tasks, achieving average improvements of 10–15%. Gains are especially pronounced for scarce categories (e.g., drug-related and self-harm content), validating the effectiveness and deployability of fine-tuned open-source LLMs for fine-grained sensitive content identification.
📝 Abstract
The detection of sensitive content in large datasets is crucial for ensuring that shared and analysed data is free from harmful material. However, current moderation tools, such as external APIs, suffer from limited customisation, inconsistent accuracy across diverse sensitive categories, and privacy concerns. Additionally, existing datasets and open-source models focus predominantly on toxic language, leaving gaps in the detection of other sensitive categories such as substance abuse or self-harm. In this paper, we put forward a unified dataset tailored for social media content moderation across six sensitive categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. By collecting and annotating data with consistent retrieval strategies and guidelines, we address the shortcomings of previous, narrowly focused research. Our analysis demonstrates that fine-tuning large language models (LLMs) on this novel dataset yields significant improvements in detection performance compared to off-the-shelf open models such as LLaMA, and even proprietary OpenAI models, which underperform ours by 10-15% overall. The gap is even more pronounced for popular moderation APIs, which cannot be easily tailored to specific sensitive content categories.