Towards Safer Social Media Platforms: Scalable and Performant Few-Shot Harmful Content Moderation Using Large Language Models

📅 2025-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address delayed content moderation caused by the rapid evolution of harmful social media content (e.g., violent acts, dangerous challenges) and the scarcity of labeled training data, this paper proposes a few-shot dynamic content moderation method built on large language models (LLaMA/GPT). The approach combines in-context learning (ICL) with multimodal feature alignment to support cross-modal understanding, and is presented as the first work to systematically demonstrate the advantage of LLMs in few-shot detection across multiple categories of harmful content. It also incorporates visual cues, such as video thumbnails, into the textual reasoning pipeline to improve multimodal robustness. Evaluated on multiple benchmarks, the method achieves a 12.6% F1-score improvement over state-of-the-art baselines, including the Perspective API and OpenAI's moderation API, while keeping inference latency low enough for real-time deployment. It also adapts quickly to emerging risk patterns without requiring large-scale annotated datasets.
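To make the few-shot in-context learning setup concrete, the sketch below builds a prompt from a handful of labeled demonstrations and asks a chat model for a one-word verdict. This is a minimal sketch, not the paper's implementation: the model name, label set, and demonstration posts are illustrative assumptions, and it presumes an OpenAI-compatible chat API.

```python
# Minimal sketch of few-shot in-context moderation, assuming an
# OpenAI-compatible chat API. Labels, demonstrations, and the model
# name are illustrative assumptions, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A handful of labeled demonstrations stands in for a training set.
FEW_SHOT_EXAMPLES = [
    ("Watch me jump off my roof onto the trampoline, tag your friends!",
     "harmful"),  # dangerous challenge
    ("Here is my grandmother's lasagna recipe, step by step.",
     "benign"),
]

def moderate(post_text: str, model: str = "gpt-4o-mini") -> str:
    """Classify a post as 'harmful' or 'benign' via in-context learning."""
    messages = [{"role": "system",
                 "content": "You are a content moderator. Answer with "
                            "exactly one word: harmful or benign."}]
    # Each demonstration becomes a user/assistant turn in the prompt.
    for text, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": f"Post: {text}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": f"Post: {post_text}"})
    response = client.chat.completions.create(
        model=model, messages=messages, temperature=0)
    return response.choices[0].message.content.strip().lower()

print(moderate("Try holding your breath for five minutes, it's easy!"))
```

Because the demonstrations live in the prompt rather than in model weights, swapping in examples of a newly emerging risk pattern updates the moderator with no retraining, which is the adaptation property the summary highlights.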

📝 Abstract
The prevalence of harmful content on social media platforms poses significant risks to users and society, necessitating more effective and scalable content moderation strategies. Current approaches rely on human moderators, supervised classifiers, and large volumes of training data, and often struggle with scalability, subjectivity, and the dynamic nature of harmful content (e.g., violent content and dangerous challenge trends). To bridge these gaps, we utilize Large Language Models (LLMs) to undertake few-shot dynamic content moderation via in-context learning. Through extensive experiments on multiple LLMs, we demonstrate that our few-shot approaches can outperform existing proprietary baselines (Perspective and OpenAI Moderation) as well as prior state-of-the-art few-shot learning methods in identifying harm. We also incorporate visual information (video thumbnails) and assess whether different multimodal techniques improve model performance. Our results underscore the significant benefits of employing LLM-based methods for scalable and dynamic harmful content moderation online.
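The abstract leaves the multimodal techniques unspecified. One plausible variant, sketched below under that assumption, passes the video thumbnail directly to a vision-capable chat model alongside the post text; the message format follows the OpenAI vision API, while the model name and prompt wording are hypothetical.

```python
# A hedged sketch of one way to fold a video thumbnail into the text
# reasoning pipeline: send the image alongside the post text to a
# vision-capable chat model. This is an assumed technique, not
# necessarily the one evaluated in the paper.
import base64
from openai import OpenAI

client = OpenAI()

def moderate_with_thumbnail(post_text: str, thumbnail_path: str,
                            model: str = "gpt-4o-mini") -> str:
    """Classify a post plus its video thumbnail as harmful or benign."""
    # Encode the thumbnail as a base64 data URL for the vision API.
    with open(thumbnail_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You are a content moderator. Answer with exactly "
                        "one word: harmful or benign."},
            {"role": "user",
             "content": [
                 {"type": "text", "text": f"Post: {post_text}"},
                 {"type": "image_url",
                  "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
             ]},
        ])
    return response.choices[0].message.content.strip().lower()
```

An alternative the paper's framing also admits is captioning the thumbnail first and splicing the caption into a text-only prompt, which keeps the few-shot pipeline unchanged at the cost of an extra captioning step.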
Problem

Research questions and friction points this paper is trying to address.

Social Media
Harmful Content
Detection and Management
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Harmful Content Detection
Multimodal Information Integration