🤖 AI Summary
To address the challenges of detecting and mitigating hateful content in images, this paper introduces DeHater—the first multimodal de-hatred framework integrating Stable Diffusion with vision-language prompting. Methodologically, we construct the DeHate multimodal dataset, design a Digital Attention Analysis Module (DAAM) to generate fine-grained hate attention heatmaps, and incorporate watermark-augmented text-guided diffusion for precise localization and semantically consistent inpainting of hateful regions. Our contributions include: (1) releasing DeHate—the first benchmark dataset and open-source model (DeHater) dedicated to image-based hate governance; (2) proposing an interpretable, attention-driven mechanism enabling text-instruction-controlled editing; and (3) achieving state-of-the-art performance across multiple metrics, establishing a novel paradigm for ethically grounded AI-driven content moderation on social media.
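The released DeHater model and dataset are not reproduced here; as a point of reference, the open-source `daam` package exposes the kind of cross-attention tracing the summary refers to. The sketch below, adapted from that package's documented usage with `diffusers`, extracts a word-level attention heatmap from a Stable Diffusion run; the model ID, prompt, and traced word are illustrative placeholders, not the paper's configuration.

```python
# Illustrative only: word-level cross-attention heatmap via the open-source
# `daam` and `diffusers` packages. Model ID, prompt, and traced word are
# placeholder assumptions, not the paper's released setup.
import torch
from matplotlib import pyplot as plt
from daam import trace, set_seed
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16
).to("cuda")

prompt = "a banner with an offensive symbol"   # placeholder text instruction
gen = set_seed(0)                              # reproducible sampling

with torch.no_grad(), trace(pipe) as tc:
    out = pipe(prompt, num_inference_steps=30, generator=gen)
    # Aggregate cross-attention over all steps and layers, then isolate the
    # map for the word whose visual evidence we want to localize.
    global_map = tc.compute_global_heat_map()
    word_map = global_map.compute_word_heat_map("symbol")
    word_map.plot_overlay(out.images[0])       # heatmap overlaid on the image
    plt.savefig("hate_attention_map.png")
```

A thresholded version of such a heatmap can then serve as the mask for the watermark-augmented, text-guided diffusion inpainting step described above.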
📝 Abstract
The rise in harmful online content not only distorts public discourse but also poses significant challenges to maintaining a healthy digital environment. In response, we introduce a multimodal dataset crafted specifically for identifying hate in digital content. Central to our methodology is the application of watermark-augmented Stable Diffusion combined with the Digital Attention Analysis Module (DAAM). This combination pinpoints the hateful elements within an image and generates detailed hate attention maps, which are then used to blur the corresponding regions and thereby remove the hateful content from the image. We release this dataset as part of the DeHate shared task, which this paper also describes in detail. Furthermore, we present DeHater, a vision-language model designed for multimodal dehatification tasks. Our approach sets a new standard in AI-driven, text-prompt-guided image hate detection, contributing to the development of more ethical AI applications for social media.
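The blurring step mentioned above can be pictured as a simple mask-and-composite operation. The following is a minimal, hypothetical sketch rather than the paper's released pipeline: it assumes a hate attention heatmap is already available as a NumPy array (for example, from a DAAM-style module as in the sketch above), thresholds it into a binary mask, and composites a Gaussian-blurred copy of the image over the masked region; the threshold and blur radius are arbitrary placeholder values.

```python
# Minimal sketch (not the released DeHater code): blur image regions selected
# by a hate attention heatmap. Assumes `heatmap` is an HxW float array in
# [0, 1] from a DAAM-style module; threshold and radius are placeholders.
import numpy as np
from PIL import Image, ImageFilter

def blur_hateful_regions(image: Image.Image,
                         heatmap: np.ndarray,
                         threshold: float = 0.5,
                         radius: int = 25) -> Image.Image:
    # Upsample the heatmap to image resolution and binarize it into a mask.
    heat_img = Image.fromarray((heatmap * 255).astype(np.uint8)).resize(
        image.size, Image.BILINEAR)
    mask = heat_img.point(lambda p: 255 if p >= threshold * 255 else 0)

    # Blur the whole image once, then keep the blurred pixels only where
    # the mask marks suspected hateful content.
    blurred = image.filter(ImageFilter.GaussianBlur(radius))
    return Image.composite(blurred, image, mask)

# Example usage with a synthetic heatmap highlighting the image centre
# (file names and the flagged region are hypothetical).
img = Image.open("meme.png").convert("RGB")
h = np.zeros((64, 64), dtype=np.float32)
h[24:40, 24:40] = 1.0                      # pretend DAAM flagged this region
blur_hateful_regions(img, h).save("meme_dehated.png")
```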