🤖 AI Summary
To address the challenges of detecting and mitigating hateful content in images, this paper introduces DeHater—the first multimodal de-hatred framework integrating Stable Diffusion with vision-language prompting. Methodologically, we construct the DeHate multimodal dataset, design a Digital Attention Analysis Module (DAAM) to generate fine-grained hate attention heatmaps, and incorporate watermark-augmented text-guided diffusion for precise localization and semantically consistent inpainting of hateful regions. Our contributions include: (1) releasing DeHate—the first benchmark dataset and open-source model (DeHater) dedicated to image-based hate governance; (2) proposing an interpretable, attention-driven mechanism enabling text-instruction-controlled editing; and (3) achieving state-of-the-art performance across multiple metrics, establishing a novel paradigm for ethically grounded AI-driven content moderation on social media.
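The released DeHater model and dataset are not reproduced here; as a point of reference, the open-source `daam` package exposes the kind of cross-attention tracing the summary refers to. The sketch below, adapted from that package's documented usage with `diffusers`, extracts a word-level attention heatmap from a Stable Diffusion run; the model ID, prompt, and traced word are illustrative placeholders, not the paper's configuration.

```python
# Illustrative only: word-level cross-attention heatmap via the open-source
# `daam` and `diffusers` packages. Model ID, prompt, and traced word are
# placeholder assumptions, not the paper's released setup.
import torch
from matplotlib import pyplot as plt
from daam import trace, set_seed
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16
).to("cuda")

prompt = "a banner with an offensive symbol"   # placeholder text instruction
gen = set_seed(0)                              # reproducible sampling

with torch.no_grad(), trace(pipe) as tc:
    out = pipe(prompt, num_inference_steps=30, generator=gen)
    # Aggregate cross-attention over all steps and layers, then isolate the
    # map for the word whose visual evidence we want to localize.
    global_map = tc.compute_global_heat_map()
    word_map = global_map.compute_word_heat_map("symbol")
    word_map.plot_overlay(out.images[0])       # heatmap overlaid on the image
    plt.savefig("hate_attention_map.png")
```

A thresholded version of such a heatmap can then serve as the mask for the watermark-augmented, text-guided diffusion inpainting step described above.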
📝 Abstract
The rise in harmful online content not only distorts public discourse but also poses significant challenges to maintaining a healthy digital environment. In response, we introduce a multimodal dataset crafted specifically for identifying hate in digital content. Central to our methodology is the application of watermark-augmented Stable Diffusion combined with the Digital Attention Analysis Module (DAAM). This combination pinpoints the hateful elements within an image and generates detailed hate attention maps, which are then used to blur the corresponding regions and thereby remove the hateful content from the image. We release this dataset as part of the DeHate shared task, which this paper also describes in detail. Furthermore, we present DeHater, a vision-language model designed for multimodal dehatification tasks. Our approach sets a new standard in AI-driven, text-prompt-guided image hate detection, contributing to the development of more ethical AI applications for social media.
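The blurring step mentioned above can be pictured as a simple mask-and-composite operation. The following is a minimal, hypothetical sketch rather than the paper's released pipeline: it assumes a hate attention heatmap is already available as a NumPy array (for example, from a DAAM-style module as in the sketch above), thresholds it into a binary mask, and composites a Gaussian-blurred copy of the image over the masked region; the threshold and blur radius are arbitrary placeholder values.

```python
# Minimal sketch (not the released DeHater code): blur image regions selected
# by a hate attention heatmap. Assumes `heatmap` is an HxW float array in
# [0, 1] from a DAAM-style module; threshold and radius are placeholders.
import numpy as np
from PIL import Image, ImageFilter

def blur_hateful_regions(image: Image.Image,
                         heatmap: np.ndarray,
                         threshold: float = 0.5,
                         radius: int = 25) -> Image.Image:
    # Upsample the heatmap to image resolution and binarize it into a mask.
    heat_img = Image.fromarray((heatmap * 255).astype(np.uint8)).resize(
        image.size, Image.BILINEAR)
    mask = heat_img.point(lambda p: 255 if p >= threshold * 255 else 0)

    # Blur the whole image once, then keep the blurred pixels only where
    # the mask marks suspected hateful content.
    blurred = image.filter(ImageFilter.GaussianBlur(radius))
    return Image.composite(blurred, image, mask)

# Example usage with a synthetic heatmap highlighting the image centre
# (file names and the flagged region are hypothetical).
img = Image.open("meme.png").convert("RGB")
h = np.zeros((64, 64), dtype=np.float32)
h[24:40, 24:40] = 1.0                      # pretend DAAM flagged this region
blur_hateful_regions(img, h).save("meme_dehated.png")
```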