🤖 AI Summary
To address the proliferation of highly realistic AI-generated and manipulated images on social platforms, which fuels the spread of misinformation and erodes trust in digital content, this paper introduces SID-Set, a large-scale deepfake detection benchmark of 300K images tailored to authentic social-media scenarios, filling the gap in high-diversity, high-fidelity datasets. The authors propose SIDA, a unified multitask framework that jointly performs deepfake image detection, pixel-level tampering localization, and natural-language explanation generation, presented as the first method to handle all three tasks simultaneously. Built on a large multimodal model, SIDA integrates a vision encoder, a mask decoder, and an instruction-tuned language module, enabling end-to-end joint training. Extensive experiments on SID-Set and multiple public benchmarks show that SIDA outperforms state-of-the-art methods in detection accuracy, localization IoU, and explanation quality. The code, models, and dataset will be released.
📝 Abstract
The rapid advancement of generative models in creating highly realistic images poses substantial risks for misinformation dissemination. For instance, a synthetic image, when shared on social media, can mislead extensive audiences and erode trust in digital content, resulting in severe repercussions. Despite some progress, academia has not yet created a large and diversified deepfake detection dataset for social media, nor has it devised an effective solution to address this issue. In this paper, we introduce the Social media Image Detection dataSet (SID-Set), which offers three key advantages: (1) extensive volume, featuring 300K AI-generated/tampered and authentic images with comprehensive annotations; (2) broad diversity, encompassing fully synthetic and tampered images across various classes; and (3) elevated realism, with images that are predominantly indistinguishable from genuine ones through mere visual inspection. Furthermore, leveraging the exceptional capabilities of large multimodal models, we propose a new image deepfake detection, localization, and explanation framework, named SIDA (Social media Image Detection, localization, and explanation Assistant). SIDA not only discerns the authenticity of images, but also delineates tampered regions through mask prediction and provides textual explanations of the model's judgment criteria. Extensive experiments on SID-Set and other benchmarks demonstrate that SIDA outperforms state-of-the-art deepfake detection models across diverse settings. The code, model, and dataset will be released.
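To make the three-task setup concrete, here is a minimal toy sketch of a shared image embedding feeding a detection head, a mask head, and an explanation stub. This is an illustration of the multitask *interface* only, not the paper's actual architecture: all class names, dimensions, and weights are hypothetical, and the real SIDA uses a large multimodal model with a vision encoder, mask decoder, and instruction-tuned language module rather than linear layers.

```python
import numpy as np

rng = np.random.default_rng(0)


class ToyMultitaskHead:
    """Toy stand-in for a detection + localization + explanation head.

    All shapes/weights are illustrative; SIDA's real heads sit on top of
    a large multimodal backbone, not random linear maps.
    """

    def __init__(self, feat_dim=16, mask_hw=8):
        self.w_det = rng.normal(size=feat_dim)                         # detection head
        self.w_mask = rng.normal(size=(mask_hw * mask_hw, feat_dim))   # mask-decoder stand-in
        self.mask_hw = mask_hw

    def forward(self, feat):
        # 1) image-level detection: probability the image is fake/tampered
        p_fake = 1.0 / (1.0 + np.exp(-(self.w_det @ feat)))
        # 2) pixel-level localization: per-pixel tamper probability map
        mask = 1.0 / (1.0 + np.exp(-(self.w_mask @ feat)))
        mask = mask.reshape(self.mask_hw, self.mask_hw)
        # 3) explanation: in SIDA a language module generates this text;
        #    stubbed here with a template string
        verdict = "tampered" if p_fake > 0.5 else "authentic"
        explanation = f"Image judged {verdict} (p_fake={p_fake:.2f})."
        return p_fake, mask, explanation


head = ToyMultitaskHead()
p_fake, mask, explanation = head.forward(rng.normal(size=16))
```

Joint training would then sum a classification loss on `p_fake`, a segmentation loss on `mask`, and a language-modeling loss on the explanation, which is the end-to-end setup the summary describes.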