DDL: A Dataset for Interpretable Deepfake Detection and Localization in Real-World Scenarios

📅 2025-06-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing deepfake detection methods suffer from poor interpretability, while mainstream benchmarks exhibit limitations—including narrow scene coverage, limited forgery types, small scale, and coarse-grained annotations—hindering the development of interpretable detection for real-world applications. To address these issues, we introduce DDL, a large-scale, interpretable deepfake detection and localization dataset comprising over 1.8 million video samples, 75 distinct forgery algorithms, diverse scenes and manipulation patterns, and spatiotemporally fine-grained pixel-level forgery region annotations. DDL is the first benchmark to systematically integrate multi-dimensional diversity (scene, method, operation, and annotation granularity), significantly enhancing realism, challenge, and practical utility. As the largest currently available interpretable deepfake benchmark, DDL enables joint detection-localization modeling, attribution analysis, and high-assurance applications such as forensic investigation and judicial evidence verification.

📝 Abstract
Recent advances in AIGC have exacerbated the misuse of malicious deepfake content, making the development of reliable deepfake detection methods an essential means of addressing this challenge. Although existing deepfake detection models demonstrate outstanding performance on detection metrics, most methods provide only simple binary classification results and lack interpretability. In critical domains such as law, interpretability is crucial for enhancing the credibility and authority of decisions. Recent studies attempt to improve the interpretability of classification results by providing spatial manipulation masks or temporal forgery segments. However, the practical effectiveness of these methods remains suboptimal due to limitations of the forgery data: most current deepfake datasets offer only binary labels, and the few with localization annotations suffer from restricted forgery scenarios, limited diversity in deepfake types, and insufficient data scale, making them inadequate for complex real-world scenarios. To address this predicament, we construct a novel large-scale deepfake detection and localization (**DDL**) dataset containing over **1.8M** forged samples and encompassing up to **75** distinct deepfake methods. The DDL design incorporates four key innovations: (1) **Diverse Forgery Scenarios**, (2) **Comprehensive Deepfake Methods**, (3) **Varied Manipulation Modes**, and (4) **Fine-grained Forgery Annotations**. Through these improvements, our DDL not only provides a more challenging benchmark for complex real-world forgeries, but also offers crucial support for building next-generation deepfake detection, localization, and interpretability methods. The DDL dataset project page is at https://deepfake-workshop-ijcai2025.github.io/main/index.html.
Problem

Research questions and friction points this paper is trying to address.

Lack of interpretability in current deepfake detection methods
Insufficient diversity and scale in existing deepfake datasets
Need for reliable localization in real-world deepfake scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset with 1.8M forged samples
Covers 75 distinct deepfake methods
Provides fine-grained forgery annotations
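
The page does not describe DDL's evaluation protocol, but pixel-level forgery region annotations like those above are commonly scored against predicted masks with intersection-over-union (IoU). A minimal sketch of that metric, with hypothetical toy masks (not from the dataset):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary forgery masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: two empty masks (a genuine, unmanipulated frame) score 1.0
    return float(inter) / float(union) if union else 1.0

# Toy 4x4 frame: ground truth marks 3 forged pixels,
# the prediction hits 2 of them and adds 1 false positive.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1:3, 1] = 1
gt[1, 2] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1] = 1
pred[3, 3] = 1
print(mask_iou(pred, gt))  # 2 / 4 = 0.5
```

Per-frame IoU scores like this can then be averaged over a video to obtain a spatiotemporal localization score.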
👥 Authors
Changtao Miao, University of Science and Technology of China (AI)
Yi Zhang, AntGroup
Weize Gao, AntGroup
Man Luo, AntGroup
Weiwei Feng, AntGroup
Zhiya Tan, A*STAR Centre for Frontier AI Research
Jianshu Li, National University of Singapore (Computer Vision, Machine Learning, Face Analysis)
Ajian Liu, Institute of Automation, Chinese Academy of Sciences
Yunfeng Diao, Assistant Professor, Hefei University of Technology (Adversarial Robustness, Computer Vision, AI Security)
Qi Chu, University of Science and Technology of China (Computer Vision, Artificial Intelligence Security)
Tao Gong, Anhui Province Key Laboratory of Digital Security
Zhe Li, AntGroup
Weibin Yao, AntGroup
Joey Tianyi Zhou, A*STAR and NUS (Efficient AI, Robust & Safe AI)