MFFI: Multi-Dimensional Face Forgery Image Dataset for Real-World Scenarios

📅 2025-09-06

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing Deepfake detection methods suffer from insufficient dataset diversity, which limits their robustness against unseen forgery techniques, variable facial scenes, scarce authentic samples, and transmission-induced degradation. To address these real-world challenges, we propose a four-dimensional augmentation strategy: (1) covering 50 distinct forgery algorithms; (2) synthesizing diverse facial scenes (e.g., pose, illumination, occlusion); (3) enriching the authentic face distribution via large-scale real-world collection and synthesis; and (4) simulating multi-level transmission degradation, including compression, resolution reduction, and noise injection. The result is MFFI, a large-scale, multi-dimensional Deepfake dataset of 1.024 million images. In benchmark evaluations, MFFI outperforms existing public datasets in scene complexity, cross-domain generalization, and detection difficulty gradients. By providing rich, realistic, and systematically varied data, MFFI lays groundwork for developing robust and generalizable Deepfake detection models.
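To make the "multi-level transmission degradation" idea concrete, here is a minimal sketch, assuming a degradation level is simply a pair of (downscale factor, Gaussian noise sigma). The level table, the function names `downsample_upsample`, `add_gaussian_noise`, and `degrade`, and all parameter values are hypothetical illustrations, not MFFI's published pipeline. JPEG compression, also mentioned above, is omitted to keep the sketch dependency-light; with Pillow it could be added by re-encoding each image through `Image.save(buffer, format="JPEG", quality=q)`.

```python
import numpy as np

# Hypothetical degradation levels: (downscale factor, Gaussian noise sigma on a 0-255 scale).
# Illustrative assumptions only, not MFFI's published settings.
LEVELS = {
    0: (1, 0.0),   # clean: no degradation
    1: (2, 5.0),   # mild: 2x resolution loss, light noise
    2: (4, 15.0),  # strong: 4x resolution loss, heavier noise
}


def downsample_upsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Simulate resolution loss: average-pool by `factor`, then upsample back by pixel repetition.

    Assumes the image is at least `factor` pixels tall and wide.
    """
    out = img.astype(np.float64)
    if factor == 1:
        return out
    h, w = img.shape[:2]
    h_c, w_c = h - h % factor, w - w % factor  # crop so both sides divide evenly
    blocks = out[:h_c, :w_c].reshape(h_c // factor, factor, w_c // factor, factor, *img.shape[2:])
    pooled = blocks.mean(axis=(1, 3))  # one averaged value per factor x factor block
    up = pooled.repeat(factor, axis=0).repeat(factor, axis=1)
    # Restore the cropped border by replicating edge pixels so the output shape matches the input.
    pad = ((0, h - h_c), (0, w - w_c)) + ((0, 0),) * (img.ndim - 2)
    return np.pad(up, pad, mode="edge")


def add_gaussian_noise(img: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    if sigma == 0:
        return img
    return img + rng.normal(0.0, sigma, size=img.shape)


def degrade(img: np.ndarray, level: int, seed: int = 0) -> np.ndarray:
    """Apply one hypothetical degradation level to a uint8 HxW or HxWxC image."""
    factor, sigma = LEVELS[level]
    rng = np.random.default_rng(seed)
    out = downsample_upsample(img, factor)
    out = add_gaussian_noise(out, sigma, rng)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Usage: random pixels stand in for a real face crop.
face = np.random.default_rng(42).integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
for level in LEVELS:
    out = degrade(face, level)
    mae = np.abs(out.astype(np.int64) - face.astype(np.int64)).mean()
    print(level, out.shape, round(float(mae), 2))
```

Stacking the operations per level, rather than sampling each independently, is one simple way to get an ordered set of difficulty tiers; a real pipeline might instead randomize the order and strength of each operation.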


📝 Abstract

Rapid advances in Artificial Intelligence Generated Content (AIGC) have enabled increasingly sophisticated face forgeries, posing a significant threat to societal security. However, current Deepfake detection methods are limited by constraints in existing datasets, which lack the diversity found in real-world scenarios. Specifically, these datasets fall short in four key areas: coverage of unknown advanced forgery techniques, variability of facial scenes, richness of real data, and degradation caused by real-world propagation. To address these challenges, we propose the Multi-dimensional Face Forgery Image (MFFI) dataset, tailored for real-world scenarios. MFFI enhances realism along four strategic dimensions: 1) Wider Forgery Methods; 2) Varied Facial Scenes; 3) Diversified Authentic Data; 4) Multi-level Degradation Operations. MFFI integrates 50 different forgery methods and contains 1024K image samples. Benchmark evaluations show that MFFI outperforms existing public datasets in terms of scene complexity, cross-domain generalization capability, and detection difficulty gradients. These results validate the technical advance and practical utility of MFFI in simulating real-world conditions. The dataset and additional details are publicly available at https://github.com/inclusionConf/MFFI.
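The abstract does not specify how "detection difficulty gradients" are measured. One plausible, hedged reading is to score a detector separately at each degradation level and check that ROC AUC falls as the level rises. The sketch below assumes label 1 means forged and 0 means authentic, computes AUC via the Mann-Whitney U statistic (ties count as one half), and uses made-up toy scores; the names `roc_auc` and `auc_by_level` are hypothetical.

```python
from collections import defaultdict


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic.

    Equals the probability that a random fake (label 1) scores higher than a random
    real (label 0); ties count as 0.5. O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one real and one fake sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_level(records):
    """records: iterable of (degradation_level, detector_score, label) tuples."""
    groups = defaultdict(lambda: ([], []))  # level -> (scores, labels)
    for level, score, label in records:
        groups[level][0].append(score)
        groups[level][1].append(label)
    return {lvl: roc_auc(s, y) for lvl, (s, y) in sorted(groups.items())}


# Toy, made-up detector outputs: separation between fakes and reals shrinks as level rises.
records = [
    (0, 0.9, 1), (0, 0.8, 1), (0, 0.2, 0), (0, 0.1, 0),
    (1, 0.7, 1), (1, 0.4, 1), (1, 0.5, 0), (1, 0.3, 0),
    (2, 0.5, 1), (2, 0.4, 1), (2, 0.6, 0), (2, 0.45, 0),
]
print(auc_by_level(records))  # {0: 1.0, 1: 0.75, 2: 0.25}
```

A monotonically decreasing AUC across levels, as in this toy output, is what a difficulty gradient would look like under this interpretation; the paper's actual benchmark protocol may differ.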

Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of diversity in existing Deepfake detection datasets

Overcoming limitations in real-world forgery technique representation

Improving detection robustness against varied facial scenes and degradations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates 50 diverse forgery methods

Enhances realism along four strategic dimensions: forgery methods, facial scenes, authentic data, and degradation operations

Contains 1024K image samples and applies multi-level degradation operations
