Omni-Fake: Benchmarking Unified Multimodal Social Media Deepfake Detection

📅 2026-05-02

🤖 AI Summary
Current deepfake detection benchmarks are often limited to a single modality or to unrealistic data distributions, which makes them inadequate for evaluating robustness in real-world social media scenarios. To address this gap, this work proposes Omni-Fake, a unified multimodal deepfake benchmark tailored for social media that covers images, audio, video, and audio-visual talking-head content. It includes over one million high-quality samples (Omni-Fake-Set) and an out-of-distribution test set of more than 200k samples (Omni-Fake-OOD), and it supports a joint evaluation protocol for detection, localization, and explanation. The work also introduces Omni-Fake-R1, a reinforcement-learning-based multimodal detector that adaptively fuses audio and visual cues to produce structured decisions, localization, and natural-language explanations. The authors report significant gains over state-of-the-art baselines in detection accuracy, cross-modal generalization, and explainability.
📝 Abstract
Multimodal deepfakes are proliferating on social media and threaten authenticity, information integrity, and digital forensics. Existing benchmarks are constrained by their single-modality scope, simplified manipulations, or unrealistic distributions, which limit their ability to assess real-world robustness. To address these limitations, we present Omni-Fake, a unified omni-dataset for comprehensive multimodal deepfake detection in social-media settings. It comprises Omni-Fake-Set, a large-scale, high-quality dataset with 1M+ samples, and Omni-Fake-OOD, an out-of-distribution benchmark with 200k+ samples intentionally excluded from training to evaluate generalization. Omni-Fake spans four modalities (image, audio, video, and audio-video talking head) and supports a joint detection-localization-explanation protocol. On top of Omni-Fake, we further propose Omni-Fake-R1, a reinforcement-learning-driven multimodal detector that adaptively integrates visual and auditory cues and outputs structured decisions, localization, and natural-language explanations. Extensive experiments show significant gains in detection accuracy, cross-modal generalization, and explainability over state-of-the-art baselines. Project page: https://tianxiao1201.github.io/omni-fake-project-page/
Problem

Research questions and friction points this paper is trying to address.

multimodal deepfake
social media
deepfake detection
benchmark
out-of-distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal deepfake detection
out-of-distribution generalization
reinforcement learning
explainable AI
social media forensics