Omni-Fake: Benchmarking Unified Multimodal Social Media Deepfake Detection

📅 2026-05-02


🤖 AI Summary

Current deepfake detection benchmarks are often limited to single-modality settings or unrealistic data distributions, which makes them inadequate for evaluating robustness in real-world social media scenarios. To address this gap, the work proposes Omni-Fake, presented as the first unified multimodal deepfake benchmark tailored to social media. It covers images, audio, video, and audio-visual talking-head content. It includes more than one million high-quality samples (Omni-Fake-Set) and a separate out-of-distribution test set of over 200k samples (Omni-Fake-OOD), and it supports a joint evaluation protocol for detection, localization, and explainability. The authors also introduce Omni-Fake-R1, a reinforcement-learning-based multimodal detector that adaptively fuses audio-visual cues and outputs structured decisions, localization, and natural-language explanations. Experiments show that the approach significantly outperforms state-of-the-art baselines in detection accuracy, cross-modal generalization, and interpretability.

📝 Abstract

Multimodal deepfakes are proliferating on social media and threaten authenticity, information integrity, and digital forensics. Existing benchmarks are constrained by their single-modality scope, simplified manipulations, or unrealistic distributions, which limit their ability to assess real-world robustness. To address these limitations, we present Omni-Fake, a unified omni-dataset for comprehensive multimodal deepfake detection in social-media settings. It comprises Omni-Fake-Set, a large-scale, high-quality dataset with 1M+ samples, and Omni-Fake-OOD, an out-of-distribution benchmark with 200k+ samples intentionally excluded from training to evaluate generalization. Omni-Fake spans four modalities (image, audio, video, and audio-video talking head) and supports a joint detection-localization-explanation protocol. On top of Omni-Fake, we further propose Omni-Fake-R1, a reinforcement-learning-driven multimodal detector that adaptively integrates visual and auditory cues and outputs structured decisions, localization, and natural-language explanations. Extensive experiments show significant gains in detection accuracy, cross-modal generalization, and explainability over state-of-the-art baselines. Project page: https://tianxiao1201.github.io/omni-fake-project-page/

Problem

Research questions and friction points this paper is trying to address.

multimodal deepfake

social media

deepfake detection

benchmark

out-of-distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal deepfake detection

out-of-distribution generalization

reinforcement learning

explainable AI

social media forensics
