BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation

📅 2025-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The proliferation of AI-generated videos has triggered a severe trust crisis, yet existing detection methods suffer from small, low-quality datasets and rely on opaque binary classifiers lacking interpretability. To address this, we introduce GenBuster-200K, the first large-scale (200K-sample), high-fidelity dataset of real-world AI-generated videos. We further propose BusterX, the first interpretable detection framework integrating multimodal large language models (MLLMs) with reinforcement learning, moving beyond binary classification to support natural-language-based attribution and decision provenance. BusterX jointly models spatiotemporal video features, incorporates an explainable reasoning mechanism, and leverages high-fidelity synthetic data augmentation. Extensive experiments demonstrate that BusterX significantly outperforms state-of-the-art methods across multiple benchmarks, exhibiting strong generalization and robustness under diverse distribution shifts. The code, models, and the GenBuster-200K dataset will be released.

📝 Abstract

Advances in AI generative models facilitate super-realistic video synthesis, amplifying misinformation risks via social media and eroding trust in digital content. Several research works have explored new deepfake detection methods on AI-generated images to alleviate these risks. However, with the fast development of video generation models, such as Sora and WanX, there is currently a lack of large-scale, high-quality AI-generated video datasets for forgery detection. In addition, existing detection approaches predominantly treat the task as binary classification, lacking explainability in model decision-making and failing to provide actionable insights or guidance for the public. To address these challenges, we propose GenBuster-200K, a large-scale AI-generated video dataset featuring 200K high-resolution video clips, a diverse set of the latest generative techniques, and real-world scenes. We further introduce BusterX, a novel AI-generated video detection and explanation framework leveraging a multimodal large language model (MLLM) and reinforcement learning for authenticity determination and explainable rationale. To our knowledge, GenBuster-200K is the first large-scale, high-quality AI-generated video dataset that incorporates the latest generative techniques for real-world scenarios. BusterX is the first framework to integrate an MLLM with reinforcement learning for explainable AI-generated video detection. Extensive comparisons with state-of-the-art methods and ablation studies validate the effectiveness and generalizability of BusterX. The code, models, and datasets will be released.

Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale AI-generated video datasets for detection

Existing methods lack explainability in model decision-making

Need for actionable insights in AI video forgery detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale AI-generated video dataset GenBuster-200K

Multimodal large language model (MLLM) for detection

Reinforcement learning for explainable rationale
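
To make the pairing of an authenticity verdict with an explainable rationale more concrete, here is a minimal sketch of how a rule-based reward for reinforcement learning could score a model response. This is not the paper's reward design, which the abstract does not specify. The `<think>`/`<answer>` tag format, the function names `parse_response` and `reward`, and the weights are all hypothetical choices made for illustration.

```python
import re

# Hypothetical response format (an assumption, not taken from the paper):
# the rationale goes in <think>...</think>, the verdict in <answer>...</answer>.
RESPONSE_PATTERN = re.compile(
    r"^\s*<think>(?P<rationale>.+?)</think>\s*"
    r"<answer>(?P<verdict>real|fake)</answer>\s*$",
    re.DOTALL | re.IGNORECASE,
)


def parse_response(text):
    """Return (rationale, verdict), or None if the response is malformed."""
    match = RESPONSE_PATTERN.match(text)
    if match is None:
        return None
    return match.group("rationale").strip(), match.group("verdict").lower()


def reward(text, label, format_weight=0.5, accuracy_weight=1.0):
    """Score a response: credit for a well-formed answer plus a correct verdict.

    The weights are illustrative placeholders, not values from the paper.
    """
    parsed = parse_response(text)
    if parsed is None:
        return 0.0  # malformed output earns nothing
    _, verdict = parsed
    score = format_weight  # response followed the expected structure
    if verdict == label:
        score += accuracy_weight  # verdict matches the ground-truth label
    return score
```

A response such as `<think>Hair flickers between frames.</think><answer>fake</answer>` would earn both the format and accuracy credit when the ground-truth label is `fake`, and only the format credit when it is `real`. Note that a rule-based signal like this checks structure and correctness only; it says nothing about whether the rationale itself is sound.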

🔎 Similar Papers

FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

2024-10-03 | arXiv.org | Citations: 14