GenWorld: Towards Detecting AI-generated Real-world Simulation Videos

📅 2025-06-12
🤖 AI Summary
Problem: The rapid improvement in AI-generated video quality threatens the authenticity of visual content, yet the scarcity of high-fidelity forgery datasets depicting real-world scenes hinders the development of robust detection methods. Method: We introduce GenWorld, the first benchmark dataset tailored to realistic simulation scenarios. It comprises high-fidelity forged videos synthesized by multiple state-of-the-art generative models (including world models such as Cosmos) from prompts in several modalities (text, image, video). We further propose SpannDetector, a detector that uses multi-view consistency as a physically interpretable cue for spatiotemporal plausibility, combining multi-view feature modeling with a lightweight aggregation network. Contribution/Results: Extensive experiments show that SpannDetector significantly outperforms existing methods on GenWorld, with particularly large gains on world-model-generated videos. These results support the effectiveness and generalizability of physics-guided detection.

📝 Abstract
The flourishing of video generation technologies has endangered the credibility of real-world information and intensified the demand for AI-generated video detectors. Despite some progress, the lack of high-quality real-world datasets hinders the development of trustworthy detectors. In this paper, we propose GenWorld, a large-scale, high-quality, and real-world simulation dataset for AI-generated video detection. GenWorld features the following characteristics: (1) Real-world Simulation: GenWorld focuses on videos that replicate real-world scenarios, which have a significant impact due to their realism and potential influence; (2) High Quality: GenWorld employs multiple state-of-the-art video generation models to provide realistic and high-quality forged videos; (3) Cross-prompt Diversity: GenWorld includes videos generated from diverse generators and various prompt modalities (e.g., text, image, video), offering the potential to learn more generalizable forensic features. We analyze existing methods and find they fail to detect high-quality videos generated by world models (i.e., Cosmos), revealing potential drawbacks of ignoring real-world clues. To address this, we propose a simple yet effective model, SpannDetector, to leverage multi-view consistency as a strong criterion for real-world AI-generated video detection. Experiments show that our method achieves superior results, highlighting a promising direction for explainable AI-generated video detection based on physical plausibility. We believe that GenWorld will advance the field of AI-generated video detection. Project Page: https://chen-wl20.github.io/GenWorld
Problem

Research questions and friction points this paper is trying to address.

Lack of high-quality real-world datasets for AI-generated video detection
Existing methods fail to detect high-quality videos from world models
Need for explainable detection based on physical plausibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale real-world simulation dataset
Multi-view consistency detection model
Diverse generator and prompt modalities
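
The multi-view consistency idea can be illustrated with a toy sketch. This is not SpannDetector's implementation: the per-view feature vectors, the cosine-similarity measure, the mean aggregation, and the `threshold` value are all assumptions chosen for illustration. The sketch only shows the general shape of the criterion, namely that views of a physically plausible scene should agree with one another, and that low agreement can be treated as evidence of generation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_score(view_features: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between features of adjacent views (frames).

    `view_features` is a hypothetical stand-in for whatever per-view
    representation a real detector would extract.
    """
    if len(view_features) < 2:
        raise ValueError("need at least two views to measure consistency")
    sims = [
        cosine_similarity(a, b)
        for a, b in zip(view_features, view_features[1:])
    ]
    return sum(sims) / len(sims)


def flag_as_generated(
    view_features: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> bool:
    """Flag a video when its views agree less than the threshold."""
    return consistency_score(view_features) < threshold
```

In this toy version the aggregation is a plain average; the summary describes a lightweight aggregation network in that role, which would be learned rather than fixed.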
Authors

Weiliang Chen
Alibaba
AI System, Deep Learning

Wenzhao Zheng
EECS, University of California, Berkeley
Large Models, Embodied Agents, Autonomous Driving

Yu Zheng
Department of Automation, Tsinghua University, China

Lei Chen
Department of Automation, Tsinghua University, China

Jie Zhou
Department of Automation, Tsinghua University, China

Jiwen Lu
Department of Automation, Tsinghua University, China

Yueqi Duan
Department of Automation, Tsinghua University, China