Human Action CLIPS: Detecting AI-generated Human Motion

📅 2024-11-30
🏛️ arXiv.org
📈 Citations: 4
Influential: 2
🤖 AI Summary
To address the growing security risks posed by AI-generated human motion videos, this paper proposes a detection method based on multimodal semantic embeddings, avoiding the limitations of conventional approaches that rely on low-level visual cues (e.g., optical flow, texture). The method discriminates at the level of human motion semantics, integrating contrastive learning with temporal action modeling so that real and synthetic videos become separable in the semantic embedding space. It is also robust to post-processing laundering attacks, such as reduced resolution and compression. The authors construct a dedicated benchmark dataset covering seven text-to-video models, paired with matching real footage, and evaluate the method on it. The reported results show gains over existing state-of-the-art methods in both accuracy and cross-model generalization.

📝 Abstract
Full-blown AI-generated video generation continues its journey through the uncanny valley to produce content that is perceptually indistinguishable from reality. Intermixed with many exciting and creative applications are malicious applications that harm individuals, organizations, and democracies. We describe an effective and robust technique for distinguishing real from AI-generated human motion. This technique leverages a multi-modal semantic embedding, making it robust to the types of laundering that typically confound more low- to mid-level approaches. This method is evaluated against a custom-built dataset of video clips with human actions generated by seven text-to-video AI models and matching real footage.
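As a rough illustration of classifying clips in a semantic embedding space, the sketch below pools per-frame embedding vectors into one clip-level vector and assigns the label of the closest class centroid by cosine similarity. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the embedding model or the classifier, the nearest-centroid rule is a stand-in chosen for brevity, and the three-dimensional vectors are toy values standing in for the output of a real multi-modal encoder. All function names are hypothetical.

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fit_centroids(clips, labels):
    """Compute one centroid per label from clip-level (mean-pooled) embeddings."""
    grouped = {}
    for clip, label in zip(clips, labels):
        grouped.setdefault(label, []).append(mean_pool(clip))
    return {label: mean_pool(vecs) for label, vecs in grouped.items()}

def classify(clip, centroids):
    """Return the label whose centroid is most similar to the pooled clip embedding."""
    v = mean_pool(clip)
    return max(centroids, key=lambda label: cosine(v, centroids[label]))

# Toy data (assumption): each clip is a list of per-frame embedding vectors.
train_clips = [
    [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],  # real
    [[0.8, 0.0, 0.1], [1.0, 0.1, 0.1]],  # real
    [[0.1, 1.0, 0.0], [0.0, 0.9, 0.2]],  # generated
    [[0.2, 0.8, 0.1], [0.1, 1.0, 0.0]],  # generated
]
train_labels = ["real", "real", "generated", "generated"]

centroids = fit_centroids(train_clips, train_labels)
query_clip = [[0.9, 0.1, 0.05], [1.0, 0.0, 0.0]]
prediction = classify(query_clip, centroids)
print(prediction)
```

In practice the toy vectors would be replaced by embeddings from a pretrained encoder, and the centroid rule by whatever classifier is trained on them.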
Problem

Research questions and friction points this paper is trying to address.

Detect AI-generated human motion in videos
Distinguish real from synthetic motion robustly
Counteract resolution and compression laundering attacks
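To make the idea of laundering concrete, the sketch below degrades a single grayscale frame by block-averaging it to a lower resolution and then coarsely quantizing pixel values. It is a toy approximation under stated assumptions: real laundering would use image resizing and a lossy video codec, and the quantization step here is only a crude stand-in for compression. The function names, the `factor` and `step` parameters, and the 4x4 frame are all hypothetical.

```python
def downscale(frame, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks."""
    height, width = len(frame), len(frame[0])
    out = []
    for r in range(0, height - height % factor, factor):
        row = []
        for c in range(0, width - width % factor, factor):
            block = [frame[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def quantize(frame, step):
    """Round pixel values to multiples of step (crude stand-in for lossy compression)."""
    return [[round(p / step) * step for p in row] for row in frame]

def launder(frame, factor=2, step=16):
    """Apply resolution reduction followed by coarse quantization."""
    return quantize(downscale(frame, factor), step)

# Toy 4x4 grayscale frame (assumption), pixel values in 0..255.
frame = [
    [10, 20, 200, 210],
    [30, 40, 220, 230],
    [100, 110, 50, 60],
    [120, 130, 70, 80],
]
laundered = launder(frame)
print(laundered)
```

A robustness check would run the detector on both the original and the laundered version of each clip and compare accuracy across the two.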
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multi-modal semantic embeddings
Robust against resolution and compression laundering
Evaluated with custom DeepAction dataset
Matyáš Boháček
Google, Mountain View, California, USA; Stanford University, Stanford, California, USA
Hany Farid
Professor, University of California, Berkeley
Media Forensics, Deepfakes, Generative AI, Forensic Science, Misinformation