🤖 AI Summary
To address the growing security risks posed by AI-generated human motion videos, this paper proposes a multimodal semantic embedding-based detection method that overcomes the limitations of conventional approaches relying on low-level visual cues (e.g., optical flow, texture). The method introduces a novel cross-modal discriminative paradigm operating at the level of human motion semantics, integrating contrastive learning with temporal action modeling to achieve separability between real and synthetic videos in a semantic embedding space, and it exhibits strong robustness against post-processing "laundering" attacks. The authors construct a dedicated benchmark dataset covering seven mainstream text-to-video diffusion models and evaluate the method on it. Experimental results demonstrate significant gains over existing state-of-the-art methods, with high accuracy and superior cross-model generalization.
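The core discriminative idea, that real and AI-generated clips become linearly separable once mapped into a semantic embedding space, can be sketched minimally. The sketch below is an illustration only, not the paper's implementation: the embeddings are synthetic stand-ins drawn from two shifted Gaussians (in the actual pipeline they would come from a pretrained multimodal video-text encoder), and the nearest-centroid rule is a hypothetical stand-in for the learned classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64  # embedding dimensionality (illustrative choice)

def embed(center, n):
    """Stand-in for L2-normalized semantic video embeddings."""
    x = rng.normal(loc=center, scale=0.5, size=(n, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Two cluster centers standing in for "real" and "AI-generated" clips.
real_center = rng.normal(size=dim)
fake_center = real_center + 0.8 * rng.normal(size=dim)

train_real, train_fake = embed(real_center, 200), embed(fake_center, 200)
test_real, test_fake = embed(real_center, 50), embed(fake_center, 50)

# Nearest-centroid classifier by cosine similarity: unit-normalize the
# per-class mean embeddings, then compare dot products.
c_real = train_real.mean(axis=0)
c_fake = train_fake.mean(axis=0)
c_real /= np.linalg.norm(c_real)
c_fake /= np.linalg.norm(c_fake)

def predict(x):
    # Label 1 ("AI-generated") when a clip is closer to the fake centroid.
    return (x @ c_fake > x @ c_real).astype(int)

acc = np.mean(np.concatenate([predict(test_real) == 0,
                              predict(test_fake) == 1]))
print(f"accuracy: {acc:.2f}")
```

Because the decision is made on semantic embeddings rather than pixel statistics, post-processing operations such as recompression or resizing, which perturb low-level cues, leave the classification geometry largely intact; this is the intuition behind the method's robustness to laundering.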
📝 Abstract
Full-blown AI-generated video generation continues its journey through the uncanny valley to produce content that is perceptually indistinguishable from reality. Intermixed with many exciting and creative applications are malicious applications that harm individuals, organizations, and democracies. We describe an effective and robust technique for distinguishing real from AI-generated human motion. This technique leverages a multi-modal semantic embedding, making it robust to the types of laundering that typically confound more low- to mid-level approaches. This method is evaluated against a custom-built dataset of video clips with human actions generated by seven text-to-video AI models and matching real footage.