AI-Generated Video Detection via Perceptual Straightening

📅 2025-07-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Highly realistic AI-generated videos are increasingly difficult to detect forensically; existing methods generalize poorly and capture subtle temporal inconsistencies inadequately. Method: The paper proposes ReStraV, a video authenticity detection approach grounded in the geometry of neural representations. It is inspired by the “perceptual straightening” hypothesis, which suggests that real-world videos trace straighter temporal trajectories in neural representation space, and it measures deviations from this property in the representation space of a pretrained self-supervised Vision Transformer (DINOv2). The geometric difference is quantified with temporal curvature and stepwise distance. A lightweight pipeline extracts frame-level features, aggregates per-video statistics of these two measures, and feeds them into a compact classifier. Contribution/Results: The method reports 97.17% accuracy and 98.63% AUROC on the VidProM benchmark, substantially outperforming existing image- and video-based methods at low computational cost.

📝 Abstract

The rapid advancement of generative AI enables highly realistic synthetic videos, posing significant challenges for content authentication and raising urgent concerns about misuse. Existing detection methods often struggle to generalize and to capture subtle temporal inconsistencies. We propose ReStraV (Representation Straightening Video), a novel approach to distinguish natural from AI-generated videos. Inspired by the "perceptual straightening" hypothesis, which suggests that real-world video trajectories become straighter in the neural representation domain, we analyze deviations from this expected geometric property. Using a pre-trained self-supervised vision transformer (DINOv2), we quantify the temporal curvature and stepwise distance in the model's representation domain. We aggregate statistics of these measures for each video and train a classifier. Our analysis shows that AI-generated videos exhibit significantly different curvature and distance patterns compared to real videos. A lightweight classifier achieves state-of-the-art detection performance (e.g., 97.17% accuracy and 98.63% AUROC on the VidProM benchmark), substantially outperforming existing image- and video-based methods. ReStraV is computationally efficient, offering a low-cost and effective detection solution. This work provides new insights into using neural representation geometry for AI-generated video detection.
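The two geometric measures named in the abstract can be illustrated with a minimal sketch. It assumes the per-frame embeddings have already been extracted into an array `z` of shape `(T, D)`, and it assumes the usual definitions: stepwise distance as the L2 norm of the difference between consecutive embeddings, and temporal curvature as the angle between consecutive difference vectors. The abstract does not spell out these formulas, so the function names and definitions below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def stepwise_distance(z):
    """L2 norm of the difference between consecutive frame embeddings.

    z: array of shape (T, D), one embedding per frame (assumed input).
    Returns an array of shape (T-1,).
    """
    diffs = np.diff(z, axis=0)
    return np.linalg.norm(diffs, axis=1)

def temporal_curvature(z, eps=1e-12):
    """Angle in degrees between consecutive displacement vectors.

    Assumed definition: arccos of the cosine similarity between
    z[t+1] - z[t] and z[t+2] - z[t+1]. Returns shape (T-2,).
    """
    diffs = np.diff(z, axis=0)
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    unit = diffs / np.maximum(norms, eps)  # guard against zero-length steps
    cos = np.sum(unit[:-1] * unit[1:], axis=1)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# A perfectly straight trajectory has zero curvature and constant step size.
straight = np.arange(5)[:, None] * np.array([3.0, 4.0])
# A trajectory with a single right-angle turn.
corner = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

straight_dist = stepwise_distance(straight)
straight_curv = temporal_curvature(straight)
corner_curv = temporal_curvature(corner)
```

Under these definitions, a straighter trajectory yields curvature values closer to zero, which is the property the straightening hypothesis associates with natural video.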

Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated videos via perceptual straightening deviations

Addressing generalization issues in synthetic video detection methods

Quantifying temporal inconsistencies using neural representation geometry

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses perceptual straightening hypothesis for detection

Analyzes temporal curvature and stepwise distance in DINOv2 representations

Lightweight classifier achieves high accuracy
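The aggregation step behind the lightweight classifier can be sketched as follows. This is a toy illustration only: the choice of statistics (mean, standard deviation, minimum, maximum), the names `trajectory_measures` and `video_descriptor`, and the synthetic trajectories are all assumptions made for the example, not the paper's feature set or data. The classifier itself is omitted; any compact off-the-shelf model could consume the resulting fixed-length vector.

```python
import numpy as np

def trajectory_measures(z, eps=1e-12):
    """Return (stepwise distances, curvature angles in degrees) for z of shape (T, D)."""
    diffs = np.diff(z, axis=0)
    dist = np.linalg.norm(diffs, axis=1)
    unit = diffs / np.maximum(dist[:, None], eps)
    cos = np.clip(np.sum(unit[:-1] * unit[1:], axis=1), -1.0, 1.0)
    return dist, np.degrees(np.arccos(cos))

def video_descriptor(z):
    """Aggregate per-step measures into one fixed-length vector per video.

    Layout (an assumption for this sketch):
    [dist mean, std, min, max, curvature mean, std, min, max].
    """
    dist, curv = trajectory_measures(z)
    stats = lambda x: [x.mean(), x.std(), x.min(), x.max()]
    return np.array(stats(dist) + stats(curv))

# Synthetic stand-ins for embedding trajectories (not real video features).
rng = np.random.default_rng(0)
T, D = 24, 16
direction = rng.normal(size=D)
direction /= np.linalg.norm(direction)

# Nearly straight drift with small perturbations.
smooth = np.arange(T)[:, None] * direction + 0.01 * rng.normal(size=(T, D))
# Random walk whose direction changes at every step.
jittery = np.cumsum(rng.normal(size=(T, D)), axis=0)

f_smooth = video_descriptor(smooth)
f_jittery = video_descriptor(jittery)
```

In this toy setting the mean-curvature entry (index 4) is much smaller for the nearly straight trajectory than for the random walk, which is the kind of separation a downstream classifier would exploit.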

🔎 Similar Papers

What Matters in Detecting AI-Generated Videos like Sora?

2024-06-27, arXiv.org, Citations: 12

Detecting AI-Generated Video via Frame Consistency

2024-02-03, Citations: 1