SAGA: Source Attribution of Generative AI Videos

📅 2025-11-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the growing challenge of tracing AI-generated videos amid their proliferation, this paper introduces the first large-scale, fine-grained video attribution framework, supporting source attribution at five levels: authenticity, generation task, model version, development team, and the specific generator. Methodologically, we propose Temporal Attention Signatures (T-Sigs), a novel time-aware visualization technique that uncovers discriminative spatiotemporal artifacts unique to different generators. We extract robust spatiotemporal features with a video Transformer built on features from a robust vision foundation model. Our framework adopts a pretrain-and-attribute paradigm, matching fully supervised performance with only 0.5% labeled data, further enhanced by cross-domain adaptive learning for improved generalization. Extensive experiments on multiple public benchmarks demonstrate substantial gains over state-of-the-art methods in accuracy, interpretability, and cross-domain applicability, establishing a critical technical foundation for digital forensics and AI content governance.
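
As a rough illustration of the pipeline the summary describes, the sketch below pairs per-frame features from a frozen vision foundation model with a small temporal Transformer and one classification head per attribution level. The module sizes, class counts, and the PyTorch layout are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch: per-frame foundation-model features -> temporal Transformer
# -> one head per attribution level. All dimensions and level sizes are
# placeholders, not the paper's actual configuration.
import torch
import torch.nn as nn

class AttributionModel(nn.Module):
    def __init__(self, feat_dim=768, n_heads=8, n_layers=2, level_sizes=None):
        super().__init__()
        # One output head per attribution level (class counts are hypothetical).
        level_sizes = level_sizes or {
            "authenticity": 2, "task": 3, "version": 10,
            "team": 6, "generator": 20,
        }
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.cls = nn.Parameter(torch.zeros(1, 1, feat_dim))
        self.heads = nn.ModuleDict(
            {k: nn.Linear(feat_dim, v) for k, v in level_sizes.items()})

    def forward(self, frame_feats):
        # frame_feats: (batch, frames, feat_dim) from a frozen image backbone.
        x = torch.cat([self.cls.expand(frame_feats.size(0), -1, -1),
                       frame_feats], dim=1)
        x = self.temporal(x)
        pooled = x[:, 0]  # CLS token summarizes the clip
        return {k: head(pooled) for k, head in self.heads.items()}

# Example: 4 clips of 16 frames, each frame already encoded to 768-d features.
model = AttributionModel()
logits = model(torch.randn(4, 16, 768))
print({k: v.shape for k, v in logits.items()})
```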

📝 Abstract
The proliferation of generative AI has led to hyper-realistic synthetic videos, escalating misuse risks and outstripping binary real/fake detectors. We introduce SAGA (Source Attribution of Generative AI videos), the first comprehensive framework to address the urgent need for AI-generated video source attribution at a large scale. Unlike traditional detection, SAGA identifies the specific generative model used. It uniquely provides multi-granular attribution across five levels: authenticity, generation task (e.g., T2V/I2V), model version, development team, and the precise generator, offering far richer forensic insights. Our novel video transformer architecture, leveraging features from a robust vision foundation model, effectively captures spatio-temporal artifacts. Critically, we introduce a data-efficient pretrain-and-attribute strategy, enabling SAGA to achieve state-of-the-art attribution using only 0.5% of source-labeled data per class, matching fully supervised performance. Furthermore, we propose Temporal Attention Signatures (T-Sigs), a novel interpretability method that visualizes learned temporal differences, offering the first explanation for why different video generators are distinguishable. Extensive experiments on public datasets, including cross-domain scenarios, demonstrate that SAGA sets a new benchmark for synthetic video provenance, providing crucial, interpretable insights for forensic and regulatory applications.
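
The abstract's data-efficiency claim rests on a pretrain-and-attribute pattern: freeze a pretrained encoder and fit only a lightweight attribution head on a tiny labeled subset. Below is a minimal sketch of that pattern with a stand-in random-projection encoder and a 0.5%-per-class labeled split; the encoder, sampling scheme, and classifier choice are assumptions for illustration, not the paper's method.

```python
# Sketch of "pretrain-and-attribute": frozen encoder + lightweight head
# trained on roughly 0.5% of labels per class. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def frozen_encoder(feats):
    # Placeholder for a pretrained, frozen video encoder that maps each clip
    # to a fixed-length embedding; here just a random projection.
    proj = rng.standard_normal((feats.shape[1], 256))
    return feats @ proj

# Synthetic stand-in: 10,000 clip descriptors from 5 hypothetical generators.
pool_feats = rng.standard_normal((10_000, 512))
pool_labels = rng.integers(0, 5, size=10_000)

# Keep only ~0.5% of labels per class (10 of ~2,000) for the attribution stage.
labeled_idx = np.concatenate([
    rng.choice(np.where(pool_labels == c)[0], size=10, replace=False)
    for c in range(5)
])
emb = frozen_encoder(pool_feats)
clf = LogisticRegression(max_iter=1000).fit(emb[labeled_idx],
                                            pool_labels[labeled_idx])
print("accuracy on the tiny labeled subset:",
      clf.score(emb[labeled_idx], pool_labels[labeled_idx]))
```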
Problem

Research questions and friction points this paper is trying to address.

Identifying the specific generative model used to create synthetic videos
Providing multi-granular attribution across five forensic levels
Achieving accurate source attribution with minimal labeled training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granular attribution across five forensic levels
Data-efficient pretrain-and-attribute strategy with minimal supervision
Temporal Attention Signatures (T-Sigs) for interpretable generator differentiation (a minimal sketch follows this list)
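
A minimal sketch of a T-Sigs-style readout, assuming the signature is obtained by averaging the temporal Transformer's CLS-to-frame attention over heads and layers and comparing the resulting curves across generators; the paper's exact procedure may differ.

```python
# Toy T-Sigs-style visualization: collapse attention maps into one weight per
# frame and plot the curves for two hypothetical generators.
import torch
import matplotlib.pyplot as plt

def temporal_signature(attn_maps):
    # attn_maps: list of (heads, tokens, tokens) attention matrices per layer,
    # where token 0 is the CLS token and tokens 1.. are frames.
    sig = torch.stack([a[:, 0, 1:].mean(0) for a in attn_maps]).mean(0)
    return sig / sig.sum()

# Random stand-in attention maps for two generators over 16-frame clips.
torch.manual_seed(0)
gen_a = [torch.rand(8, 17, 17).softmax(-1) for _ in range(2)]
gen_b = [torch.rand(8, 17, 17).softmax(-1) for _ in range(2)]

plt.plot(temporal_signature(gen_a).numpy(), label="generator A")
plt.plot(temporal_signature(gen_b).numpy(), label="generator B")
plt.xlabel("frame index")
plt.ylabel("attention weight")
plt.legend()
plt.savefig("t_sigs_sketch.png")
```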