SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits

📅 2026-04-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitation of existing deepfake datasets, which predominantly support only single-step editing and thus hinder provenance analysis of multi-step manipulations. To bridge this gap, the authors introduce SEED, the first large-scale benchmark for sequential editing, comprising over 90,000 facial images edited through 1–4 steps using diffusion models, accompanied by fine-grained annotations including editing order, textual instructions, operation masks, and source generative models. Building upon this benchmark, they propose FAITH, a baseline model that integrates spatial and wavelet frequency-domain features, revealing for the first time the critical role of high-frequency signals in detecting multi-step editing traces. Experiments demonstrate that methods relying solely on spatial information are inherently limited, whereas FAITH leverages frequency-domain cues to significantly improve attribution accuracy and maintains robustness under image degradation.
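As a rough illustration of the annotations listed above (editing order, textual instruction, operation mask, source generative model), one possible in-memory record is sketched below. All class names, field names, paths, and example values are hypothetical; the source does not describe SEED's actual file layout or schema. The only constraint taken from the text is that each image carries one to four ordered edits.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EditStep:
    """One edit in the chain. Field names are illustrative assumptions."""
    order: int          # 1-based position of this edit in the sequence
    instruction: str    # textual instruction that drove the edit
    mask_path: str      # location of the operation mask (hypothetical layout)
    model: str          # identifier of the generative model used


@dataclass
class SeedRecord:
    """An edited image plus its ordered editing history (hypothetical schema)."""
    image_path: str
    steps: List[EditStep] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The summary states that images are edited through 1-4 steps.
        if not 1 <= len(self.steps) <= 4:
            raise ValueError("expected between 1 and 4 edit steps")
        # Orders must be consecutive and start at 1.
        orders = [s.order for s in self.steps]
        if orders != list(range(1, len(orders) + 1)):
            raise ValueError("edit orders must be 1, 2, ... in sequence")

    def instruction_sequence(self) -> List[str]:
        """Return the textual instructions in editing order."""
        return [s.instruction for s in self.steps]

    def model_sequence(self) -> List[str]:
        """Return the generative model used at each step, in order."""
        return [s.model for s in self.steps]


# Placeholder values only; none of these come from the dataset.
record = SeedRecord(
    image_path="images/example.png",
    steps=[
        EditStep(1, "add eyeglasses", "masks/example_1.png", "model_a"),
        EditStep(2, "change hair color", "masks/example_2.png", "model_b"),
    ],
)
```

A sequence-prediction target could then be read directly from `record.instruction_sequence()`, and a per-step attribution target from `record.model_sequence()`.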

📝 Abstract

Deepfake content on social networks is increasingly produced through multiple sequential edits to biometric data such as facial imagery. Consequently, the final appearance of an image often reflects a latent chain of operations rather than a single manipulation. Recovering these editing histories is essential for visual provenance analysis, misinformation auditing, and forensic or platform moderation workflows that must trace the origin and evolution of AI-generated media. However, existing datasets predominantly focus on single-step editing and overlook the cumulative artifacts introduced by realistic multi-step pipelines. To address this gap, we introduce Sequential Editing in Diffusion (SEED), a large-scale benchmark for sequential provenance tracing in facial imagery. SEED contains over 90K images constructed via one to four sequential attribute edits using diffusion-based editing pipelines, with fine-grained annotations including edit order, textual instructions, manipulation masks, and generation models. These metadata enable step-wise evidence analysis and support both forgery detection and sequence prediction. To benchmark the challenges posed by SEED, we evaluate representative analysis strategies and observe that spatial-only approaches struggle under subtle and distributed diffusion artifacts, especially when such artifacts accumulate across multiple edits. Motivated by this observation, we further establish FAITH, a frequency-aware Transformer baseline that aggregates spatial and frequency-domain cues to identify and order latent editing events. Results show that high-frequency signals, particularly wavelet components, provide effective cues even under image degradation. Overall, SEED facilitates systematic study of sequential provenance tracing and evidence aggregation for trustworthy analysis of AI-generated visual content.
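To make the notion of high-frequency wavelet components concrete, the sketch below computes a single-level 2D Haar decomposition of a grayscale image and reports the share of energy in the detail (high-frequency) bands. This is not FAITH's feature extractor: the abstract does not state the wavelet family, the number of decomposition levels, or how the bands are fused with spatial features, so the Haar choice and the helper names `haar_dwt2` and `high_freq_ratio` are assumptions made purely for illustration.

```python
from typing import List, Tuple

Band = List[List[float]]


def haar_dwt2(img: List[List[float]]) -> Tuple[Band, Tuple[Band, Band, Band]]:
    """Single-level orthonormal 2D Haar transform over non-overlapping 2x2 blocks.

    Returns the approximation band and three detail bands
    (column-difference, row-difference, diagonal).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, h, 2):
        row_a, row_c, row_r, row_d = [], [], [], []
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]
            p, q = img[r + 1][c], img[r + 1][c + 1]
            row_a.append((a + b + p + q) / 2)  # low-pass in both directions
            row_c.append((a - b + p - q) / 2)  # difference across columns
            row_r.append((a + b - p - q) / 2)  # difference across rows
            row_d.append((a - b - p + q) / 2)  # diagonal difference
        approx.append(row_a)
        col_diff.append(row_c)
        row_diff.append(row_r)
        diag.append(row_d)
    return approx, (col_diff, row_diff, diag)


def high_freq_ratio(img: List[List[float]]) -> float:
    """Fraction of total coefficient energy that lies in the detail bands."""
    approx, details = haar_dwt2(img)
    e_low = sum(v * v for row in approx for v in row)
    e_high = sum(v * v for band in details for row in band for v in row)
    total = e_low + e_high
    return e_high / total if total else 0.0


# A flat patch has no detail energy; a checkerboard concentrates it
# in the diagonal band.
flat = [[5.0] * 4 for _ in range(4)]
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
```

In this toy setting `high_freq_ratio(flat)` is 0.0 and `high_freq_ratio(checker)` is 0.5. A frequency-aware detector would feed the detail bands themselves, rather than a single scalar, into the model alongside spatial features.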

Problem

Research questions and friction points this paper is trying to address.

provenance tracing

sequential deepfake

facial editing

diffusion artifacts

visual forensics

Innovation

Methods, ideas, or system contributions that make the work stand out.

sequential deepfake editing

provenance tracing

diffusion-based editing