🤖 AI Summary
Existing research is hindered by the absence of large-scale, fine-grained temporal facial attribute editing benchmarks, impeding progress in edit tracking, provenance analysis, and robustness evaluation. To address this gap, we introduce SEED, the first large-scale benchmark tailored to diffusion-based progressive facial editing: over 90,000 edited images generated with LEdits, SDXL, and SD3 across 1–4 editing steps, each accompanied by synchronized editing sequences, high-resolution attribute masks, and prompt annotations. We further propose FAITH, a frequency-aware Transformer architecture that explicitly incorporates high-frequency features to sharpen sensitivity to subtle, consecutive edits. Extensive experiments show that FAITH significantly outperforms state-of-the-art methods on editing sequence detection, empirically validating the critical role of high-frequency cues in identifying progressive manipulations.
📝 Abstract
Diffusion models have recently enabled precise and photorealistic facial editing across a wide range of semantic attributes. Beyond single-step modifications, a growing class of applications now demands the ability to analyze and track sequences of progressive edits, such as stepwise changes to hair, makeup, or accessories. However, sequential editing introduces significant challenges in edit attribution and detection robustness, further complicated by the lack of large-scale, finely annotated benchmarks tailored explicitly for this task. We introduce SEED, a large-scale Sequentially Edited facE Dataset constructed via state-of-the-art diffusion models. SEED contains over 90,000 facial images with one to four sequential attribute modifications, generated using diverse diffusion-based editing pipelines (LEdits, SDXL, SD3). Each image is annotated with detailed edit sequences, attribute masks, and prompts, facilitating research on sequential edit tracking, visual provenance analysis, and manipulation robustness assessment. To benchmark this task, we propose FAITH, a frequency-aware transformer-based model that incorporates high-frequency cues to enhance sensitivity to subtle sequential changes. Comprehensive experiments, including systematic comparisons of multiple frequency-domain methods, demonstrate the effectiveness of FAITH and the unique challenges posed by SEED. SEED offers a challenging and flexible resource for studying progressive diffusion-based edits at scale. Dataset and code will be publicly released at: https://github.com/Zeus1037/SEED.
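The abstract does not spell out how FAITH's frequency branch works; as an illustrative assumption only (not the authors' implementation), the general idea of exposing high-frequency cues for edit detection can be sketched as an FFT high-pass residual over a grayscale image:

```python
import numpy as np

def high_frequency_residual(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Return the high-frequency component of a 2-D grayscale image.

    Illustrative sketch: a centered FFT high-pass filter. Frequencies
    within `cutoff` (as a fraction of the normalized spectrum radius)
    of the DC component are zeroed, and the result is transformed back
    to the spatial domain. Subtle edit artifacts tend to concentrate
    in this residual.
    """
    h, w = image.shape
    # Shift DC to the center of the spectrum for a radial mask.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    radius = np.sqrt(((yy - cy) / h) ** 2 + ((xx - cx) / w) ** 2)
    spectrum[radius < cutoff] = 0.0  # suppress low-frequency content
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
```

A detector like FAITH could feed such a residual (or learned frequency features) to a Transformer alongside the RGB input; a smooth, unedited region yields a near-zero residual, while sharp local manipulations leave measurable high-frequency energy.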