Detecting Origin Attribution for Text-to-Image Diffusion Models

📅 2024-03-28

🏛️ IEEE Workshop/Winter Conference on Applications of Computer Vision

📈 Citations: 1

✨ Influential: 1

🤖 AI Summary

This work studies provenance attribution for images generated by text-to-image (T2I) diffusion models. It trains attributors to identify which of 12 state-of-the-art T2I generators produced an image, then analyzes which inference-stage hyperparameters and image modifications remain discernible. Initialization seeds turn out to be highly detectable, and other subtle variations in the generation process are detectable to some extent. To find out which visual traces the attributor relies on, the authors perturb high-frequency details and train on mid-level representations of image style and structure. Altering high-frequency information reduces accuracy only slightly, and an attributor trained on style representations outperforms one trained on raw RGB images. Together, the results indicate that generated images are detectable and attributable at several levels of visual granularity, which is relevant to governance of AI-generated content.

📝 Abstract

Modern text-to-image (T2I) diffusion models can generate images with remarkable realism and creativity. These advancements have sparked research in fake image detection and attribution, yet prior studies have not fully explored the practical and scientific dimensions of this task. In addition to attributing images to 12 state-of-the-art T2I generators, we provide extensive analyses on what inference stage hyperparameters and image modifications are discernible. Our experiments reveal that initialization seeds are highly detectable, along with other subtle variations in the image generation process to some extent. We further investigate what visual traces are leveraged in image attribution by perturbing high-frequency details and employing midlevel representations of image style and structure. Notably, altering high-frequency information causes only slight reductions in accuracy, and training an attributor on style representations outperforms training on RGB images. Our analyses underscore that fake images are detectable and attributable at various levels of visual granularity.
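The high-frequency perturbation described in the abstract can be illustrated with a simple low-pass filter. This is a minimal sketch, not the paper's procedure: the abstract does not say which perturbations were used, so the FFT-based filter, the function name `low_pass`, and the `keep_fraction` parameter are all assumptions made for illustration.

```python
import numpy as np


def low_pass(image: np.ndarray, keep_fraction: float = 0.25) -> np.ndarray:
    """Remove high spatial frequencies from an (H, W) or (H, W, C) image.

    Hypothetical helper: keeps only a centered square of the shifted
    spectrum whose side is roughly `keep_fraction` of each dimension.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    h, w = image.shape[:2]
    # Move the zero-frequency (DC) component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))

    # Build a mask that is symmetric around the DC component.
    cy, cx = h // 2, w // 2
    half_h = max(1, int(h * keep_fraction / 2))
    half_w = max(1, int(w * keep_fraction / 2))
    mask = np.zeros((h, w), dtype=bool)
    mask[max(cy - half_h, 0):cy + half_h + 1,
         max(cx - half_w, 0):cx + half_w + 1] = True
    if image.ndim == 3:
        mask = mask[:, :, None]  # apply the same mask to every channel

    filtered = np.fft.ifft2(
        np.fft.ifftshift(spectrum * mask, axes=(0, 1)), axes=(0, 1)
    )
    # The mask is symmetric, so the imaginary part is numerical noise.
    return filtered.real
```

Under these assumptions, one would evaluate a trained attributor on `low_pass(image)` instead of `image` and compare accuracy; a small drop would match the abstract's observation that high-frequency information is not decisive.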

Problem

Research questions and friction points this paper is trying to address.

Detect origin of images from text-to-image diffusion models

Analyze discernible hyperparameters and image modifications

Investigate which visual traces an attributor relies on

Innovation

Methods, ideas, or system contributions that make the work stand out.

Shows that initialization seeds and other inference-stage settings are detectable, in addition to the source generator

Probes visual traces by perturbing high-frequency details and using mid-level representations of style and structure

Trains an attributor on style representations, which outperforms training on RGB images
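To make "style representation" concrete, here is a minimal sketch of one common choice, the Gram matrix of a feature map. This page does not state how the paper computes its style representations, so treating style as a Gram matrix is an assumption, and `gram_matrix` is a hypothetical helper. The feature map would normally come from a pretrained network; random values stand in for it here.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Return the (C, C) channel co-activation matrix of a (C, H, W) feature map.

    Hypothetical helper: summing over spatial positions discards layout,
    which is why Gram matrices are often used as a style descriptor.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    # Normalize by the number of spatial positions so size does not dominate.
    return flat @ flat.T / (h * w)


# Stand-in for features extracted by a pretrained network (assumption).
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 4))
style = gram_matrix(features)  # shape (8, 8), symmetric
```

An attributor trained on such a representation sees only channel statistics, not pixel layout, so its accuracy indicates how much generator-specific signal is carried by style alone.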

🔎 Similar Papers

Regeneration Based Training-free Attribution of Fake Images Generated by Text-to-Image Generative Models