WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution

📅 2025-04-28
🤖 AI Summary
Synthetic image provenance attribution faces significant challenges due to the rapid proliferation of generative models, heterogeneous generation techniques, and the scarcity of high-quality, open-source benchmark datasets. To address this, we introduce WILD, the first real-world-oriented benchmark for tracing the provenance of synthetic images, comprising 20 mainstream generators (10 closed-set, 10 open-set), each with 1,000 images, 50% of which undergo diverse post-processing (e.g., compression, filtering, geometric transformations). We propose a novel “closed-set training + open-set in-the-wild” dual-mode paradigm to jointly model generator evolution and post-processing perturbations. Additionally, we design a multi-granularity augmentation strategy and a cross-generator distribution alignment evaluation protocol. Within a unified evaluation framework, we systematically assess seven baseline methods across closed-set/open-set identification, verification, and robustness tasks. The benchmark is intended to improve model generalizability and deployability under complex, realistic conditions.

📝 Abstract
Synthetic image source attribution is an open challenge, with an increasing number of image generators being released yearly. The complexity and the sheer number of available generative techniques, as well as the scarcity of high-quality open source datasets of diverse nature for this task, make training and benchmarking synthetic image source attribution models very challenging. WILD is a new in-the-Wild Image Linkage Dataset designed to provide a powerful training and benchmarking tool for synthetic image attribution models. The dataset is built out of a closed set of 10 popular commercial generators, which constitutes the training base of attribution models, and an open set of 10 additional generators, simulating a real-world in-the-wild scenario. Each generator is represented by 1,000 images, for a total of 10,000 images in the closed set and 10,000 images in the open set. Half of the images are post-processed with a wide range of operators. WILD allows benchmarking attribution models in a wide range of tasks, including closed and open set identification and verification, and robust attribution with respect to post-processing and adversarial attacks. Models trained on WILD are expected to benefit from the challenging scenario represented by the dataset itself. Moreover, an assessment of seven baseline methodologies on closed and open set attribution is presented, including robustness tests with respect to post-processing.
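The dataset composition described in the abstract can be checked with a small manifest sketch. This is only an illustration: the generator names, the `image_id` scheme, and the choice of which half of each generator's images is marked as post-processed are placeholders, not the dataset's actual layout.

```python
from collections import Counter

IMAGES_PER_GENERATOR = 1000  # stated in the abstract

# Hypothetical placeholder names; the real generator names are not listed here.
CLOSED_SET = [f"closed_gen_{i:02d}" for i in range(10)]
OPEN_SET = [f"open_gen_{i:02d}" for i in range(10)]


def build_manifest():
    """Build one record per image with its split, generator, and post-processing flag."""
    rows = []
    for split, generators in (("closed", CLOSED_SET), ("open", OPEN_SET)):
        for gen in generators:
            for idx in range(IMAGES_PER_GENERATOR):
                rows.append({
                    "split": split,
                    "generator": gen,
                    "image_id": f"{gen}_{idx:04d}",
                    # Assumption: the first half of each generator's images
                    # is the post-processed half.
                    "post_processed": idx < IMAGES_PER_GENERATOR // 2,
                })
    return rows


manifest = build_manifest()
per_split = Counter(row["split"] for row in manifest)
n_post = sum(row["post_processed"] for row in manifest)
print(per_split["closed"], per_split["open"], n_post)  # 10000 10000 10000
```

A manifest of this shape makes it easy to filter by split or by post-processing flag when reporting closed-set, open-set, and robustness results separately.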
Problem

Research questions and friction points this paper is trying to address.

Addressing synthetic image source attribution challenge
Providing diverse dataset for model training and benchmarking
Evaluating robustness against post-processing and adversarial attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

WILD dataset for synthetic image attribution
Closed and open set generators included
Supports robustness benchmarking against post-processing and adversarial attacks
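To illustrate what open-set identification asks of a model, here is a minimal sketch of a generic rejection rule: pick the highest-scoring closed-set generator, and answer "unknown" when that score falls below a threshold. The function name, the score dictionaries, and the threshold value are all hypothetical, and this rule is not claimed to be one of the seven baselines evaluated in the paper.

```python
def identify(scores, threshold=0.5):
    """Return the top-scoring closed-set generator, or 'unknown' if its score is too low.

    `scores` maps closed-set generator names to confidence values
    (for example softmax outputs). The threshold is an illustrative choice.
    """
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "unknown"


# Hypothetical scores for an image from a closed-set generator.
closed_like = {"gen_a": 0.91, "gen_b": 0.06, "gen_c": 0.03}
# Hypothetical scores for an image from a generator unseen during training.
open_like = {"gen_a": 0.40, "gen_b": 0.35, "gen_c": 0.25}

print(identify(closed_like))  # gen_a
print(identify(open_like))    # unknown
```

In practice the threshold trades off wrongly rejecting closed-set images against wrongly attributing open-set images to a known generator.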
Pietro Bongini
Post Doctoral Student/Researcher, University of Siena
Deep Learning on Structured Data, Graph Neural Networks, Graph Generation, Drug Side Effect
Sara Mandelli
Politecnico di Milano
Multimedia Signal Processing, Multimedia Forensics, Geophysical Image Processing
Andrea Montibeller
PhD, University of Trento
Multimedia Forensics, Deep Learning
Mirko Casu
PhD Student @ University of Catania
psychology, AI, mhealth
Orazio Pontorno
University of Catania, Department of Mathematics and Computer Science, Italy
Claudio Ragaglia
University of Catania, Department of Mathematics and Computer Science, Italy
Luca Zanchetta
Sapienza University of Rome - Department of Computer, Control and Management Engineering, Italy
Mattia Aquilina
Sapienza University of Rome - Department of Computer, Control and Management Engineering, Italy
Taiba Majid Wani
Sapienza University of Rome - Department of Computer, Control and Management Engineering, Italy
Luca Guarnera
University of Catania - IPLab (Image Processing Lab)
Multimedia Forensics, Computer Vision, Machine learning, Pattern recognition
Benedetta Tondi
Department of Information Engineering of the University of Siena
Image Forensics, Signal processing, Multimedia security, Information theory, Machine learning
Paolo Bestagini
Politecnico di Milano
Multimedia Signal Processing, Multimedia Forensics
Irene Amerini
Sapienza Università di Roma, Italy
Multimedia forensics and security
Francesco Denatale
University of Trento, Department of Information Engineering and Computer Science, Italy
Sebastiano Battiato
Full Professor, University of Catania - IPLab (Image Processing lab) - ICTLab
Computer Vision, Information Forensics and Security, Multimedia Forensics, Medical Imaging
Mauro Barni
Professor, University of Siena
Signal Processing, information forensics and security, image processing, digital watermarking, multimedia security