🤖 AI Summary
This work addresses near-duplicate detection and provenance tracing for face images as a two-part task: identifying the original image in a set of near-duplicates and inferring the order in which the near-duplicates were generated. Methodologically, we propose a graph-theoretic framework built on the Image Phylogeny Tree (IPT) and its ensemble extension, the Image Phylogeny Forest (IPF), integrating graph neural networks, invariant feature extraction, ensemble-based IPF construction, and evaluation across other modalities. Our approach is robust to transformations from unseen generative models, to complex geometric and photometric transformations, and to varied IPT configurations. Evaluated on benchmark datasets, our IPF reconstruction accuracy improves by 42% over state-of-the-art methods.
📝 Abstract
Near-duplicate images are often generated by repeatedly applying photometric and geometric transformations that produce imperceptible variants of the original image. Consequently, a deluge of near-duplicates can circulate online, raising copyright infringement concerns. These concerns are more severe when biometric data is altered through such subtle transformations. In this work, we address the challenge of near-duplicate detection in face images by, first, identifying the original image from a set of near-duplicates and, second, deducing the relationship between the original image and the near-duplicates. We use a graph-theoretic approach to construct a tree-like structure, called an Image Phylogeny Tree (IPT), that estimates this relationship, i.e., determines the sequence in which the images were generated. We further extend our method to create an ensemble of IPTs known as an Image Phylogeny Forest (IPF). We rigorously evaluate our method to demonstrate its robustness across other modalities, unseen transformations produced by the latest generative models, and different IPT configurations, improving on the state of the art by 42% in IPF reconstruction accuracy.
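To make the idea of an IPT concrete, the following is a minimal sketch of how a rooted tree could be recovered from pairwise costs between near-duplicates. It is not the method of this paper: it assumes a hypothetical asymmetric matrix `cost`, where `cost[i][j]` is the dissimilarity of deriving image `j` from image `i`, guesses the root as the image with the cheapest outgoing edges, and grows the tree with a greedy Prim-style rule. The function name `build_ipt` and the toy values are illustrative only.

```python
def build_ipt(cost):
    """Greedy sketch of an image phylogeny tree.

    cost[i][j] is an assumed, asymmetric dissimilarity for deriving
    image j from image i. Returns (root, parent), where parent maps
    each image index to the index it was derived from (None for root).
    """
    n = len(cost)
    # Root guess (an assumption): the image whose outgoing edges are
    # cheapest in total is treated as the original.
    root = min(range(n), key=lambda i: sum(cost[i][j] for j in range(n) if j != i))
    parent = {root: None}
    while len(parent) < n:
        # Pick the cheapest directed edge from a node already in the
        # tree to a node that has not been attached yet.
        u, v = min(
            ((u, v) for u in parent for v in range(n) if v not in parent),
            key=lambda edge: cost[edge[0]][edge[1]],
        )
        parent[v] = u
    return root, parent


# Toy example: image 0 was edited into image 1, which was edited into image 2.
cost = [
    [0, 1, 2],
    [5, 0, 1],
    [9, 5, 0],
]
root, parent = build_ipt(cost)
print(root, parent)  # 0 {0: None, 1: 0, 2: 1}
```

An IPF could be sketched in the same spirit by allowing several roots and growing one tree per root, but how the paper selects roots and scores edges is not specified in this abstract.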