🤖 AI Summary
To address three key limitations in generative image forgery detection—weak artifact features, model overfitting, and insufficient local awareness—this paper proposes SAFE, a lightweight and efficient detection framework. Methodologically, SAFE redefines the training paradigm of synthetic image detection (SID) from an image transformation perspective: it replaces conventional downsampling with cropping during preprocessing, introduces additional data augmentations (ColorJitter and RandomRotation), and designs a patch-level random masking strategy tailored for SID; it further employs a lightweight CNN architecture. Evaluated on an open-world benchmark covering 26 generative models, SAFE achieves state-of-the-art performance—+4.5% accuracy and +2.9% average precision over prior methods—demonstrating significantly improved robustness in detecting synthetic images.
📝 Abstract
With recent generative models facilitating photo-realistic image synthesis, the proliferation of synthetic images has also engendered certain negative impacts on social platforms, raising an urgent imperative to develop effective detectors. Current synthetic image detection (SID) pipelines are primarily dedicated to crafting universal artifact features, while largely overlooking the SID training paradigm itself. In this paper, we re-examine the SID problem and identify two prevalent biases in current training paradigms, i.e., weakened artifact features and overfitted artifact features. Meanwhile, we discover that the imaging mechanism of synthetic images leads to heightened local correlations among pixels, suggesting that detectors should be equipped with local awareness. In this light, we propose SAFE, a lightweight and effective detector built on three simple image transformations. Firstly, for weakened artifact features, we substitute the down-sampling operator with the crop operator in image pre-processing to help circumvent artifact distortion. Secondly, for overfitted artifact features, we include ColorJitter and RandomRotation as additional data augmentations, to help alleviate irrelevant biases arising from color discrepancies and semantic differences in limited training samples. Thirdly, for local awareness, we propose a patch-based random masking strategy tailored for SID, forcing the detector to focus on local regions during training. Comparative experiments are conducted on an open-world dataset comprising synthetic images generated by 26 distinct generative models. Our pipeline achieves new state-of-the-art performance, with remarkable improvements of 4.5% in accuracy and 2.9% in average precision over existing methods. Our code is available at: https://github.com/Ouxiang-Li/SAFE.
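To make the third transformation concrete, the patch-based random masking idea can be sketched as follows: the image is tiled into non-overlapping patches and a random subset is zeroed out, so the detector cannot rely on any single global region. This is a minimal illustrative sketch, not the authors' implementation (see their repository for that); the patch size, mask ratio, and function name here are assumptions chosen for illustration.

```python
import numpy as np

def patch_random_mask(img, patch=16, mask_ratio=0.25, seed=None):
    """Zero out a random subset of non-overlapping patches.

    Illustrative sketch of patch-level random masking; `patch` and
    `mask_ratio` are hypothetical hyperparameters, not the paper's.
    `img` is an H x W x C array whose sides are multiples of `patch`.
    """
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    grid_h, grid_w = h // patch, w // patch
    n_patches = grid_h * grid_w
    n_masked = int(n_patches * mask_ratio)
    # Pick which patches to mask, without replacement.
    idx = rng.choice(n_patches, size=n_masked, replace=False)
    out = img.copy()
    for i in idx:
        r, c = divmod(i, grid_w)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return out

# Example: mask 25% of the 16 patches in a 64x64 image.
img = np.ones((64, 64, 3), dtype=np.float32)
masked = patch_random_mask(img, patch=16, mask_ratio=0.25, seed=0)
```

In a training pipeline this would be applied per sample after cropping and the color/rotation augmentations, encouraging the detector to pick up local pixel correlations rather than global semantics.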