🤖 AI Summary
This work addresses key challenges in synthetic face detection: poor cross-generator generalization, limited robustness to image degradations (e.g., compression, downscaling) and to local hybrid manipulations (e.g., inpainting), and vulnerability to adversarial attacks. To this end, we construct FF5, a multi-generator synthetic face dataset covering five generators, and propose a lightweight, localization-capable detection framework based on the YOLO architecture. A key contribution is a detection approach tailored to locally hybrid forgeries, in which synthetic content is inpainted into real images. The study also shows that existing detectors generalize poorly to emerging diffusion-based generators (e.g., Realistic Vision). We further enhance robustness via targeted data augmentation and multi-source training. Experiments show near-perfect accuracy (>99%) and precise localization on in-distribution (source-generator) samples. However, performance degrades notably under cross-generator evaluation and adversarial perturbations, which leaves generalization and security as open challenges.
📝 Abstract
An experimental study on detecting synthetic face images is presented. We collected a dataset, called FF5, of images from five fake face image generators, including recent diffusion models. We find that a simple model trained on a specific image generator can achieve near-perfect accuracy in separating synthetic from real images. With data augmentation, the model also handles common image distortions (reduced resolution, compression). Moreover, partial manipulations, where synthetic content is blended into real images by inpainting, are detected and the manipulated area is localized by a simple model with a YOLO architecture. However, the model turned out to be vulnerable to adversarial attacks and does not generalize to unseen generators. Recent state-of-the-art methods also fail to generalize to images produced by a newer generator, as we found when testing them on Realistic Vision, a fine-tuned version of Stability AI's Stable Diffusion image generator.
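
To make the partial-manipulation setting more concrete, the following is a minimal sketch of how a training sample for localization could be prepared: synthetic pixels are pasted into a real image under a binary mask, and the mask is converted to a bounding box in the normalized `x_center y_center width height` form commonly used for YOLO labels. The abstract does not describe the actual labeling pipeline, so the helper names (`blend`, `mask_to_yolo_box`), the nested-list image representation, and the hard paste (instead of a real inpainting model) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: images and masks are nested lists (rows of pixels).
# In a mask, 1 marks a pixel replaced by synthetic content and 0 marks a real pixel.
Grid = List[List[int]]


def blend(real: Grid, fake: Grid, mask: Grid) -> Grid:
    """Paste synthetic pixels into the real image wherever the mask is 1.

    This hard paste stands in for inpainting, which would produce the
    synthetic region with a generative model instead.
    """
    return [
        [f if m else r for r, f, m in zip(real_row, fake_row, mask_row)]
        for real_row, fake_row, mask_row in zip(real, fake, mask)
    ]


def mask_to_yolo_box(mask: Grid) -> Optional[Tuple[float, float, float, float]]:
    """Return (x_center, y_center, width, height), normalized to [0, 1].

    Returns None when the mask is empty, i.e. the image is fully real
    and has no manipulated region to localize.
    """
    height, width = len(mask), len(mask[0])
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows:
        return None
    # Treat pixel indices as cell boundaries: the box spans [min, max + 1).
    x_min, x_max = cols[0], cols[-1] + 1
    y_min, y_max = rows[0], rows[-1] + 1
    return (
        (x_min + x_max) / 2 / width,
        (y_min + y_max) / 2 / height,
        (x_max - x_min) / width,
        (y_max - y_min) / height,
    )


# Toy example: a 4x8 image with a 2x4 manipulated patch in the middle.
mask = [[1 if 1 <= y <= 2 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(4)]
real = [[0] * 8 for _ in range(4)]
fake = [[9] * 8 for _ in range(4)]
blended = blend(real, fake, mask)
box = mask_to_yolo_box(mask)  # (0.5, 0.5, 0.5, 0.5)
```

A single box per image keeps the sketch short. A mask with several disconnected regions would need one box per connected component.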