🤖 AI Summary
While state-of-the-art whole-image deepfake detectors excel at identifying fully synthetic images, their generalization to fine-grained manipulations—such as localized inpainting—has not been systematically assessed.
Method: This work presents the first systematic zero-shot transfer evaluation of mainstream whole-image detectors in local-editing scenarios. We introduce a comprehensive benchmark covering multiple generative models, diverse mask sizes, and various inpainting methods.
Results: Experiments reveal that whole-image detectors achieve robust performance on medium-to-large masked regions and on regeneration-based inpainting, outperforming most heuristic local detection approaches. Crucially, we identify consistent transfer patterns: performance degrades gracefully as mask size shrinks, yet detectors trained on the outputs of many generators remain surprisingly effective even for small edits. These findings indicate that training on a broad set of generators implicitly encodes fine-grained manipulation cues, enabling cross-granularity forgery detection and establishing a way to leverage whole-image models for fine-grained forensic analysis without task-specific fine-tuning.
📝 Abstract
The rapid progress of generative AI has enabled highly realistic image manipulations, including inpainting and region-level editing. These approaches preserve most of the original visual context and are increasingly exploited in cybersecurity-relevant threat scenarios. While numerous detectors have been proposed for identifying fully synthetic images, their ability to generalize to localized manipulations remains insufficiently characterized. This work presents a systematic evaluation of state-of-the-art detectors, originally trained for deepfake detection on fully synthetic images, when applied to a distinct challenge: localized inpainting detection. The study leverages multiple datasets spanning diverse generators, mask sizes, and inpainting techniques. Our experiments show that models trained on a large set of generators exhibit partial transferability to inpainting-based edits and can reliably detect medium- and large-area manipulations or regeneration-style inpainting, outperforming many existing ad hoc detection approaches.
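The zero-shot transfer protocol described above can be sketched in a few lines: score every image with a frozen whole-image detector, then report AUC separately for each mask-size bucket (reals are reused as negatives in every bucket). This is a minimal illustration, not the paper's actual pipeline; the `detector` callable, the mask-ratio bins, and the rank-based AUC helper are all assumptions for the sketch.

```python
import numpy as np

def auc(labels, scores):
    """Rank-sum (Mann-Whitney) AUC: probability that a fake (label 1)
    scores higher than a real (label 0). Ties broken arbitrarily."""
    order = np.argsort(scores, kind="stable")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def zero_shot_eval(detector, images, labels, mask_ratios,
                   bins=((0.0, 0.1), (0.1, 0.3), (0.3, 1.0))):
    """Evaluate a frozen detector per mask-size bucket (hypothetical bins).

    detector    -- callable returning a "fakeness" score per image (frozen,
                   i.e. no fine-tuning on inpainted data)
    labels      -- 0 = real, 1 = inpainted
    mask_ratios -- edited-area fraction per image (0 for reals)
    """
    scores = np.array([detector(im) for im in images])
    real = labels == 0
    results = {}
    for lo, hi in bins:
        # Every bucket keeps all reals as negatives, plus the fakes
        # whose mask ratio falls inside (lo, hi].
        in_bin = real | ((mask_ratios > lo) & (mask_ratios <= hi) & ~real)
        results[f"mask {lo:.0%}-{hi:.0%}"] = auc(labels[in_bin], scores[in_bin])
    return results
```

Binning by mask ratio is what makes the "degrades gracefully with shrinking mask size" pattern visible: one AUC number per bucket instead of a single pooled score.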