🤖 AI Summary
To address the poor generalization of deepfake detection across unseen generative models, this paper proposes the first training-free few-shot detection framework. It constructs a support set comprising both authentic and fake exemplars using only a small number of misclassified samples, which are easily obtainable in real-world deployment, without requiring labeled training data. Detection is performed via nearest-neighbor classification, using an image-level realism metric to measure similarity between test samples and support-set instances. By reformulating deepfake detection as a few-shot learning problem, the method eliminates reliance on large-scale supervised training. Evaluated across 29 diverse generative models in a cross-model setting, it achieves a new state-of-the-art average detection accuracy, outperforming the previous best method by 8.7%.
📝 Abstract
Recent deepfake detection studies often treat unseen sample detection as a "zero-shot" task, training on images generated by known models but generalizing to unknown ones. A key real-world challenge arises when a model performs poorly on unknown samples, yet these samples remain available for analysis. This suggests the problem should instead be approached as a "few-shot" task, where effectively utilizing a small number of samples can lead to significant improvement. Unlike typical few-shot tasks focused on semantic understanding, deepfake detection prioritizes image realism, which closely mirrors real-world distributions. In this work, we propose the Few-shot Training-free Network (FTNet) for real-world few-shot deepfake detection. Simple yet effective, FTNet differs from traditional methods that rely on large-scale known data for training. Instead, FTNet uses only one fake sample from an evaluation set, mimicking the scenario where new samples emerge in the real world and can be gathered for use, without any training or parameter updates. During evaluation, each test sample is compared to the known fake and real samples and is classified according to the category of the nearest sample. We conduct a comprehensive analysis of AI-generated images from 29 different generative models and achieve a new SoTA performance, with an average improvement of 8.7% over existing methods. This work introduces a fresh perspective on real-world deepfake detection: when a model struggles to generalize to unseen samples, leveraging those failed samples leads to better performance.
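The training-free detection step described above can be sketched as a simple nearest-neighbor classifier over a small support set. This is a minimal illustration, not the authors' implementation: the paper's image-level realism metric is not reproduced here, so generic feature vectors with cosine similarity stand in for it, and all function and variable names are hypothetical.

```python
import numpy as np

def nearest_neighbor_detect(test_feats, support_feats, support_labels):
    """Label each test sample with the label of its nearest support sample.

    test_feats:     (n_test, d) array of test-image features
    support_feats:  (n_support, d) array of support-set features
                    (a handful of known real and fake exemplars)
    support_labels: (n_support,) array of "real"/"fake" labels

    Cosine similarity is used as a stand-in for the paper's realism
    metric (an assumption of this sketch).
    """
    # L2-normalize so that a dot product equals cosine similarity
    t = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    s = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    sims = t @ s.T                  # (n_test, n_support) similarity matrix
    nearest = sims.argmax(axis=1)   # index of the most similar support sample
    return support_labels[nearest]  # inherit that sample's label

# Toy usage: two support exemplars, two test samples
support = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array(["real", "fake"])
queries = np.array([[0.9, 0.1], [0.1, 0.9]])
preds = nearest_neighbor_detect(queries, support, labels)
```

Because classification reduces to a similarity lookup against a few stored exemplars, no parameters are trained or updated, which matches the "training-free" framing of the method.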