🤖 AI Summary
Open-set synthetic image attribution, i.e., attributing a synthetic image to its source architecture while rejecting samples from unseen generative models, remains challenging due to the absence of prior knowledge about unseen models and the limitations of closed-set assumptions.
Method: This paper proposes BOSC, a backdoor-injection-based classification framework. During training, class-specific triggers are embedded into samples to enforce feature-trigger alignment; at inference, unknown-architecture samples are automatically rejected via trigger-response scoring.
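The trigger-injection step described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the additive blend, the `poison_rate` fraction, and the function names are all assumptions made here for clarity.

```python
import numpy as np

def embed_trigger(image, trigger, alpha=0.1):
    # Additively blend a class-specific trigger pattern into an image in [0, 1].
    # The blend strength `alpha` is an illustrative assumption.
    return np.clip(image + alpha * trigger, 0.0, 1.0)

def poison_batch(images, labels, triggers, rng, poison_rate=0.3):
    # Embed each sample's own-class trigger into a random fraction of the
    # batch, so training aligns class features with trigger features.
    out = images.copy()
    mask = rng.random(len(images)) < poison_rate
    for i in np.where(mask)[0]:
        out[i] = embed_trigger(images[i], triggers[labels[i]])
    return out, mask
```

During training, the network then sees both clean and triggered versions of in-set samples, which is what lets the trigger response serve as a class-membership signal at inference time.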
Contribution/Results: BOSC is the first to introduce backdoor learning into open-set image attribution, enabling interpretable, distribution-agnostic rejection without closed-set discrimination assumptions. It integrates trigger-driven feature alignment, response-confidence scoring, adversarially robust training, and image-processing invariance enhancement. Extensive experiments on multiple benchmarks demonstrate state-of-the-art performance: simultaneous improvement in rejection accuracy and known-class identification accuracy, with strong robustness against common distortions such as JPEG compression, denoising, and scaling.
📝 Abstract
With the continuous progress of AI technology, new generative architectures keep appearing, driving researchers' attention towards the development of synthetic image attribution methods capable of working in open-set scenarios. Existing approaches focus on extracting highly discriminative features for closed-set architectures, increasing the confidence of the prediction when samples come from closed-set models/architectures, or estimating the distribution of unknown samples, i.e., samples from unknown architectures. In this paper, we propose a novel framework for open-set attribution of synthetic images, named BOSC (Backdoor-based Open Set Classification), that relies on backdoor injection to design a classifier with a rejection option. BOSC works by deliberately including class-specific triggers inside a portion of the images in the training set, inducing the network to establish a match between in-set class features and trigger features. The behavior of the trained model on samples containing a trigger is then exploited at inference time to perform sample rejection using an ad-hoc score. Experiments show that the proposed method consistently surpasses the state of the art, with very good robustness against common image processing operations. Although designed for the task of synthetic image attribution, the proposed framework is a general one and can be used for other image forensic applications.
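The abstract describes the rejection score only as ad-hoc. Below is a hedged sketch of one plausible trigger-response score: if embedding the predicted class's own trigger fails to raise the model's confidence in that class, the sample is rejected as coming from an unknown architecture. The score definition, the additive blend, the threshold `tau`, and all function names are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def trigger_response_score(logits_fn, image, triggers, alpha=0.1):
    # Measure how strongly the predicted class responds to its own trigger.
    # A BOSC-style model should gain confidence in class k when the class-k
    # trigger is embedded; a weak response suggests an unknown architecture.
    base = softmax(logits_fn(image))
    k = int(np.argmax(base))
    triggered = np.clip(image + alpha * triggers[k], 0.0, 1.0)
    resp = softmax(logits_fn(triggered))
    return float(resp[k] - base[k])

def classify_with_rejection(logits_fn, image, triggers, tau=0.05):
    # Return the predicted class index, or -1 (reject) if the trigger
    # response falls below the illustrative threshold `tau`.
    score = trigger_response_score(logits_fn, image, triggers)
    if score < tau:
        return -1
    return int(np.argmax(softmax(logits_fn(image))))
```

In this sketch, a model whose logits do not react to the embedded trigger produces a near-zero score and is rejected, mirroring the paper's idea that only in-set samples exhibit the trained feature-trigger matching.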