Post-hoc Selective Classification for Reliable Synthetic Image Detection

📅 2026-05-08

🤖 AI Summary

Existing synthetic image detectors are often unreliable under distribution shift and misclassify frequently. This work proposes a post-hoc selective classification framework that requires no retraining of the detector and improves confidence estimation by extending the notion of logits to intermediate network layers. It matches intermediate-layer representations against class-specific centroids, so that logit-based confidence scores can be computed at any layer, and then aggregates the per-layer scores into a single confidence estimate. The aggregation is fitted with a preference optimization algorithm that minimizes an upper bound of the area under the risk-coverage curve (AURC), so that unreliable predictions can be rejected. Under common covariate shifts, the method improves the selective classification performance of logit-based confidence scores, with AURC reductions of up to 69.55%.
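To make the evaluation metric concrete, the following is a minimal sketch of how an empirical AURC is commonly computed: samples are ranked by confidence, the selective risk (error rate among the kept samples) is measured at every coverage level, and the risks are averaged. The function names and the toy data are illustrative assumptions, and the paper's exact estimator may differ.

```python
def risk_coverage_curve(confidences, errors):
    """Return (coverage, selective_risk) pairs, keeping the most confident samples first.

    confidences: one confidence score per prediction (higher = more trusted).
    errors: 1 if the prediction was wrong, 0 if it was correct.
    """
    # Rank sample indices from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    curve = []
    wrong_so_far = 0
    for kept, idx in enumerate(order, start=1):
        wrong_so_far += errors[idx]
        coverage = kept / len(order)   # fraction of samples not rejected
        risk = wrong_so_far / kept     # error rate among the kept samples
        curve.append((coverage, risk))
    return curve


def aurc(confidences, errors):
    """Empirical AURC: mean selective risk over all coverage levels (lower is better)."""
    curve = risk_coverage_curve(confidences, errors)
    return sum(risk for _, risk in curve) / len(curve)


# Toy data (hypothetical): the last two predictions are wrong.
errors = [0, 0, 1, 1]
useful_scores = [0.9, 0.8, 0.6, 0.55]      # errors receive the lowest confidence
misleading_scores = [0.55, 0.6, 0.8, 0.9]  # errors receive the highest confidence

print(aurc(useful_scores, errors), aurc(misleading_scores, errors))
```

A confidence score that ranks errors last yields a low AURC, while one that ranks errors first yields a high AURC. This is the sense in which a confidence score function can perform worse than random guessing.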

📝 Abstract

As synthetic images become increasingly realistic, reliable synthetic image detection techniques are urgently needed to prevent their misuse. Despite satisfactory in-distribution performance, deep neural network-based synthetic image detectors (SIDs) lack reliability in deployment and often fail in the presence of common covariate shifts, resulting in poor detection accuracy. To avoid the risk caused by potential errors, we adopt a selective classification (SC) strategy that allows SIDs to abstain from making low-confidence predictions. For practicality, we focus on post-hoc methods that perform confidence estimation on a given SID without retraining. However, we show that conventional logit-based confidence score functions (CSFs) exhibit pathological behavior under covariate shifts, leading to SC performance close to or even worse than random guessing. To address this, we propose a simple yet effective SC framework for Reliable Synthetic Image Detection (ReSIDe). First, we generalize the notion of logits to an SID's intermediate layers from a centroid matching perspective, extending the use of logit-based CSFs to any layer of an SID. Then, we introduce a preference optimization algorithm that aggregates the confidence scores extracted from different layers into a final confidence estimate by minimizing an upper bound of the area under the risk-coverage curve (AURC). Extensive experimental results show that ReSIDe significantly boosts the SC performance of various logit-based CSFs under common covariate shifts, achieving up to 69.55% AURC reduction.
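The centroid matching idea can be illustrated with a rough sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: pseudo-logits are taken as negative squared distances to per-class feature centroids, the CSF is the maximum softmax probability, and per-layer scores are combined with hand-set weights. In ReSIDe the aggregation is instead obtained by preference optimization against an AURC upper bound, which this sketch does not implement.

```python
import math


def class_centroids(features, labels):
    """Average the feature vectors of each class (e.g. 0 = real, 1 = synthetic)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, value in enumerate(vec):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def centroid_logits(vec, centroids):
    """ASSUMPTION: pseudo-logit for a class = negative squared distance to its centroid."""
    return {c: -sum((a - b) ** 2 for a, b in zip(vec, mu))
            for c, mu in centroids.items()}


def max_softmax(logits):
    """Maximum softmax probability, a standard logit-based confidence score."""
    peak = max(logits.values())
    exps = [math.exp(v - peak) for v in logits.values()]  # shifted for stability
    return max(exps) / sum(exps)


def aggregate(layer_scores, weights):
    """Weighted sum of per-layer confidences. Weights are hand-set here, not learned."""
    return sum(w * s for w, s in zip(weights, layer_scores))


# Hypothetical 2-D features from one intermediate layer of a detector.
reference_features = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
reference_labels = [0, 0, 1, 1]
centroids = class_centroids(reference_features, reference_labels)

near_centroid = max_softmax(centroid_logits([0.0, 1.0], centroids))
between_classes = max_softmax(centroid_logits([2.0, 1.0], centroids))
print(near_centroid, between_classes)
```

A feature vector sitting on a class centroid receives a confidence near 1, while one equidistant from both centroids receives 0.5, so it would be among the first to be rejected under a selective classification policy.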

Problem

Research questions and friction points this paper is trying to address.

synthetic image detection

covariate shift

selective classification

reliability

post-hoc

Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Classification

Synthetic Image Detection

Covariate Shift