FeatDistill: A Feature Distillation Enhanced Multi-Expert Ensemble Framework for Robust AI-generated Image Detection

📅 2026-03-23
🤖 AI Summary
This work addresses three challenges in detecting AI-generated images under real-world conditions: degradation interference, insufficient feature representation, and limited generalization. The authors propose a framework that combines a multi-expert ensemble with feature-level self-distillation. The ensemble uses four Vision Transformer backbones drawn from CLIP and SigLIP variants. Training proceeds in two stages: standard binary classification first, then dense feature-level self-distillation for representation alignment. The method also adds comprehensive degradation modeling and an expanded training set. At inference, the probabilities from the four independently trained experts are averaged. Under the NTIRE challenge protocol, the authors report strong robustness and cross-generator generalization, with a peak GPU memory footprint of about 10 GB, which makes the framework practical for real-world deepfake image detection.
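The two-stage objective can be illustrated with a minimal sketch. The source does not specify the distillation loss, the teacher, or any loss weight, so everything here is an assumption: the teacher is taken to be a frozen copy of the stage-one model, "dense" is read as per-patch-token matching, the distance is cosine distance, and the names `dense_distill_loss`, `stage_two_loss`, and `weight` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def binary_cross_entropy(prob_fake: float, label: int, eps: float = 1e-7) -> float:
    """Stage one: standard binary classification loss for one image."""
    p = min(max(prob_fake, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm feature vector")
    return dot / (norm_a * norm_b)


def dense_distill_loss(student_tokens: Sequence[Vector],
                       teacher_tokens: Sequence[Vector]) -> float:
    """Assumed dense loss: mean cosine distance over matching patch tokens."""
    if not student_tokens or len(student_tokens) != len(teacher_tokens):
        raise ValueError("token sequences must be non-empty and equal in length")
    distances = [1.0 - cosine_similarity(s, t)
                 for s, t in zip(student_tokens, teacher_tokens)]
    return sum(distances) / len(distances)


def stage_two_loss(bce: float, distill: float, weight: float = 1.0) -> float:
    """Stage two: classification loss plus a weighted alignment term.

    The weight is a hypothetical hyperparameter, not taken from the paper.
    """
    return bce + weight * distill
```

In this reading, stage one minimizes `binary_cross_entropy` alone, and stage two adds the alignment term so the refined features stay close to the stage-one representation at every patch position.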

📝 Abstract
The rapid iteration and widespread dissemination of deepfake technology have posed severe challenges to information security, making robust and generalizable detection of AI-generated forged images increasingly important. In this paper, we propose FeatDistill, an AI-generated image detection framework that integrates feature distillation with a multi-expert ensemble, developed for the NTIRE Challenge on Robust AI-Generated Image Detection in the Wild. The framework explicitly targets three practical bottlenecks in real-world forensics: degradation interference, insufficient feature representation, and limited generalization. Concretely, we build a four-backbone Vision Transformer (ViT) ensemble composed of CLIP and SigLIP variants to capture complementary forensic cues. To improve data coverage, we expand the training set and introduce comprehensive degradation modeling, which exposes the detector to diverse quality variations and synthesis artifacts commonly encountered in unconstrained scenarios. We further adopt a two-stage training paradigm: the model is first optimized with a standard binary classification objective, then refined by dense feature-level self-distillation for representation alignment. This design effectively mitigates overfitting and enhances semantic consistency of learned features. At inference time, the final prediction is obtained by averaging the probabilities from four independently trained experts, yielding stable and reliable decisions across unseen generators and complex degradations. Despite the ensemble design, the framework remains efficient, requiring only about 10 GB peak GPU memory. Extensive evaluations in the NTIRE challenge setting demonstrate that FeatDistill achieves strong robustness and generalization under diverse "in-the-wild" conditions, offering an effective and practical solution for real-world deepfake image detection.
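The inference rule in the abstract, averaging the probabilities from the four experts, can be sketched as follows. The function names, the example probabilities, and the 0.5 decision threshold are illustrative assumptions; the abstract states only that the probabilities are averaged.

```python
from typing import Sequence


def ensemble_fake_probability(expert_probs: Sequence[float]) -> float:
    """Average the per-expert probabilities that an image is AI-generated."""
    if not expert_probs:
        raise ValueError("need at least one expert probability")
    for p in expert_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum(expert_probs) / len(expert_probs)


def is_ai_generated(expert_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Threshold the averaged probability (the threshold is an assumed value)."""
    return ensemble_fake_probability(expert_probs) >= threshold


if __name__ == "__main__":
    # Hypothetical outputs of the four CLIP/SigLIP experts for one image.
    probs = [0.91, 0.78, 0.66, 0.85]
    print(ensemble_fake_probability(probs), is_ai_generated(probs))
```

Averaging probabilities rather than hard votes lets a confident expert outweigh an uncertain one, which is one plausible reason an ensemble of this kind gives stable decisions on unseen generators.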
Problem

Research questions and friction points this paper is trying to address.

AI-generated image detection
degradation interference
feature representation
generalization
deepfake
Innovation

Methods, ideas, or system contributions that make the work stand out.

feature distillation
multi-expert ensemble
Vision Transformer
degradation modeling
self-distillation
Zhilin Tu
School of Computer Science and Engineering, University of Electronic Science and Technology of China
Kemou Li
State Key Laboratory of Internet of Things for Smart City, University of Macau
Fengpeng Li
PRADA Lab, King Abdullah University of Science and Technology; Department of Information Engineering, University of Florence
Jianwei Fei
University of Firenze
Generative Model Security, Multimedia Forensics
Jiamin Zhang
State Key Laboratory of Internet of Things for Smart City, University of Macau
Haiwei Wu
University of Electronic Science and Technology of China
Multimedia Forensics and Security