🤖 AI Summary
This work addresses the significant performance degradation of existing AI-generated image (AIGI) detection methods under the complex distortions encountered in real-world scenarios. To enhance robustness, the authors propose a LoRA-based pairwise training strategy that fine-tunes a vision foundation model while decoupling the optimization of generalization and robustness. The approach incorporates distortion and resolution simulation during training to better approximate the distribution of in-the-wild data. This method improves detector robustness under severe distortions and achieved third place in the NTIRE 2024 “Robust AI-Generated Image Detection in the Wild” challenge.
📝 Abstract
The proliferation of highly realistic AI-generated images (AIGI) has necessitated the development of practical detection methods. While current AIGI detectors perform well on clean datasets, their performance often degrades when deployed "in the wild", where images are subjected to unpredictable, complex distortions. To address this vulnerability, we propose a novel LoRA-based Pairwise Training (LPT) strategy designed to achieve robust AIGI detection under severe distortions. The core of our strategy comprises targeted finetuning of a visual foundation model, deliberate simulation of the target data distribution during training, and a pairwise training process. Specifically, we introduce distortion and size simulations to better fit the distribution of the validation and test sets. Building on the strong visual representation capability of the visual foundation model, we finetune it for AIGI detection. Pairwise training then improves detection by decoupling the optimization of generalization and robustness. Experiments show that our approach secured third place in the NTIRE Robust AI-Generated Image Detection in the Wild challenge.
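For readers unfamiliar with the ingredients named above, the following is a minimal NumPy sketch, not the authors' implementation. It illustrates two generic ideas: a LoRA-style linear layer (a frozen weight plus a trainable low-rank update) and the construction of a clean/distorted image pair. The abstract does not specify the rank, the scaling, the distortion types, or how the pairs enter the loss, so `LoRALinear`, `simulate_distortion`, `make_pair`, and all numeric values here are hypothetical placeholders.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, d_in, d_out, rank, alpha, rng):
        # Stand-in for a pretrained weight; it would stay frozen during finetuning.
        self.W = rng.normal(0.0, 0.02, size=(d_out, d_in))
        # Trainable low-rank factors. B starts at zero, so the update is zero at init.
        self.A = rng.normal(0.0, 0.02, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank

    def forward(self, x):
        base = x @ self.W.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_params(self):
        return self.A.size + self.B.size


def simulate_distortion(image, rng, noise_std=0.05):
    """Assumed distortion + size simulation: strided downscale, then Gaussian noise."""
    stride = int(rng.integers(1, 3))  # 1 keeps the size, 2 halves each spatial side
    resized = image[::stride, ::stride]
    noisy = resized + rng.normal(0.0, noise_std, size=resized.shape)
    return np.clip(noisy, 0.0, 1.0)


def make_pair(image, rng):
    """Return a (clean, distorted) pair built from the same source image."""
    return image, simulate_distortion(image, rng)


rng = np.random.default_rng(0)
layer = LoRALinear(d_in=768, d_out=768, rank=8, alpha=16.0, rng=rng)
clean, distorted = make_pair(rng.random((64, 64, 3)), rng)
```

With these placeholder sizes, the low-rank factors hold 8 × (768 + 768) parameters, far fewer than the 768 × 768 entries of the frozen weight, which is the usual motivation for LoRA-style finetuning. How the clean and distorted views are combined during training is left open here because the abstract only states that pairwise training decouples generalization and robustness optimization.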