Automated In-the-Wild Data Collection for Continual AI Generated Image Detection

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation that existing AI-generated image detectors suffer under distribution shifts and when they encounter newly emerging generative models. The authors propose a data-centric continual adaptation framework that combines in-the-wild data with generator-driven data. An automated, weakly supervised pipeline builds the in-the-wild dataset by retrieving fact-check articles. Within a continual learning setup, a small amount of generator-driven data is enough to adapt a detector to a new generative model, and combining it with in-the-wild data mitigates catastrophic forgetting. Experiments on two state-of-the-art detectors show average accuracy improvements of +9.14% and +8%, respectively.

📝 Abstract

The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this work, we propose a data-centric continual adaptation framework for updating detectors in evolving environments. We show that both in-the-wild data and generator-driven data are essential for adapting detectors. We introduce an automated, weakly supervised pipeline for constructing in-the-wild datasets through fact-check article retrieval. Additionally, we demonstrate that incorporating even a small amount of generator-driven data during training enables effective adaptation to newly emerging models, while combining it with in-the-wild data within a continual learning framework enables robust adaptation and mitigates catastrophic forgetting. Extensive experiments on two state-of-the-art detectors show significant improvements of +9.14% and +8% in average accuracy, respectively.
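
The abstract does not say how the two data sources are mixed during an update, so the following is only a minimal sketch of one common option, rehearsal-style mixing, and not the authors' implementation. It assumes that each update step combines all newly collected in-the-wild samples, a capped number of samples from the newly emerging generator, and a replayed subset of earlier training data to limit forgetting. The names `Sample`, `build_update_set`, `max_generator`, and `replay_ratio` are hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    path: str    # image location (hypothetical field)
    label: int   # 1 = AI-generated, 0 = real
    source: str  # "wild", "generator", or "replay"


def build_update_set(wild, generator, replay,
                     max_generator=100, replay_ratio=0.5, seed=0):
    """Assemble the training set for one detector update (assumed scheme).

    wild          -- newly collected in-the-wild samples (all are kept)
    generator     -- samples from the newly emerging generator (capped)
    replay        -- samples from earlier updates, reused against forgetting
    max_generator -- cap reflecting "a small amount of generator-driven data"
    replay_ratio  -- replayed samples per new sample (assumption, not from the paper)
    """
    rng = random.Random(seed)
    generator = list(generator)
    replay = list(replay)

    # Keep every in-the-wild sample, but only a small slice of generator data.
    new_part = list(wild) + rng.sample(generator, min(max_generator, len(generator)))

    # Add a proportional number of old samples so earlier generators stay covered.
    n_replay = min(len(replay), int(replay_ratio * len(new_part)))
    mixed = new_part + rng.sample(replay, n_replay)

    rng.shuffle(mixed)
    return mixed
```

The resulting list would then feed whatever fine-tuning loop the detector uses. The cap and the replay ratio are tuning knobs in this sketch; the abstract gives no values for either.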

Problem

Research questions and friction points this paper is trying to address.

AI-generated image detection

distribution shift

continual learning

generative models

in-the-wild data

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual learning

in-the-wild data

generator-driven data