🤖 AI Summary
This work addresses the lack of interpretability in current AI-generated image (AIGI) detection methods and the absence of fine-grained, localized annotations for perceptual artifacts in mainstream benchmarks. To this end, we introduce X-AIGD—the first fine-grained benchmark enabling interpretable AIGI detection—featuring pixel-level, categorized artifact annotations spanning low-level distortions, high-level semantic inconsistencies, and cognitive-level counterfactual content. Building on this benchmark, we construct a diverse test suite integrating human annotations, attention alignment mechanisms, and an interpretability evaluation framework. Our experiments reveal that existing detectors rarely base their decisions on genuine artifacts; however, explicitly guiding models to attend to artifact regions substantially enhances both their interpretability and generalization capability.
📝 Abstract
Current AI-Generated Image (AIGI) detection approaches predominantly rely on binary classification to distinguish real from synthetic images, often lacking interpretable or convincing evidence to substantiate their decisions. This limitation stems from existing AIGI detection benchmarks, which, despite featuring a broad collection of synthetic images, remain restricted in their coverage of artifact diversity and lack detailed, localized annotations. To bridge this gap, we introduce a fine-grained benchmark towards eXplainable AI-Generated image Detection, named X-AIGD, which provides pixel-level, categorized annotations of perceptual artifacts, spanning low-level distortions, high-level semantics, and cognitive-level counterfactuals. These comprehensive annotations facilitate fine-grained interpretability evaluation and deeper insight into model decision-making processes. Our extensive investigation using X-AIGD provides several key insights: (1) Existing AIGI detectors demonstrate negligible reliance on perceptual artifacts, even at the most basic distortion level. (2) While AIGI detectors can be trained to identify specific artifacts, they still substantially base their judgment on uninterpretable features. (3) Explicitly aligning model attention with artifact regions can increase the interpretability and generalization of detectors. The data and code are available at: https://github.com/Coxy7/X-AIGD.