GAMMA: Generalizable Alignment via Multi-task and Manipulation-Augmented Training for AI-Generated Image Detection

📅 2025-09-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI-generated image detectors perform well on in-distribution data but exhibit poor generalization to unseen generative models, primarily due to over-reliance on model-specific artifacts (e.g., style priors, compression traces). To address this cross-domain generalization challenge, we propose a multi-task detection framework incorporating operation consistency constraints and a reverse cross-attention mechanism—enabling the segmentation head to guide the classification branch in correcting domain bias. Our method further integrates inpainting-based operation augmentation, semantics-preserving perturbations, and joint supervision from dual segmentation heads and a classification head. Evaluated on the GenImage benchmark, our approach achieves state-of-the-art generalization performance, improving accuracy by 5.8% over prior methods, and demonstrates significant robustness against emerging generative models such as GPT-4o.
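The reverse cross-attention idea above (segmentation evidence correcting the classification branch) could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the module name, head count, and the residual-plus-norm placement are assumptions; the key point is that queries come from the classification branch while keys and values come from the segmentation branch.

```python
import torch
import torch.nn as nn

class ReverseCrossAttention(nn.Module):
    """Hypothetical sketch: segmentation features act as keys/values that
    re-weight, and thereby correct, the classification branch's tokens."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, cls_tokens: torch.Tensor, seg_tokens: torch.Tensor) -> torch.Tensor:
        # Queries from the classification branch; keys/values from the
        # segmentation branch, so pixel-level evidence steers the class decision.
        corrected, _ = self.attn(cls_tokens, seg_tokens, seg_tokens)
        return self.norm(cls_tokens + corrected)  # residual connection + norm

# Toy usage: batch of 2, 16 classification tokens, 64 segmentation tokens, dim 32
rca = ReverseCrossAttention(dim=32)
out = rca(torch.randn(2, 16, 32), torch.randn(2, 64, 32))
print(out.shape)  # torch.Size([2, 16, 32])
```

Because the segmentation branch is trained for pixel-level source attribution, routing its features through the keys/values gives the classifier a spatially grounded signal rather than a global style prior, which is the stated motivation for the mechanism.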

📝 Abstract
With generative models becoming increasingly sophisticated and diverse, detecting AI-generated images has become increasingly challenging. While existing AI-generated image detectors achieve promising performance on in-distribution generated images, their generalization to unseen generative models remains limited. This limitation is largely attributed to their reliance on generation-specific artifacts, such as stylistic priors and compression patterns. To address these limitations, we propose GAMMA, a novel training framework designed to reduce domain bias and enhance semantic alignment. GAMMA introduces diverse manipulation strategies, such as inpainting-based manipulation and semantics-preserving perturbations, to ensure consistency between manipulated and authentic content. We employ multi-task supervision with dual segmentation heads and a classification head, enabling pixel-level source attribution across diverse generative domains. In addition, a reverse cross-attention mechanism is introduced to allow the segmentation heads to guide and correct biased representations in the classification branch. Our method not only achieves state-of-the-art generalization performance on the GenImage benchmark, improving accuracy by 5.8%, but also maintains strong robustness on newly released generative models such as GPT-4o.
Problem

Research questions and friction points this paper is trying to address.

Detecting AI-generated images across unseen generative models
Reducing reliance on generation-specific artifacts and biases
Enhancing generalization and robustness in image detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task supervision with dual segmentation
Diverse manipulation strategies for consistency
Reverse cross-attention mechanism correcting bias
Haozhen Yan
Shanghai Jiao Tong University
Yan Hong
Ant Group
Suning Lang
Shanghai Jiao Tong University
Jiahui Zhan
Shanghai Jiao Tong University
Yikun Ji
Shanghai Jiao Tong University
Yujie Gao
Shanghai Jiao Tong University
Jun Lan
Ant Group
Huijia Zhu
Ant Group
Weiqiang Wang
Ant Group
Jianfu Zhang
Shanghai Jiao Tong University
Machine Learning · Computer Vision