Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
AI-generated images frequently exhibit semantic anomalies, such as implausible object configurations or violations of physical laws and commonsense knowledge, which undermine content credibility. To address this, the paper introduces the task of *semantic anomaly detection and reasoning*, proposing a large-scale, fine-grained benchmark, AnomReason, and a multi-agent reasoning framework, AnomAgent, capable of anomaly localization, root-cause attribution, and severity assessment. Annotations are structured quadruples (Name, Phenomenon, Reasoning, Severity), generated by a modular multi-agent GPT-4o pipeline and validated through lightweight human verification. Semantic-aware evaluation metrics, SemAP and SemF1, quantify alignment with human judgment, and models fine-tuned on the benchmark achieve consistent gains over strong vision-language baselines. The framework has been applied to explainable deepfake detection and semantic fidelity evaluation of generative models, advancing reproducible and interpretable research on AIGC semantic authenticity.

📝 Abstract
The rapid advancement of AI-generated content (AIGC) has enabled the synthesis of visually convincing images; however, many such outputs exhibit subtle **semantic anomalies**, including unrealistic object configurations, violations of physical laws, or commonsense inconsistencies, which compromise the overall plausibility of the generated scenes. Detecting these semantic-level anomalies is essential for assessing the trustworthiness of AIGC media, especially in AIGC image analysis, explainable deepfake detection, and semantic authenticity assessment. In this paper, we formalize **semantic anomaly detection and reasoning** for AIGC images and introduce **AnomReason**, a large-scale benchmark with structured annotations as quadruples *(Name, Phenomenon, Reasoning, Severity)*. Annotations are produced by a modular multi-agent pipeline (**AnomAgent**) with lightweight human-in-the-loop verification, enabling scale while preserving quality. At construction time, AnomAgent processed approximately 4.17B GPT-4o tokens, providing scale evidence for the resulting structured annotations. We further show that models fine-tuned on AnomReason achieve consistent gains over strong vision-language baselines under our proposed semantic matching metrics (*SemAP* and *SemF1*). Applications to explainable deepfake detection and semantic reasonableness assessment of image generators demonstrate practical utility. In summary, AnomReason and AnomAgent serve as a foundation for measuring and improving the semantic plausibility of AI-generated images. We will release code, metrics, data, and task-aligned models to support reproducible research on semantic authenticity and interpretable AIGC forensics.
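The abstract describes each annotation as a quadruple *(Name, Phenomenon, Reasoning, Severity)*. A minimal sketch of what one such record might look like, assuming string fields and a coarse severity scale (both are assumptions, not the paper's released schema):

```python
from dataclasses import dataclass

@dataclass
class AnomalyAnnotation:
    """Hypothetical record following the (Name, Phenomenon, Reasoning,
    Severity) quadruple described in the abstract."""
    name: str        # the anomalous object or region, e.g. "left hand"
    phenomenon: str  # what looks wrong, e.g. "six fingers"
    reasoning: str   # why it violates physics or commonsense
    severity: str    # assumed scale, e.g. "minor" / "moderate" / "severe"

ann = AnomalyAnnotation(
    name="left hand",
    phenomenon="six fingers",
    reasoning="human hands have five fingers, so this violates anatomy",
    severity="severe",
)
```

Structuring annotations this way is what makes semantic matching metrics such as SemAP and SemF1 possible, since predictions and ground truth can be compared field by field rather than as free text.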
Problem

Research questions and friction points this paper is trying to address.

Detecting semantic anomalies in AI-generated images
Assessing the trustworthiness and authenticity of AIGC media
Improving semantic plausibility of generated scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale benchmark with structured quadruple annotations
Modular multi-agent pipeline for annotation generation
Semantic matching metrics (SemAP, SemF1) and task-aligned fine-tuned models
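The paper's exact SemAP/SemF1 definitions are not reproduced here; as a rough illustration of the matching-based form such metrics typically take, here is a hypothetical F1 over semantically matched annotation pairs, where the greedy matching rule, the similarity function, and the threshold are all assumptions:

```python
def sem_f1(preds, golds, similarity, threshold=0.5):
    """Illustrative SemF1-style score: greedily match each predicted
    anomaly to an unmatched ground-truth anomaly whose semantic
    similarity clears the threshold, then compute F1 over the matches."""
    matched_gold = set()
    tp = 0
    for p in preds:
        best_j, best_s = None, threshold
        for j, g in enumerate(golds):
            if j in matched_gold:
                continue
            s = similarity(p, g)
            if s >= best_s:
                best_j, best_s = j, s
        if best_j is not None:
            matched_gold.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy similarity for demonstration; in practice this would be an
# embedding- or LLM-based semantic comparison.
def exact(p, g):
    return 1.0 if p == g else 0.0

score = sem_f1(["extra finger", "floating shadow"],
               ["extra finger", "bent reflection"], exact)
# two predictions, two gold anomalies, one match -> precision = recall = 0.5
```

The point of a semantic (rather than exact-string) similarity function is that two annotations phrased differently but describing the same anomaly should still count as a match.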
👥 Authors
Chuangchuang Tan · Beijing Jiaotong University
Xiang Ming · Microsoft Research Asia
Jinglu Wang · Microsoft Research Asia · Computer Vision, Computer Graphics
Renshuai Tao · Beijing Jiaotong University
Bin Li · Shenzhen University
Yunchao Wei · Professor, Beijing Jiaotong University, UTS, UIUC, NUS · Computer Vision, Machine Learning
Yao Zhao · Beijing Jiaotong University
Yan Lu · Microsoft Research Asia