LEGION: Learning to Ground and Explain for Synthetic Image Detection

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing synthetic image detection methods suffer from poor interpretability and limited generalization, while mainstream benchmarks rely on outdated generative models and lack fine-grained annotations. To address these limitations, we introduce SynthScars, the first high-quality, fine-grained dataset specifically designed for image forgery detection, featuring pixel-level segmentation masks of forged regions, coarse- and fine-grained forgery type labels, and natural language explanations. Building upon SynthScars, we propose LEGION, a multimodal large language model (MLLM)-based framework for explainable forgery detection. LEGION unifies artifact-level classification, pixel-level segmentation, and textual explanation, and can also act as a controller that guides image refinement pipelines. On SynthScars, LEGION achieves 68.42% mIoU and 79.16% F1-score, significantly surpassing state-of-the-art methods. Moreover, images refined under its guidance align more closely with human preferences.

📝 Abstract
The rapid advancements in generative technology have emerged as a double-edged sword. While offering powerful tools that enhance convenience, they also pose significant social concerns. As defenders, current synthetic image detection methods often lack artifact-level textual interpretability and are overly focused on image manipulation detection, and current datasets usually suffer from outdated generators and a lack of fine-grained annotations. In this paper, we introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Furthermore, we propose LEGION (LEarning to Ground and explain for Synthetic Image detectiON), a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation. Building upon this capability, we further explore LEGION as a controller, integrating it into image refinement pipelines to guide the generation of higher-quality and more realistic images. Extensive experiments show that LEGION outperforms existing methods across multiple benchmarks, particularly surpassing the second-best traditional expert on SynthScars by 3.31% in mIoU and 7.75% in F1 score. Moreover, the refined images generated under its guidance exhibit stronger alignment with human preferences. The code, model, and dataset will be released.
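The segmentation results above are reported as mIoU and F1. As a minimal sketch of how such scores can be computed for binary artifact masks: the function names below are illustrative, mIoU is assumed here to be the mean of per-image IoU (the abstract does not spell out the averaging), and an empty prediction against an empty ground truth is scored as 1.0 by convention. This is not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2D binary mask: 1 = artifact pixel, 0 = clean pixel


def confusion_counts(pred: Mask, gt: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives pixel by pixel."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p and not g:
                fp += 1
            elif g and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, gt)
    union = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else tp / union


def f1(pred: Mask, gt: Mask) -> float:
    """Pixel-level F1 (Dice): 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, gt)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Average per-image IoU over a non-empty dataset (assumed definition of mIoU)."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]  # two pixels predicted as artifact
    gt = [[1, 0], [0, 0]]    # one pixel annotated as artifact
    print(iou(pred, gt))     # 0.5
    print(f1(pred, gt))      # 2/3
    print(mean_iou([pred, gt], [gt, gt]))  # 0.75
```

In the toy example the prediction covers the annotated pixel plus one extra pixel, so IoU is 1/2 while F1 is 2/3; F1 is always at least as large as IoU on the same mask pair.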
Problem

Research questions and friction points this paper is trying to address.

Lack of artifact-level interpretability in synthetic image detection
Outdated generators and insufficient annotations in current datasets
Need for improved image refinement guided by detection and explanation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces SynthScars dataset with expert annotations
Proposes LEGION framework for image forgery analysis
Integrates LEGION into image refinement pipelines
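Using a detector as a controller for image refinement can be pictured as a detect-then-repair loop. The sketch below is a hypothetical illustration only: `Artifact`, `refine`, `toy_detect`, and `toy_repair` are invented names, the detector and repair steps are stand-in callables rather than LEGION's actual interface, and the "image" is a toy set of defect descriptions so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Artifact:
    """Hypothetical detector output: a category label plus a textual explanation."""
    category: str
    explanation: str


def refine(
    image: Any,
    detect: Callable[[Any], List[Artifact]],
    repair: Callable[[Any, List[Artifact]], Any],
    max_rounds: int = 3,
) -> Tuple[Any, List[int]]:
    """Alternate detection and repair until no artifacts remain or rounds run out.

    Returns the final image and the number of artifacts found in each round.
    """
    history: List[int] = []
    for _ in range(max_rounds):
        artifacts = detect(image)
        history.append(len(artifacts))
        if not artifacts:
            break  # detector reports a clean image, stop early
        image = repair(image, artifacts)
    return image, history


# Toy stand-ins: the "image" is a set of defect descriptions.
def toy_detect(image: set) -> List[Artifact]:
    return [Artifact("toy", defect) for defect in sorted(image)]


def toy_repair(image: set, artifacts: List[Artifact]) -> set:
    # Fix one reported defect per round, guided by its explanation.
    return image - {artifacts[0].explanation}


if __name__ == "__main__":
    final, history = refine({"extra finger", "warped text"}, toy_detect, toy_repair)
    print(final, history)  # set() [2, 1, 0]
```

In a real pipeline the repair step would be a generative model conditioned on the detector's masks or explanations; the loop structure and the stopping rule are the only parts this sketch is meant to convey.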
Hengrui Kang
Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory
Siwei Wen
Beihang University, Shanghai Artificial Intelligence Laboratory
Zichen Wen
Shanghai Jiao Tong University
Efficient AI, Trustworthy AI, Large Language Model, Machine Learning
Junyan Ye
SYSU
Computer Vision and Deep Learning
Weijia Li
Sun Yat-Sen University, Shanghai Artificial Intelligence Laboratory
Peilin Feng
Beihang University, Shanghai Artificial Intelligence Laboratory
Baichuan Zhou
University of Waterloo
Computer Vision, Natural Language Processing
Bin Wang
Shanghai Artificial Intelligence Laboratory
Dahua Lin
The Chinese University of Hong Kong
computer vision, machine learning, probabilistic inference, bayesian nonparametrics
Linfeng Zhang
DP Technology; AI for Science Institute
AI for Science, multi-scale modeling, molecular simulation, drug/materials design
Conghui He
Shanghai AI Laboratory
Data-centric AI, LLM, Document Intelligence