🤖 AI Summary
Existing synthetic image detection methods suffer from poor interpretability and limited generalization, while mainstream benchmarks rely on outdated generative models and lack fine-grained annotations. To address these limitations, the authors introduce SynthScars, described as the first high-quality, fine-grained dataset designed specifically for synthetic image forgery analysis. It provides pixel-level segmentation masks of artifact regions, artifact category labels, and natural language explanations. Building on SynthScars, they propose LEGION, a framework based on a multimodal large language model (MLLM) that brings MLLMs into explainable forgery detection. LEGION unifies artifact detection, pixel-level segmentation, and textual explanation, and can also act as a controller inside image refinement pipelines. On SynthScars, LEGION achieves 68.42% mIoU and 79.16% F1 score, surpassing the second-best traditional expert by 3.31% in mIoU and 7.75% in F1 score. Images refined under its guidance also align more closely with human preferences.
📝 Abstract
The rapid advancements in generative technology have emerged as a double-edged sword. While offering powerful tools that enhance convenience, they also pose significant social concerns. On the defensive side, current synthetic image detection methods often lack artifact-level textual interpretability and focus too heavily on image manipulation detection, while current datasets usually rely on outdated generators and lack fine-grained annotations. In this paper, we introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Furthermore, we propose LEGION (LEarning to Ground and explain for Synthetic Image detectiON), a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation. Building upon this capability, we further explore LEGION as a controller, integrating it into image refinement pipelines to guide the generation of higher-quality and more realistic images. Extensive experiments show that LEGION outperforms existing methods across multiple benchmarks, particularly surpassing the second-best traditional expert on SynthScars by 3.31% in mIoU and 7.75% in F1 score. Moreover, the refined images generated under its guidance exhibit stronger alignment with human preferences. The code, model, and dataset will be released.
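The mIoU and F1 figures quoted above are segmentation metrics computed between predicted and annotated artifact masks. The sketch below shows one common way to compute them for binary masks. It is illustrative only: the function names (`confusion_counts`, `iou`, `f1`, `mean_scores`) are hypothetical, and the averaging convention (per-image foreground scores averaged over images, with an empty prediction on an empty mask scored as 1.0) is an assumption, since the abstract does not state the exact evaluation protocol.

```python
from typing import List, Tuple

# A binary mask as nested lists: 1 = artifact pixel, 0 = clean pixel.
Mask = List[List[int]]


def confusion_counts(pred: Mask, gt: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1
            elif p and not g:
                fp += 1
            elif g and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of the artifact (foreground) regions."""
    tp, fp, fn = confusion_counts(pred, gt)
    union = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else tp / union


def f1(pred: Mask, gt: Mask) -> float:
    """Pixel-level F1 score, equal to 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, gt)
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def mean_scores(pairs: List[Tuple[Mask, Mask]]) -> Tuple[float, float]:
    """Average IoU and F1 over a list of (prediction, ground truth) pairs."""
    ious = [iou(p, g) for p, g in pairs]
    f1s = [f1(p, g) for p, g in pairs]
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    gt = [[1, 0, 0], [0, 1, 1]]
    print(iou(pred, gt), f1(pred, gt))
```

For the toy masks in the example, two pixels overlap, one is a false positive, and one is a false negative, so the IoU is 2/4 = 0.5 and the F1 score is 4/6 ≈ 0.667. F1 is never lower than IoU on the same masks, which is consistent with the F1 figures above being higher than the mIoU figures.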