LEGION: Learning to Ground and Explain for Synthetic Image Detection

📅 2025-03-19

🤖 AI Summary

Existing synthetic image detection methods offer little artifact-level interpretability, and mainstream benchmarks rely on outdated generators and lack fine-grained annotations. To address these limitations, we introduce SynthScars, a high-quality dataset of 12,236 fully synthetic images with human-expert annotations: pixel-level segmentation masks of artifact regions, artifact category labels, and natural language explanations. Building on SynthScars, we propose LEGION, a multimodal large language model (MLLM)-based framework that unifies artifact detection, pixel-level segmentation, and textual explanation. On SynthScars, LEGION surpasses the second-best traditional expert by 3.31% in mIoU and 7.75% in F1 score. Used as a controller in image refinement pipelines, LEGION also guides generation toward images that align more closely with human preferences.

📝 Abstract

The rapid advancements in generative technology have emerged as a double-edged sword. While offering powerful tools that enhance convenience, they also pose significant social concerns. As defenders, current synthetic image detection methods often lack artifact-level textual interpretability and are overly focused on image manipulation detection, and current datasets usually suffer from outdated generators and a lack of fine-grained annotations. In this paper, we introduce SynthScars, a high-quality and diverse dataset consisting of 12,236 fully synthetic images with human-expert annotations. It features 4 distinct image content types, 3 categories of artifacts, and fine-grained annotations covering pixel-level segmentation, detailed textual explanations, and artifact category labels. Furthermore, we propose LEGION (LEarning to Ground and explain for Synthetic Image detectiON), a multimodal large language model (MLLM)-based image forgery analysis framework that integrates artifact detection, segmentation, and explanation. Building upon this capability, we further explore LEGION as a controller, integrating it into image refinement pipelines to guide the generation of higher-quality and more realistic images. Extensive experiments show that LEGION outperforms existing methods across multiple benchmarks, particularly surpassing the second-best traditional expert on SynthScars by 3.31% in mIoU and 7.75% in F1 score. Moreover, the refined images generated under its guidance exhibit stronger alignment with human preferences. The code, model, and dataset will be released.
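The abstract reports segmentation quality as mIoU and F1 score. As a rough illustration of what these metrics measure on pixel-level artifact masks, here is a minimal sketch in plain Python. It assumes binary masks (1 = artifact pixel, 0 = clean pixel) and averages the foreground IoU per image; the paper's exact averaging convention is not stated in this summary, so the function names and the handling of empty masks are assumptions, not the authors' evaluation code.

```python
from typing import Sequence, Tuple

# Binary mask as rows of 0/1 values: 1 = artifact pixel, 0 = clean pixel.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives per pixel."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of the artifact (foreground) regions."""
    tp, fp, fn = confusion_counts(pred, target)
    union = tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return tp / union if union else 1.0


def f1_score(pred: Mask, target: Mask) -> float:
    """Pixel-level F1 (harmonic mean of precision and recall)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumption: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-image IoU over a set of images (assumed convention)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Toy example: 2 overlapping pixels, 1 false positive, 1 false negative.
predicted = [[1, 1, 0],
             [0, 1, 0]]
annotated = [[1, 0, 0],
             [0, 1, 1]]

print(iou(predicted, annotated))       # 2 / (2 + 1 + 1)
print(f1_score(predicted, annotated))  # 4 / (4 + 1 + 1)
```

IoU penalizes every mismatched pixel against the union of both regions, while F1 weights the overlap twice, so F1 is never lower than IoU for the same pair of masks.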

Problem

Research questions and friction points this paper is trying to address.

Lack of artifact-level interpretability in synthetic image detection

Outdated generators and insufficient annotations in current datasets

Need for improved image refinement guided by detection and explanation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces SynthScars dataset with expert annotations

Proposes LEGION framework for image forgery analysis

Integrates LEGION into image refinement pipelines
