ClueAegis: Heuristic-to-Reasoning Cognitive-skill Learning for Unified Evidence-based Synthetic Image Detection

📅 2026-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing synthetic image detection methods: most rely on end-to-end classification or monolithic reasoning, and so struggle to model structured forensic reasoning and heterogeneous visual evidence. The authors propose a cognition-inspired framework that decomposes detection into an explicit, configurable sequence of cognitive skills: it first extracts heuristic perceptual clues, then selects the most suitable forensic skill, and finally performs evidence extraction and decision-making through a skill-conditioned toolchain. The framework, ClueAegis, is a two-stage agentic architecture that combines clue-driven heuristic skill selection with evidence-guided reasoning. The authors also introduce ClueAegis-Bench, a benchmark that annotates detection in terms of forensic cognitive skills so that evaluation goes beyond binary classification. Experiments reported by the authors show state-of-the-art performance with improved cross-domain generalization and robustness, along with interpretable reasoning trajectories and structured forensic evidence.
📝 Abstract
The rapid advancement of generative models has made synthetic images increasingly realistic, challenging reliable detection. Existing methods are often limited to end-to-end classification or monolithic reasoning, and thus fail to model structured forensic reasoning and heterogeneous visual evidence. We revisit synthetic image detection from a cognitive perspective and propose a Heuristic-to-Reasoning cognitive skill learning framework for evidence-based forensic analysis. Given an input image, our framework first extracts heuristic perceptual clues, selects the optimal forensic skill, and then performs skill-conditioned reasoning for evidence extraction and decision making. To support this paradigm, we introduce ClueAegis-Bench, which decomposes synthetic image detection into explicitly annotated forensic cognitive skills for structured evaluation beyond binary classification. Based on this benchmark, we propose ClueAegis (Cognitive-skill Learning for Unified Evidence-based Synthetic Image Detection), a two-stage agentic framework that conducts heuristic skill selection followed by evidence-guided reasoning through skill-conditioned toolchains. This design reformulates synthetic image detection as a configurable multi-skill reasoning process that bridges perception, skill selection, and forensic reasoning. Extensive experiments show that ClueAegis achieves state-of-the-art performance while improving cross-domain generalization and robustness. It also provides transparent reasoning trajectories and structured forensic evidence, offering a more explainable alternative to conventional end-to-end detectors.
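The two-stage flow described in the abstract (clue extraction, skill selection, then a skill-conditioned toolchain that produces evidence and a decision) can be illustrated with a toy sketch. This is not the authors' implementation: the image is replaced by a dictionary of precomputed feature scores, and the skill names, feature names, thresholds, and the "any evidence means synthetic" decision rule are all hypothetical choices made only to show the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumption: an "image" is a toy dict of precomputed feature scores in [0, 1].
Image = Dict[str, float]
# A tool inspects the image and returns an evidence string, or None if it finds nothing.
Tool = Callable[[Image], Optional[str]]


@dataclass
class Verdict:
    label: str                                  # "synthetic" or "real"
    skill: str                                  # skill selected in stage 1
    evidence: List[str] = field(default_factory=list)


def extract_clues(image: Image) -> Dict[str, float]:
    """Stage 1a: map raw features to one heuristic clue score per skill."""
    return {
        "texture_analysis": image.get("texture_anomaly", 0.0),
        "geometry_analysis": image.get("geometry_anomaly", 0.0),
    }


def select_skill(clues: Dict[str, float]) -> str:
    """Stage 1b: pick the skill with the highest clue score (hypothetical rule)."""
    return max(clues, key=clues.get)


def threshold_tool(feature: str, limit: float, message: str) -> Tool:
    """Build a toy tool that reports evidence when a feature exceeds a limit."""
    def tool(image: Image) -> Optional[str]:
        value = image.get(feature, 0.0)
        return f"{message} ({feature}={value:.2f})" if value > limit else None
    return tool


# Stage 2 lookup: each skill is bound to its own ordered chain of tools.
TOOLCHAINS: Dict[str, List[Tool]] = {
    "texture_analysis": [
        threshold_tool("texture_anomaly", 0.5, "over-smooth texture"),
        threshold_tool("noise_anomaly", 0.5, "inconsistent noise"),
    ],
    "geometry_analysis": [
        threshold_tool("geometry_anomaly", 0.5, "implausible geometry"),
    ],
}


def detect(image: Image) -> Verdict:
    """Run both stages and return the label together with its evidence."""
    skill = select_skill(extract_clues(image))
    evidence = [e for tool in TOOLCHAINS[skill] if (e := tool(image)) is not None]
    # Assumed decision rule: any collected evidence marks the image as synthetic.
    label = "synthetic" if evidence else "real"
    return Verdict(label, skill, evidence)
```

Calling `detect({"texture_anomaly": 0.9, "noise_anomaly": 0.2})` selects the hypothetical `texture_analysis` skill and returns a verdict whose evidence list records which tool fired. Returning the skill and evidence alongside the label mirrors the abstract's emphasis on reasoning trajectories and structured evidence rather than a bare binary output.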
Problem

Research questions and friction points this paper is trying to address.

synthetic image detection
forensic reasoning
visual evidence
cognitive skill
generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heuristic-to-Reasoning
Cognitive-skill Learning
Evidence-based Detection
Synthetic Image Forensics
Skill-conditioned Reasoning