🤖 AI Summary
This work addresses a limitation of existing image manipulation localization methods: they rely primarily on low-level artifact detection and therefore struggle to identify subtle yet semantically critical forgeries. To bridge this gap, we introduce a new task, semantic manipulation localization, and present a dedicated fine-grained benchmark for it, built with a semantics-driven manipulation pipeline and pixel-level annotations. We further propose TRACE, an end-to-end framework that combines semantic anchoring, perturbation-sensitive frequency cues, and joint reasoning over semantic content and semantic scope to localize manipulated regions even when the edit remains highly consistent with its surroundings. Experiments show that TRACE consistently outperforms existing IML methods on our benchmark, yielding more complete, compact, and semantically coherent localization results. This study underscores the role of semantic awareness in image forensics and points to a new direction for semantics-driven manipulation localization.
📝 Abstract
Image Manipulation Localization (IML) aims to identify edited regions in an image. However, with the increasing use of modern image editing and generative models, many manipulations no longer exhibit obvious low-level artifacts. Instead, they often involve subtle but meaning-altering edits to an object's attributes, state, or relationships while remaining highly consistent with the surrounding content. This makes conventional IML methods less effective because they mainly rely on artifact detection rather than semantic sensitivity. To address this issue, we introduce Semantic Manipulation Localization (SML), a new task that focuses on localizing subtle semantic edits that significantly change image interpretation. We further construct a dedicated fine-grained benchmark for SML using a semantics-driven manipulation pipeline with pixel-level annotations. Based on this task, we propose TRACE (Targeted Reasoning of Attributed Cognitive Edits), an end-to-end framework that models semantic sensitivity through three progressively coupled components: semantic anchoring, semantic perturbation sensing, and semantic-constrained reasoning. Specifically, TRACE first identifies semantically meaningful regions that support image understanding, then injects perturbation-sensitive frequency cues to capture subtle edits under strong visual consistency, and finally verifies candidate regions through joint reasoning over semantic content and semantic scope. Extensive experiments show that TRACE consistently outperforms existing IML methods on our benchmark and produces more complete, compact, and semantically coherent localization results. These results demonstrate the necessity of moving beyond artifact-based localization and provide a new direction for image forensics in complex semantic editing scenarios.
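The abstract's "perturbation-sensitive frequency cues" can be illustrated with a generic high-pass residual. The sketch below is not TRACE's implementation, which the abstract does not specify. It is a minimal NumPy example of one common way to expose fine-grained perturbations in the frequency domain. The function name `high_frequency_residual` and the `cutoff` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def high_frequency_residual(image: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Return the magnitude of the high-frequency residual of a grayscale image.

    `cutoff` (hypothetical parameter) is the radius of the suppressed
    low-frequency disc in cycles per pixel; the Nyquist limit is 0.5.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")

    height, width = image.shape
    spectrum = np.fft.fft2(image.astype(np.float64))

    # Frequency coordinates in the same (unshifted) layout that fft2 returns.
    freq_y = np.fft.fftfreq(height)[:, None]
    freq_x = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(freq_x ** 2 + freq_y ** 2)

    # Keep only components above the cutoff; this also removes the DC term.
    high_pass_mask = radius > cutoff

    residual = np.fft.ifft2(spectrum * high_pass_mask).real
    return np.abs(residual)
```

In a pipeline like the one the abstract describes, a map of this kind would be one input among several: a learned model would still have to combine it with semantic features to decide which regions are actually manipulated.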