🤖 AI Summary
Image manipulation localization (IML) faces a fundamental trade-off: supervised methods rely on costly pixel-level annotations, while weakly supervised or unsupervised approaches suffer from limited accuracy and poor interpretability. To address this, we propose the In-Context Forensic Chain (ICFC), a training-free framework and the first to integrate multimodal large language models (MLLMs) into zero-shot image forensics. ICFC constructs an object-centric rule base, designs a multi-stage progressive reasoning chain that emulates expert cognitive processes, and incorporates an adaptive filtering mechanism, enabling end-to-end, interpretable analysis from image-level classification to pixel-level localization. Crucially, it requires no fine-tuning and demonstrates strong cross-dataset generalization. On multiple benchmarks, it significantly outperforms existing zero-shot methods and matches or surpasses several weakly supervised and even fully supervised baselines. The framework unifies its outputs into three complementary modalities: pixel-level localization maps, image-level binary decisions, and text-based attribution explanations.
📝 Abstract
Advances in image tampering pose serious security threats, underscoring the need for effective image manipulation localization (IML). While supervised IML achieves strong performance, it depends on costly pixel-level annotations, and existing weakly supervised or training-free alternatives often underperform and lack interpretability. We propose the In-Context Forensic Chain (ICFC), a training-free framework that leverages multimodal large language models (MLLMs) for interpretable IML. ICFC combines objectified rule construction with adaptive filtering to build a reliable knowledge base, and a multi-step progressive reasoning pipeline that mirrors expert forensic workflows from coarse proposals to fine-grained forensic results. This design enables systematic exploitation of MLLM reasoning for image-level classification, pixel-level localization, and text-level interpretability. Across multiple benchmarks, ICFC not only surpasses state-of-the-art training-free methods but also achieves competitive or superior performance compared to weakly and fully supervised approaches.
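The abstract describes a three-stage pipeline: objectified rule construction, adaptive filtering into a knowledge base, and multi-step coarse-to-fine reasoning that yields an image-level decision, a pixel-level mask, and a textual explanation. The following minimal Python sketch illustrates that control flow under stated assumptions: the `query_mllm` stub, all function names, and the rule/mask formats are hypothetical illustrations, not the authors' actual API or prompts.

```python
# Illustrative sketch of the ICFC coarse-to-fine flow. Everything here is a
# hypothetical stand-in: a real system would call a multimodal LLM where
# query_mllm is stubbed below.

def query_mllm(prompt, image):
    """Stub simulating an MLLM response for a given prompt and image."""
    return {
        "objects": ["person", "street sign"],
        "suspicious": ["street sign"],
        "mask": [[0, 1], [0, 0]],  # toy 2x2 pixel-level localization map
        "explanation": "edge and lighting inconsistency around the street sign",
    }

def build_rule_base(image):
    # Stage 1: object-centric ("objectified") rule construction.
    objects = query_mllm("List objects and applicable physical rules.", image)["objects"]
    return {obj: f"check lighting, edges, and perspective of {obj}" for obj in objects}

def adaptive_filter(rules):
    # Stage 2: adaptive filtering keeps only reliable rules for the
    # knowledge base (placeholder criterion: non-empty rule text).
    return {obj: rule for obj, rule in rules.items() if rule}

def progressive_reasoning(image, knowledge_base):
    # Stage 3: multi-step reasoning, from coarse object proposals to
    # fine-grained forensic results in three output modalities.
    coarse = query_mllm("Which listed objects look manipulated?", image)["suspicious"]
    fine = query_mllm(f"Localize tampering within {coarse}.", image)
    return {
        "image_label": bool(coarse),         # image-level binary decision
        "mask": fine["mask"],                # pixel-level localization map
        "explanation": fine["explanation"],  # text-level attribution
    }

def icfc(image):
    rules = build_rule_base(image)
    kb = adaptive_filter(rules)
    return progressive_reasoning(image, kb)

result = icfc(image="example.jpg")
```

Because every stage is expressed as prompting rather than gradient updates, the pipeline is training-free; swapping the stub for a real MLLM call is the only change a working system would need at this level of abstraction.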