Omni-IML: Towards Unified Image Manipulation Localization

📅 2024-11-22
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing image manipulation localization (IML) methods are largely task-specific, suffering from severe performance degradation under joint training and poor cross-task generalization. To address this, we propose the first unified multi-task IML framework. Our approach introduces a modality-gated encoder for cross-modal adaptive representation learning, a dynamic-weight decoder for task-aware localization, and a box-supervision-driven anomaly enhancement module to improve fine-grained discriminability. Furthermore, we release Omni-273k—the first large-scale IML benchmark annotated with natural-language manipulation descriptions—generated via chain-of-thought automatic labeling and augmented with a lightweight explanation module. Evaluated on four major IML tasks, our single model achieves state-of-the-art performance across all, while significantly improving cross-task generalization and semantic interpretability.

📝 Abstract
Existing Image Manipulation Localization (IML) methods rely heavily on task-specific designs, so they perform well only on their target IML task, while joint training on multiple IML tasks causes significant performance degradation, hindering real-world applications. To this end, we propose Omni-IML, the first generalist model designed to unify IML across diverse tasks. Specifically, Omni-IML achieves generalization through three key components: (1) a Modal Gate Encoder, which adaptively selects the optimal encoding modality per sample, (2) a Dynamic Weight Decoder, which dynamically adjusts decoder filters to the task at hand, and (3) an Anomaly Enhancement module that leverages box supervision to highlight the tampered regions and facilitate the learning of task-agnostic features. Beyond localization, to support interpretation of the tampered images, we construct Omni-273k, a large-scale, high-quality dataset that includes natural-language descriptions of tampering artifacts, annotated through our automatic chain-of-thought annotation technique. We also design a simple yet effective interpretation module to better utilize these descriptive annotations. Our extensive experiments show that our single Omni-IML model achieves state-of-the-art performance across all four major IML tasks, providing a valuable solution for practical deployment and a promising direction for generalist models in image forensics. Our code and dataset will be publicly available.
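The Modal Gate Encoder's core idea, per-sample adaptive weighting between encoding modalities (e.g., RGB vs. noise/frequency features), can be illustrated with a minimal sketch. The paper does not publish this exact formulation; the softmax gate, the feature dimensions, and the `gate_w` parameter below are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def modal_gate(rgb_feat, freq_feat, gate_w):
    """Fuse two modality features with per-sample gate weights.

    rgb_feat, freq_feat: (d,) feature vectors from two hypothetical encoders.
    gate_w: (2, 2d) assumed learned gate parameters.
    Returns a (d,) convex combination chosen per sample.
    """
    joint = np.concatenate([rgb_feat, freq_feat])
    weights = softmax(gate_w @ joint)   # (2,) nonnegative, sums to 1
    return weights[0] * rgb_feat + weights[1] * freq_feat

# Toy usage with random features
rng = np.random.default_rng(0)
rgb = rng.normal(size=8)
freq = rng.normal(size=8)
W = rng.normal(size=(2, 16))
fused = modal_gate(rgb, freq, W)
```

Because the gate outputs sum to one, each fused component stays between the corresponding RGB and frequency values; a hard top-1 selection (as "selects the optimal encoding modality" may imply) would replace the softmax with an argmax.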
Problem

Research questions and friction points this paper is trying to address.

Unifying diverse Image Manipulation Localization tasks effectively
Overcoming performance degradation in joint multi-task IML training
Providing interpretable tampered image descriptions for forensic analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modal Gate Encoder adaptively selects encoding modality
Dynamic Weight Decoder adjusts filters dynamically
Anomaly Enhancement highlights tampered regions effectively
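The Dynamic Weight Decoder idea, generating decoder filters from a task descriptor instead of storing one fixed filter set, can be sketched as a dynamic 1x1 convolution. This is a simplified illustration, not the paper's implementation; the shapes and the `gen_w` generator matrix are assumptions.

```python
import numpy as np

def dynamic_conv1x1(feat_map, task_embed, gen_w):
    """Hypothetical dynamic 1x1 convolution: the per-channel filter
    is generated from a task embedding at inference time, so one
    decoder can adapt its weights to the IML task at hand.

    feat_map:   (c, h, w) decoder feature map.
    task_embed: (t,) task descriptor vector.
    gen_w:      (c, t) assumed weight-generator matrix.
    Returns an (h, w) localization response map.
    """
    filt = gen_w @ task_embed                       # (c,) dynamic filter
    return np.einsum('c,chw->hw', filt, feat_map)   # weighted channel sum

# Toy usage: one 4-channel feature map, a 3-dim task embedding
rng = np.random.default_rng(1)
fmap = rng.normal(size=(4, 5, 5))
emb = rng.normal(size=3)
G = rng.normal(size=(4, 3))
out = dynamic_conv1x1(fmap, emb, G)
```

In a real model the generator would be a small learned network and the output would feed a segmentation head; the sketch only shows the weight-generation mechanism itself.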