🤖 AI Summary
To address key bottlenecks in image manipulation localization (IML), namely the low resolution of real-world web images, noisy manipulation masks, and the scarcity of high-quality annotations, this paper reformulates manipulation mask generation as a cross-scale change detection task. We propose MMM, an end-to-end, super-resolution-enhanced mask generation framework that integrates super-resolution reconstruction, dual-image feature alignment, channel-wise feature concatenation, and a self-supervised change-aware structural module. Using this framework, we construct MMMD, the first large-scale, real-world manipulation mask dataset, covering diverse Photoshop-based editing operations. Experiments show that MMMD substantially improves downstream IML model performance: the generated masks achieve an 8.2 dB PSNR gain and cover more diverse manipulations with more precise localization.
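The summary names four stages: super-resolution, feature embedding of both images, channel-wise concatenation, and a change-prediction head. The toy sketch below only shows how data could flow through such a pipeline. It is not the authors' implementation, and it rests on several assumptions:

- Nearest-neighbour upsampling stands in for the learned super-resolution module.
- The two-channel `embed` function and the hand-set weights in `change_head` are hypothetical placeholders for learned networks.
- The dual-image alignment and self-supervised modules are omitted.

```python
from typing import List

Image = List[List[float]]      # H x W grayscale image, values in [0, 1]
FeatureMap = List[Image]       # C x H x W stack of per-pixel feature channels


def upsample_nearest(img: Image, scale: int) -> Image:
    """Stand-in for the super-resolution module (assumption): nearest-neighbour
    upsampling only enlarges the grid; a learned SR network would also restore detail."""
    return [[row[x // scale] for x in range(len(row) * scale)]
            for row in img for _ in range(scale)]


def embed(img: Image) -> FeatureMap:
    """Hypothetical 2-channel embedding: raw intensity and 3x3 local mean."""
    h, w = len(img), len(img[0])
    local_mean = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(window) / len(window))
        local_mean.append(row)
    return [[r[:] for r in img], local_mean]


def concat_channels(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Channel-wise concatenation: (C_a + C_b) x H x W, spatial size unchanged."""
    assert len(a[0]) == len(b[0]) and len(a[0][0]) == len(b[0][0])
    return a + b


def change_head(feats: FeatureMap, weights: List[float], threshold: float) -> List[List[int]]:
    """1x1 linear head applied at every pixel; `weights` stand in for learned parameters."""
    assert len(weights) == len(feats)
    h, w = len(feats[0]), len(feats[0][0])
    return [[int(abs(sum(wc * f[y][x] for wc, f in zip(weights, feats))) > threshold)
             for x in range(w)]
            for y in range(h)]


def manufacture_mask(original: Image, manipulated: Image,
                     scale: int = 2, threshold: float = 0.25) -> List[List[int]]:
    """Toy pipeline: upsample -> embed each image -> concatenate -> predict change mask."""
    feats = concat_channels(embed(upsample_nearest(original, scale)),
                            embed(upsample_nearest(manipulated, scale)))
    # Channels are [I_orig, mean_orig, I_manip, mean_manip]; these hand-set weights
    # make the head respond to the difference between the two images' features.
    weights = [0.5, 0.5, -0.5, -0.5]
    return change_head(feats, weights, threshold)


original = [[0.0, 0.0], [0.0, 0.0]]
manipulated = [[0.0, 0.0], [0.0, 1.0]]   # bottom-right pixel edited
for row in manufacture_mask(original, manipulated):
    print(row)
# prints [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1] on four lines
```

Concatenation keeps both images' features side by side at each pixel. The head can then learn any comparison between them, rather than being fixed to a plain subtraction.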
📝 Abstract
In the field of image manipulation localization (IML), the small size and poor quality of existing datasets have long been major issues. A dataset containing diverse types of manipulations would greatly help improve the accuracy of IML models. Images on the internet (such as those on Baidu Tieba's PS Bar) are manipulated using a wide variety of techniques, so building a dataset from them would significantly enrich the types of manipulations available. However, such images often have low resolution and poor clarity. Masks obtained by simply subtracting the manipulated image from the original therefore contain substantial noise, which is difficult to remove and renders the masks unusable for training IML models. Inspired by the field of change detection, we treat the original and manipulated images as two observations of the same image at different times and formulate mask generation as a change detection task. However, because the two images differ in clarity, conventional change detection models perform poorly. We therefore introduce a super-resolution module and propose the Manipulation Mask Manufacturer (MMM) framework. MMM enhances the resolution of both the original and manipulated images, recovering image details for a more reliable comparison. The framework also encodes the original and manipulated images into feature embeddings and concatenates them, effectively modeling the context between the two. Additionally, we create the Manipulation Mask Manufacturer Dataset (MMMD), which covers a wide range of manipulation techniques. Through MMM and MMMD, we aim to contribute to image forensics and manipulation detection by providing more realistic manipulation data. The code and datasets will be made available.
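The abstract's point about subtraction masks can be reproduced on synthetic data. This sketch rests on two assumptions:

- All function names are hypothetical.
- Additive Gaussian noise, applied independently to each image, stands in for the compression and resizing that web images undergo. Real degradations are more structured.

On a clean pair, thresholded differencing recovers the edit exactly. Once both copies are degraded, spurious pixels appear across the background.

```python
import random
from typing import List, Tuple

Image = List[List[float]]   # H x W grayscale image, values in [0, 1]
Mask = List[List[int]]


def subtraction_mask(original: Image, manipulated: Image, threshold: float) -> Mask:
    """Naive mask: 1 wherever the absolute pixel difference exceeds `threshold`."""
    return [[int(abs(a - b) > threshold) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(original, manipulated)]


def degrade(img: Image, sigma: float, seed: int) -> Image:
    """Assumed stand-in for lossy re-encoding: additive Gaussian noise clipped to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]


def make_pair(size: int = 8) -> Tuple[Image, Image]:
    """Hypothetical pair: flat 0.2 background; the 'edit' pastes a bright 2x2 patch."""
    original = [[0.2] * size for _ in range(size)]
    manipulated = [row[:] for row in original]
    for y in (3, 4):
        for x in (3, 4):
            manipulated[y][x] = 1.0
    return original, manipulated


def count_errors(mask: Mask, truth: Mask) -> Tuple[int, int]:
    """Return (missed edited pixels, spurious background pixels)."""
    pairs = [(m, t) for mrow, trow in zip(mask, truth) for m, t in zip(mrow, trow)]
    missed = sum(1 for m, t in pairs if t and not m)
    spurious = sum(1 for m, t in pairs if m and not t)
    return missed, spurious


original, manipulated = make_pair()
truth = subtraction_mask(original, manipulated, threshold=0.1)  # clean pair: exact 2x2 mask
noisy = subtraction_mask(degrade(original, 0.1, seed=0),
                         degrade(manipulated, 0.1, seed=1), threshold=0.1)
print(count_errors(truth, truth))   # (0, 0)
print(count_errors(noisy, truth))   # edited pixels still found, plus scattered spurious ones
```

Every pixel is thresholded independently, so noise anywhere in the frame turns into a false "edit". Raising the threshold suppresses it only at the cost of missing subtle manipulations.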