EfficientIML: Efficient High-Resolution Image Manipulation Localization

📅 2025-09-10

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

High-resolution image forgery localization faces three challenges: low sensitivity to diffusion-generated forgeries, high computational cost, and the difficulty of capturing global and local cues together. To address them, this paper proposes an efficient, lightweight three-stage architecture. Its key contributions are: (1) a new high-resolution Synthetic Image Forgery (SIF) dataset of 1,200+ images that specifically covers diffusion-model-generated forgeries; (2) EfficientRWKV, a hybrid network that runs state-space modeling and lightweight attention in parallel so that global context and local detail are captured together; and (3) a multi-scale supervision strategy that enforces consistency across hierarchical predictions. Evaluated on the authors' SIF dataset and on standard benchmarks, the method achieves state-of-the-art performance. It outperforms ViT-based and other lightweight baselines in localization accuracy, computational efficiency (42% fewer FLOPs), and inference speed (32 FPS at 4K resolution), which indicates strong suitability for real-time digital forensics.
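The summary ties the computational cost to high resolution. A back-of-envelope count shows why: full self-attention scores every pair of tokens, so its work grows quadratically with token count, while a recurrent state-space-style scan visits each token once. The sketch below assumes 16 × 16 patches on a 4K UHD frame; the patch size and the helper names are hypothetical and are not taken from the paper.

```python
def token_count(height: int, width: int, patch: int) -> int:
    """Non-overlapping patch tokens for an image, rounding partial patches up."""
    rows = -(-height // patch)  # ceiling division
    cols = -(-width // patch)
    return rows * cols


def attention_pairs(n_tokens: int) -> int:
    """Query-key scores computed by full (global) self-attention: O(n^2)."""
    return n_tokens * n_tokens


def scan_steps(n_tokens: int) -> int:
    """State updates in a recurrent, state-space-style scan: O(n)."""
    return n_tokens


if __name__ == "__main__":
    # 4K UHD frame (3840 x 2160) with 16 x 16 patches; the patch size is an assumption.
    n = token_count(2160, 3840, 16)
    print(f"{n:,} tokens")
    print(f"{attention_pairs(n):,} attention scores per head per layer")
    print(f"{scan_steps(n):,} scan steps per channel per layer")
```

Running it prints 32,400 tokens, 1,049,760,000 attention scores and 32,400 scan steps, a 32,400-fold gap. This counts only pairwise scores. It is not a FLOP model of EfficientIML and does not reproduce the 42% figure.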


📝 Abstract

With imaging devices delivering ever-higher resolutions and diffusion-based forgery methods emerging, current detectors trained only on traditional datasets (with splicing, copy-move, and object-removal forgeries) lack exposure to this new manipulation type. To address this, we propose a novel high-resolution SIF dataset of 1200+ diffusion-generated manipulations with semantically extracted masks. However, such high-resolution data also challenges existing methods, whose prohibitive computational complexity strains the available computing resources. Therefore, we propose a novel EfficientIML model with a lightweight, three-stage EfficientRWKV backbone. EfficientRWKV's hybrid state-space and attention network captures global context and local details in parallel, while a multi-scale supervision strategy enforces consistency across hierarchical predictions. Extensive evaluations on our dataset and standard benchmarks demonstrate that our approach outperforms ViT-based and other SOTA lightweight baselines in localization performance, FLOPs, and inference speed, underscoring its suitability for real-time forensic applications.
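The abstract says EfficientRWKV's hybrid state-space and attention network captures global context and local details in parallel. It does not give the block's equations. The minimal sketch below pairs two branches and sums them: an RWKV-4-style WKV recurrence, which is a linear-time, exponentially decayed running average over the whole sequence, and softmax attention restricted to a small window. Several parts are hypothetical simplifications, not the paper's design: the function names, the single scalar channel, the causal 1-D scan, and the additive fusion. A real vision backbone would process flattened 2-D token grids and may scan in more than one direction.

```python
import math
from typing import List


def wkv_global(k: List[float], v: List[float], w: float, u: float) -> List[float]:
    """RWKV-4-style WKV recurrence for one channel (causal, linear time).

    w > 0 is the per-channel decay rate; u is the extra weight ("bonus")
    given to the current token.
    """
    a, b = 0.0, 0.0  # decayed running sums of e^k * v and of e^k
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        cur = math.exp(u + kt)
        out.append((a + cur * vt) / (b + cur))  # weighted average of past and current v
        a = decay * a + math.exp(kt) * vt
        b = decay * b + math.exp(kt)
    return out


def local_attention(q: List[float], k: List[float], v: List[float], radius: int) -> List[float]:
    """Softmax attention restricted to a window of +/- radius tokens."""
    n = len(q)
    out = []
    for t in range(n):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        scores = [q[t] * k[j] for j in range(lo, hi)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(wj * v[lo + i] for i, wj in enumerate(weights)) / z)
    return out


def hybrid_mixer(q: List[float], k: List[float], v: List[float],
                 w: float = 0.5, u: float = 0.0, radius: int = 1) -> List[float]:
    """Run the global (WKV) and local (windowed attention) branches in parallel and sum them."""
    g = wkv_global(k, v, w, u)
    loc = local_attention(q, k, v, radius)
    return [gi + li for gi, li in zip(g, loc)]


if __name__ == "__main__":
    x = [0.1, 0.4, -0.2, 0.3, 0.0]
    print(hybrid_mixer(x, x, x))
```

Both branches return weighted averages of `v`, so a constant `v` passes through each branch unchanged. The global branch costs one state update per token. The local branch costs `2 * radius + 1` scores per token. Neither branch grows quadratically with sequence length.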

Problem

Research questions and friction points this paper is trying to address.

Detecting diffusion-based image forgeries in high-resolution images

Addressing computational constraints of existing localization methods

Improving real-time performance for forensic applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight EfficientRWKV backbone with hybrid architecture

Multi-scale supervision for hierarchical prediction consistency

Novel high-resolution SIF dataset with diffusion manipulations
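Multi-scale supervision is listed as a contribution, but the loss itself is not specified here. One common way to supervise hierarchical predictions is deep supervision: compare the prediction at each scale against the ground-truth mask resized to that scale, then average the losses. The minimal sketch below makes several assumptions. It uses binary cross-entropy and max-pooled targets, so a block counts as tampered if any pixel in it is. It uses equal default weights and assumes predictions at full, 1/2, 1/4, ... resolution. The helper names are hypothetical. None of these choices are confirmed as the paper's.

```python
import math
from typing import List, Optional

Grid = List[List[float]]


def downsample_max(mask: Grid, factor: int) -> Grid:
    """Max-pool a binary mask over factor x factor blocks (any tampered pixel marks the block)."""
    h, w = len(mask), len(mask[0])
    return [
        [max(mask[i][j]
             for i in range(r, min(r + factor, h))
             for j in range(c, min(c + factor, w)))
         for c in range(0, w, factor)]
        for r in range(0, h, factor)
    ]


def bce(pred: Grid, target: Grid, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between per-pixel probabilities and 0/1 targets."""
    total, count = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1 - t) * math.log(1 - p)
            count += 1
    return total / count


def multiscale_loss(preds: List[Grid], mask: Grid,
                    weights: Optional[List[float]] = None) -> float:
    """Weighted mean of per-scale BCE; preds[i] is assumed to be at 1/2**i resolution."""
    if weights is None:
        weights = [1.0] * len(preds)
    if len(weights) != len(preds):
        raise ValueError("need one weight per prediction scale")
    loss = 0.0
    for i, (pred, wgt) in enumerate(zip(preds, weights)):
        target = mask if i == 0 else downsample_max(mask, 2 ** i)
        if len(pred) != len(target) or len(pred[0]) != len(target[0]):
            raise ValueError(f"scale {i}: prediction shape does not match target shape")
        loss += wgt * bce(pred, target)
    return loss / sum(weights)
```

Because every scale is trained against the same mask, coarse and fine predictions are pulled toward agreeing with each other. That is one plausible reading of "consistency across hierarchical predictions". An explicit cross-scale consistency term would be an alternative design.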

🔎 Similar Papers

GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization