EfficientIML: Efficient High-Resolution Image Manipulation Localization

📅 2025-09-10
🤖 AI Summary
To address three challenges in high-resolution image forgery localization (low sensitivity to diffusion-generated forgeries, high computational cost, and the difficulty of jointly capturing global and local cues), this paper proposes EfficientIML, an efficient model built on a lightweight three-stage backbone. Its key contributions are: (1) a novel high-resolution Synthetic Image Forgery (SIF) dataset of more than 1,200 diffusion-generated manipulations; (2) EfficientRWKV, a hybrid backbone that runs state-space modeling and lightweight attention in parallel to capture global context and local detail together; and (3) a multi-scale supervision strategy that enforces consistency across hierarchical predictions. Evaluated on the SIF dataset and standard benchmarks, the method achieves state-of-the-art performance, outperforming lightweight ViT-based models in localization accuracy, computational cost (42% fewer FLOPs), and inference speed (32 FPS at 4K resolution), which supports its suitability for real-time digital forensics.

📝 Abstract
As imaging devices deliver ever-higher resolutions and diffusion-based forgery methods emerge, current detectors trained only on traditional datasets (with splicing, copy-move, and object-removal forgeries) lack exposure to this new manipulation type. To address this, we propose a novel high-resolution SIF dataset of 1200+ diffusion-generated manipulations with semantically extracted masks. High-resolution inputs, however, also challenge existing methods, whose prohibitive computational complexity runs up against significant resource constraints. We therefore propose a novel EfficientIML model with a lightweight, three-stage EfficientRWKV backbone. EfficientRWKV's hybrid state-space and attention network captures global context and local details in parallel, while a multi-scale supervision strategy enforces consistency across hierarchical predictions. Extensive evaluations on our dataset and standard benchmarks demonstrate that our approach outperforms ViT-based and other SOTA lightweight baselines in localization performance, FLOPs, and inference speed, underscoring its suitability for real-time forensic applications.
Problem

Research questions and friction points this paper is trying to address.

Detecting diffusion-based image forgeries in high-resolution images
Addressing computational constraints of existing localization methods
Improving real-time performance for forensic applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight EfficientRWKV backbone with hybrid architecture
Multi-scale supervision for hierarchical prediction consistency
Novel high-resolution SIF dataset with diffusion manipulations
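
The multi-scale supervision idea listed above can be illustrated with a small sketch. This is not the paper's implementation: the summary does not state the loss function, the downsampling method, or the per-stage weights, so binary cross-entropy, average pooling, and uniform weights are all assumptions here, and the names `downsample_mask`, `bce`, and `multi_scale_loss` are hypothetical. The sketch only shows the general pattern of comparing each stage's prediction against the ground-truth mask resized to that stage's resolution.

```python
import math


def downsample_mask(mask, factor):
    """Average-pool a 2D mask (list of rows) by an integer factor (assumed choice)."""
    h, w = len(mask), len(mask[0])
    pooled = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [mask[y][x]
                     for y in range(i, min(i + factor, h))
                     for x in range(j, min(j + factor, w))]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two equally sized 2D probability maps."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


def multi_scale_loss(preds, mask, weights=None):
    """Sum a per-stage loss over predictions at different resolutions.

    preds:   list of 2D probability maps, one per stage; each stage's height
             is assumed to divide the mask height evenly.
    weights: optional per-stage weights (uniform by default, an assumption).
    """
    weights = weights or [1.0] * len(preds)
    loss = 0.0
    for pred, weight in zip(preds, weights):
        factor = len(mask) // len(pred)
        loss += weight * bce(pred, downsample_mask(mask, factor))
    return loss
```

With an 8x8 mask and three stage outputs at 8x8, 4x4, and 2x2, every stage is penalized against the same ground truth at its own scale, which is one simple way to encourage the hierarchical predictions to agree.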
Jinhan Li
Undergraduate Student, New York University
Haoyang He
Zhejiang University, Hangzhou, China
Lei Xie
Zhejiang University, Hangzhou, China
Jiangning Zhang
Zhejiang University, Hangzhou, China