EDGE-Shield: Efficient Denoising-staGE Shield for Violative Content Filtering via Scalable Reference-Based Matching

📅 2026-04-04
🤖 AI Summary
Existing reference-based approaches to harmful content filtering scale poorly as the number of references grows and depend on full image generation, making them ill-suited for real-time applications. This work proposes a mechanism that embeds efficient reference matching directly within the denoising process. An $x$-pred transformation maps intermediate noisy latents to pseudo-clean latents, so violative content can be identified and blocked early and with low latency. The method requires no additional training and compares against references in the embedding space in real time. Experimental results show that the approach reduces processing time by approximately 79% on Z-Image-Turbo and 50% on Qwen-Image while maintaining filtering accuracy on both models.
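The summary above mentions reference comparison in the embedding space. A minimal sketch of that idea follows, assuming cosine similarity against a matrix of precomputed reference embeddings and a fixed decision threshold. The function names `l2_normalize` and `match_references`, the threshold value, and the choice of cosine similarity are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def match_references(query_emb, ref_embs, threshold=0.8):
    """Compare one query embedding against all reference embeddings.

    Hypothetical helper: the similarity measure and threshold are
    assumptions for illustration, not the paper's configuration.
    Returns (is_violative, best_index, best_score).
    """
    q = l2_normalize(np.asarray(query_emb, dtype=np.float64))
    r = l2_normalize(np.asarray(ref_embs, dtype=np.float64))
    scores = r @ q  # cosine similarity to every reference in one matmul
    best = int(np.argmax(scores))
    return bool(scores[best] >= threshold), best, float(scores[best])
```

Because all references are scored with a single matrix-vector product, the per-query cost grows linearly with the number of references and needs no per-reference model call, which is one plausible reading of why embedding-based matching scales well.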
📝 Abstract
The advent of text-to-image generative models poses significant risks of copyright violation and deepfake generation. Because new copyrighted works and private individuals requiring protection emerge constantly, reference-based, training-free content filters are essential for providing up-to-date protection without the constraints of a fixed knowledge cutoff. However, existing reference-based approaches often scale poorly when handling numerous references, and they must wait for image generation to finish before filtering. To solve these problems, we propose EDGE-Shield, a scalable content filter that operates during the denoising process and maintains practical latency while effectively blocking violative content. We leverage embedding-based matching for efficient reference comparison. Additionally, we introduce an $x$-pred transformation that converts the model's noisy intermediate latent into a pseudo-estimate of the clean latent at a later stage, improving classification accuracy for violative content at earlier denoising stages. We evaluate violative content filtering on two generative models, Z-Image-Turbo and Qwen-Image. EDGE-Shield significantly outperforms traditional reference-based methods in terms of latency: it reduces processing time by approximately $79\%$ for Z-Image-Turbo and approximately $50\%$ for Qwen-Image while maintaining filtering accuracy across both model architectures.
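The abstract does not give the formula for the $x$-pred transformation. As a rough illustration only, the sketch below assumes a rectified-flow parameterization in which the noisy latent is $x_t = (1 - t)\,x_0 + t\,\epsilon$ and the model predicts the velocity $v = \epsilon - x_0$, so that the clean latent can be estimated as $x_0 \approx x_t - t\,v$. The function name `x_pred`, the time convention ($t = 1$ is pure noise), and the velocity-prediction assumption are hypothetical and may differ from what the paper actually uses.

```python
import numpy as np


def x_pred(x_t, v_pred, t):
    """Estimate the clean latent from a noisy latent.

    Assumes (not confirmed by the source) a rectified-flow setup:
        x_t = (1 - t) * x_0 + t * eps,   v = eps - x_0
    which rearranges to x_0 = x_t - t * v.
    """
    return x_t - t * v_pred
```

With an exact velocity the estimate recovers the clean latent; with a learned velocity it yields only a pseudo-clean estimate, which is what an early-stage filter would embed and compare against references.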
Problem

Research questions and friction points this paper is trying to address.

violative content filtering
reference-based matching
scalability
latency
text-to-image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

reference-based matching
denoising-stage filtering
embedding-based matching
x-pred transformation
scalable content moderation