🤖 AI Summary
Existing reference-based approaches for harmful content filtering suffer from poor scalability in large-scale settings and rely on full image generation, making them ill-suited for real-time applications. This work proposes a novel mechanism that embeds efficient reference matching directly within the denoising process. By applying an $x$-pred transformation, intermediate noisy latents are mapped to pseudo-clean latents, enabling early and low-latency content identification and blocking. The method requires no additional training and performs real-time reference comparison in the embedding space, achieving high accuracy and compatibility across diverse generative models. Experimental results demonstrate that the approach reduces processing time by approximately 79% on Z-Image-Turbo and 50% on Qwen-Image while maintaining strong filtering performance.
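The summary's $x$-pred transformation (noisy latent → pseudo-clean latent) can be sketched with the standard identity used in ε-prediction diffusion models: given the forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, the model's noise estimate can be inverted to a pseudo-clean latent. This is a minimal illustration of that identity, not the paper's implementation; the function name `x_pred` and the scalar `alpha_bar_t` are illustrative assumptions.

```python
import numpy as np

def x_pred(x_t: np.ndarray, eps_pred: np.ndarray, alpha_bar_t: float) -> np.ndarray:
    """Invert the forward noising process: map a noisy latent x_t and the
    model's noise prediction to a pseudo-clean latent x0_hat.
    Assumes an epsilon-prediction parameterization (illustrative sketch)."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

# Sanity check: if eps_pred equals the true noise, x0 is recovered exactly.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8))
eps = rng.standard_normal((4, 8))
alpha_bar = 0.7
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
print(np.allclose(x_pred(x_t, eps, alpha_bar), x0))  # → True
```

Because this estimate is available at every denoising step, a filter can compare it against references well before generation finishes, which is what enables the early blocking described above.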
📝 Abstract
The advent of Text-to-Image generative models poses significant risks of copyright violation and deepfake generation. Because new copyrighted works and images of private individuals emerge constantly, reference-based, training-free content filters are essential for providing up-to-date protection without the constraints of a fixed knowledge cutoff. However, existing reference-based approaches often lack scalability when handling numerous references and must wait for image generation to finish. To solve these problems, we propose EDGE-Shield, a scalable content filter that operates during the denoising process, maintaining practical latency while effectively blocking violative content. We leverage embedding-based matching for efficient reference comparison. Additionally, we introduce an $x$-pred transformation that converts the model's noisy intermediate latent into a pseudo-estimate of the final clean latent, enhancing classification accuracy for violative content at earlier denoising stages. We conduct violative-content filtering experiments on two generative models, Z-Image-Turbo and Qwen-Image. EDGE-Shield significantly outperforms traditional reference-based methods in terms of latency, achieving an approximately $79\%$ reduction in processing time for Z-Image-Turbo and an approximately $50\%$ reduction for Qwen-Image, while maintaining filtering accuracy across different model architectures.
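The embedding-based reference matching described in the abstract can be illustrated as a cosine-similarity lookup against a bank of reference embeddings: normalize once, then each query reduces to a single matrix–vector product and a threshold test. This is a hedged sketch of the general technique; the function names, the threshold value, and the embedding model are assumptions, not details from the paper.

```python
import numpy as np

def build_reference_bank(ref_embeddings: list[np.ndarray]) -> np.ndarray:
    """Stack reference embeddings and L2-normalize each row, so that a
    dot product with a normalized query gives cosine similarity."""
    refs = np.stack(ref_embeddings).astype(np.float64)
    return refs / np.linalg.norm(refs, axis=1, keepdims=True)

def is_violative(query_emb: np.ndarray, ref_bank: np.ndarray,
                 threshold: float = 0.85) -> bool:
    """Flag the query if its best cosine similarity against any reference
    exceeds the threshold (0.85 is an illustrative choice, not the paper's)."""
    q = query_emb / np.linalg.norm(query_emb)
    sims = ref_bank @ q  # all cosine similarities in one matmul
    return bool(sims.max() >= threshold)

bank = build_reference_bank([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
print(is_violative(np.array([0.9, 0.1]), bank))                  # → True
print(is_violative(np.array([1.0, 1.0]), bank, threshold=0.95))  # → False
```

Precomputing the normalized bank is what keeps per-step cost low enough for the in-denoising checks the abstract describes: each check is O(references × embedding dim), independent of image resolution.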