NDM: A Noise-driven Detection and Mitigation Framework against Implicit Sexual Intentions in Text-to-Image Generation

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image diffusion models are vulnerable to implicit sexual prompts, generating inappropriate content; existing detection methods struggle to identify such obfuscated prompts, while fine-tuning–based mitigation often degrades generation quality. To address this, we propose a noise-driven framework for implicit sexual content detection and mitigation. Our method leverages the separability of noise predictions during early denoising steps to detect implicit malicious intent—introducing the first approach to exploit noise dynamics for such detection. We further design a noise-augmented adaptive negative prompting mechanism, integrated with attention suppression and intermediate noise separation. Evaluated on both natural and adversarial benchmarks, our framework significantly outperforms state-of-the-art baselines—including SLD, UCE, and RECE—achieving high-precision detection and effective suppression of implicit sexual content without compromising generation fidelity.
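The detection idea above can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the predicted-noise tensors are random stand-ins for a diffusion model's early-step epsilon predictions, and the statistics-plus-threshold classifier is an assumed placeholder for whatever decision rule NDM actually learns on the separable noise distributions.

```python
import numpy as np

def noise_features(predicted_noise: np.ndarray) -> np.ndarray:
    """Summarize one early-step predicted-noise map with simple statistics.
    (Hypothetical feature choice; the paper's features may differ.)"""
    return np.array([predicted_noise.mean(), predicted_noise.std()])

def detect_implicit_intent(early_noises: list, threshold: float = 0.5) -> bool:
    """Flag a prompt when averaged early-step noise features cross a threshold.
    The threshold stands in for a classifier trained on the separable
    early-denoising noise distributions the paper reports."""
    feats = np.mean([noise_features(n) for n in early_noises], axis=0)
    score = float(np.linalg.norm(feats))  # toy decision score
    return score > threshold

# Simulated predicted noise from the first few denoising steps
rng = np.random.default_rng(0)
benign = [rng.normal(0.0, 0.3, (4, 8, 8)) for _ in range(3)]
shifted = [rng.normal(0.6, 0.3, (4, 8, 8)) for _ in range(3)]  # biased prompt

print(detect_implicit_intent(benign))   # near-zero mean -> low score
print(detect_implicit_intent(shifted))  # shifted mean -> high score
```

Because the check runs on early denoising steps only, generation can be stopped (or mitigation triggered) before most of the sampling budget is spent, which is what makes a noise-driven detector cheap relative to screening final images.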

📝 Abstract
Despite the impressive generative capabilities of text-to-image (T2I) diffusion models, they remain vulnerable to generating inappropriate content, especially when confronted with implicit sexual prompts. Unlike explicit harmful prompts, these subtle cues, often disguised as seemingly benign terms, can unexpectedly trigger sexual content due to underlying model biases, raising significant ethical concerns. Existing detection methods, however, are primarily designed to identify explicit sexual content and therefore struggle to detect these implicit cues. Fine-tuning approaches, while effective to some extent, risk degrading the model's generative quality, creating an undesirable trade-off. To address this, we propose NDM, the first noise-driven detection and mitigation framework, which detects and mitigates implicit malicious intent in T2I generation while preserving the model's original generative capabilities. Specifically, we introduce two key innovations: first, we leverage the separability of early-stage predicted noise to develop a noise-based detection method that identifies malicious content with high accuracy and efficiency; second, we propose a noise-enhanced adaptive negative guidance mechanism that optimizes the initial noise by suppressing attention in the prominent region, thereby strengthening adaptive negative guidance for sexual-content mitigation. Experimentally, we validate NDM on both natural and adversarial datasets, demonstrating superior performance over existing SOTA methods, including SLD, UCE, and RECE. Code and resources are available at https://github.com/lorraine021/NDM.
Problem

Research questions and friction points this paper is trying to address.

Detecting implicit sexual intentions in text-to-image generation
Mitigating inappropriate content without degrading generative quality
Addressing model biases triggered by seemingly benign prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Noise-driven framework detects implicit sexual intentions
Noise-based method identifies malicious content accurately
Adaptive negative guidance mechanism suppresses sexual content generation
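The negative-guidance bullet above can be made concrete with the standard classifier-free guidance formula extended by a negative-prompt term. This is a hedged sketch, not NDM itself: "adaptive" is interpreted here as applying the negative term only when the noise-based detector flags the prompt, and the paper's noise enhancement, attention suppression, and intermediate noise separation are deliberately omitted.

```python
import numpy as np

def guided_noise(eps_uncond: np.ndarray,
                 eps_cond: np.ndarray,
                 eps_neg: np.ndarray,
                 scale: float = 7.5,
                 neg_scale: float = 7.5,
                 apply_negative: bool = False) -> np.ndarray:
    """Classifier-free guidance with an optional negative-prompt term.
    eps_uncond/eps_cond/eps_neg: predicted noise for the empty, user,
    and negative (e.g. "sexual content") prompts at one denoising step.
    When apply_negative is True, the update is pushed away from the
    negative-concept direction (eps_neg - eps_uncond)."""
    eps = eps_uncond + scale * (eps_cond - eps_uncond)
    if apply_negative:
        eps = eps - neg_scale * (eps_neg - eps_uncond)
    return eps

rng = np.random.default_rng(1)
shape = (4, 8, 8)
e_u, e_c, e_n = (rng.normal(size=shape) for _ in range(3))

clean_eps = guided_noise(e_u, e_c, e_n, apply_negative=False)
flagged_eps = guided_noise(e_u, e_c, e_n, apply_negative=True)
```

Gating the negative term on the detector's verdict is what preserves fidelity on benign prompts: when nothing is flagged, sampling follows the unmodified guidance path, so there is no global quality cost of the kind fine-tuning-based erasure methods incur.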