Towards SFW sampling for diffusion models via external conditioning

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address safety risks associated with diffusion models generating NSFW content (e.g., violence, non-consensual nudity), this paper proposes a fine-tuning-free, model-agnostic external conditional sampling framework. The method constructs external sensitive semantic constraints using CLIP and multimodal models, and introduces a conditional trajectory correction mechanism that dynamically suppresses latent-space evolution toward NSFW directions during sampling. Crucially, it enables user-defined sensitive categories without accessing or modifying the generative model’s internal parameters or knowledge. Experiments on Stable Diffusion demonstrate a substantial reduction in NSFW generation rates—comparable to supervised fine-tuning—while preserving image fidelity, diversity, and computational efficiency. To our knowledge, this is the first external guidance framework achieving dynamic, generalizable, zero-shot NSFW control.

📝 Abstract
Score-based generative models (SBM), also known as diffusion models, are the de facto state of the art for image synthesis. Despite their unparalleled performance, SBMs have recently been in the spotlight for being tricked into creating not-safe-for-work (NSFW) content, such as violent images and non-consensual nudity. Current approaches that prevent unsafe generation are based on the models' own knowledge, and the majority of them require fine-tuning. This article explores the use of external sources for ensuring safe outputs in SBMs. Our safe-for-work (SFW) sampler implements a Conditional Trajectory Correction step that guides the samples away from undesired regions in the ambient space using multimodal models as the source of conditioning. Furthermore, using Contrastive Language Image Pre-training (CLIP), our method admits user-defined NSFW classes, which can vary in different settings. Our experiments on the text-to-image SBM Stable Diffusion validate that the proposed SFW sampler effectively reduces the generation of explicit content while being competitive with other fine-tuning-based approaches, as assessed via independent NSFW detectors. Moreover, we evaluate the impact of the SFW sampler on image quality and show that the proposed correction scheme comes at a minor cost with negligible effect on samples not needing correction. Our study confirms the suitability of the SFW sampler towards aligned SBM models and the potential of using model-agnostic conditioning for the prevention of unwanted images.
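To make the abstract's mechanism concrete, below is a minimal sketch of a trajectory-correction step in the spirit described: a similarity score against an unwanted concept direction is evaluated at each sampling step, and samples that drift toward it are nudged along the score's negative gradient, while samples below a threshold pass through untouched. This is an illustrative assumption, not the paper's implementation: the fixed `nsfw_direction` vector and analytic `nsfw_score` stand in for a real CLIP/multimodal embedding of a user-defined NSFW class, and the loop stands in for the diffusion sampler.

```python
import numpy as np

# Hypothetical stand-in for a CLIP-based similarity score: cosine
# similarity between a sample and an embedded "unwanted concept"
# direction. A fixed unit vector keeps the example self-contained;
# the paper instead derives this from multimodal encoders.
rng = np.random.default_rng(0)
nsfw_direction = rng.normal(size=8)
nsfw_direction /= np.linalg.norm(nsfw_direction)

def nsfw_score(x):
    """Cosine-style similarity of sample x to the unwanted direction."""
    return float(nsfw_direction @ x) / (np.linalg.norm(x) + 1e-8)

def nsfw_score_grad(x):
    """Analytic gradient of nsfw_score with respect to x."""
    n = np.linalg.norm(x) + 1e-8
    return nsfw_direction / n - (nsfw_direction @ x) * x / n**3

def sfw_correction(x, guidance_scale=1.0, threshold=0.0):
    """One correction step: if the sample's similarity to the unwanted
    concept exceeds the threshold, push it down the score's gradient;
    otherwise leave the sample untouched (the "negligible effect on
    samples not needing correction" property)."""
    if nsfw_score(x) <= threshold:
        return x
    return x - guidance_scale * nsfw_score_grad(x)

# Toy stand-in for a sampling loop: repeated correction steps steer a
# sample that starts near the unwanted direction away from it.
x = nsfw_direction + 0.1 * rng.normal(size=8)
before = nsfw_score(x)
for _ in range(50):
    x = sfw_correction(x, guidance_scale=0.5)
after = nsfw_score(x)
```

In the actual method the correction acts on the latent trajectory of Stable Diffusion and the score comes from CLIP, which is what makes the approach model-agnostic and lets users redefine the sensitive classes without fine-tuning.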
Problem

Research questions and friction points this paper is trying to address.

Preventing NSFW content generation in diffusion models
Using external conditioning for safe image synthesis
Maintaining image quality while reducing explicit content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses external conditioning for safe outputs
Implements Conditional Trajectory Correction step
Employs CLIP for user-defined NSFW classes
Camilo Carvajal Reyes (Department of Mathematics, Imperial College London, UK)
Joaquín Fontbona (Departamento de Ingeniería Matemática, Universidad de Chile, Santiago, Chile)
Felipe Tobar (Imperial College London)
Signal Processing · Machine Learning