Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

📅 2025-10-26
🤖 AI Summary
This work tackles zero-shot removal of harmful concepts from text-to-image diffusion models with Semantic Surgery, a training-free framework that intervenes before generation begins. The method operates in the text embedding space: it dynamically estimates whether a target concept is present in the prompt, then applies a calibrated vector subtraction to suppress it. A Co-Occurrence Encoding module models concept associations for multi-concept erasure, and a visual feedback loop addresses latent concept persistence through iterative calibration. This robustness also lets the framework act as a built-in threat detection system. Experiments on object, explicit content, and artistic style erasure report improvements over state-of-the-art methods: an H-score of 93.58 for object erasure, explicit content reduced to a single instance, and H_a = 8.09 for style erasure, with no degradation in image quality.

📝 Abstract
Concept erasure in text-to-image diffusion models is crucial for mitigating harmful content, yet existing methods often compromise generative quality. We introduce Semantic Surgery, a novel training-free, zero-shot framework for concept erasure that operates directly on text embeddings before the diffusion process. It dynamically estimates the presence of target concepts in a prompt and performs a calibrated vector subtraction to neutralize their influence at the source, enhancing both erasure completeness and locality. The framework includes a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop to address latent concept persistence. As a training-free method, Semantic Surgery adapts dynamically to each prompt, ensuring precise interventions. Extensive experiments on object, explicit content, artistic style, and multi-celebrity erasure tasks show our method significantly outperforms state-of-the-art approaches. We achieve superior completeness and robustness while preserving locality and image quality (e.g., 93.58 H-score in object erasure, reducing explicit content to just 1 instance, and 8.09 H_a in style erasure with no quality degradation). This robustness also allows our framework to function as a built-in threat detection system, offering a practical solution for safer text-to-image generation.

Problem

Research questions and friction points this paper is trying to address.

Eliminating harmful concepts from diffusion models without compromising image quality
Developing a training-free, zero-shot framework for precise concept removal from prompts
Improving erasure completeness and locality while preserving generation quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free zero-shot framework for concept erasure
Dynamic vector subtraction on text embeddings
Co-occurrence encoding for multi-concept erasure
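
The abstract describes the core step as estimating how strongly a target concept is present in a prompt and then subtracting a calibrated amount of it from the text embedding. The sketch below illustrates that idea only; it is not the paper's formulation. The sigmoid-of-cosine presence score, the parameters `gamma`, `beta`, and `threshold`, and the function names are all assumptions made for illustration, and the embeddings are plain NumPy vectors rather than real text-encoder outputs.

```python
import numpy as np


def concept_presence(prompt_emb, concept_dir, gamma=10.0, beta=0.5):
    """Hypothetical presence score in (0, 1).

    Assumption: presence is a sigmoid of the cosine similarity between the
    prompt embedding and the concept direction. gamma (sharpness) and beta
    (similarity offset) are illustrative values, not taken from the paper.
    """
    denom = np.linalg.norm(prompt_emb) * np.linalg.norm(concept_dir) + 1e-12
    cos = float(prompt_emb @ concept_dir) / denom
    return 1.0 / (1.0 + np.exp(-gamma * (cos - beta)))


def erase_concept(prompt_emb, concept_dir, threshold=0.5, gamma=10.0, beta=0.5):
    """Subtract the concept component, scaled by the estimated presence.

    Prompts scoring below `threshold` are returned unchanged, which is one
    simple way to keep the intervention local to prompts that contain the
    concept.
    """
    rho = concept_presence(prompt_emb, concept_dir, gamma=gamma, beta=beta)
    if rho < threshold:
        return prompt_emb.copy(), rho
    unit = concept_dir / np.linalg.norm(concept_dir)
    projection = (prompt_emb @ unit) * unit  # component along the concept
    return prompt_emb - rho * projection, rho
```

In this sketch the returned score doubles as a detection signal: a caller could log or flag prompts whose score exceeds the threshold, which mirrors the abstract's remark that the framework can function as a built-in threat detector. Multi-concept handling and the visual feedback loop are not modeled here.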