Semantic Surgery: Zero-Shot Concept Erasure in Diffusion Models

📅 2025-10-26

🤖 AI Summary

This work addresses zero-shot removal of harmful concepts from text-to-image diffusion models with Semantic Surgery, a training-free framework that edits the text embedding before the diffusion process begins. The method estimates how strongly each target concept is present in a prompt and applies a calibrated vector subtraction to suppress it. A Co-Occurrence Encoding module supports erasing several concepts at once, and a visual feedback loop handles concepts that persist latently after the embedding edit. According to the authors, this robustness also lets the framework act as a built-in threat detection system. Experiments on object, explicit content, artistic style, and multi-celebrity erasure show gains over state-of-the-art methods: an H-score of 93.58 for object erasure, explicit content reduced to a single instance, and H_a = 8.09 for style erasure, with no reported degradation in image quality.

📝 Abstract

Concept erasure in text-to-image diffusion models is crucial for mitigating harmful content, yet existing methods often compromise generative quality. We introduce Semantic Surgery, a novel training-free, zero-shot framework for concept erasure that operates directly on text embeddings before the diffusion process. It dynamically estimates the presence of target concepts in a prompt and performs a calibrated vector subtraction to neutralize their influence at the source, enhancing both erasure completeness and locality. The framework includes a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop to address latent concept persistence. As a training-free method, Semantic Surgery adapts dynamically to each prompt, ensuring precise interventions. Extensive experiments on object, explicit content, artistic style, and multi-celebrity erasure tasks show our method significantly outperforms state-of-the-art approaches. We achieve superior completeness and robustness while preserving locality and image quality (e.g., 93.58 H-score in object erasure, reducing explicit content to just 1 instance, and 8.09 H_a in style erasure with no quality degradation). This robustness also allows our framework to function as a built-in threat detection system, offering a practical solution for safer text-to-image generation.
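
The core idea in the abstract (estimate how present a concept is, then subtract a calibrated amount of it from the text embedding) can be illustrated with a toy sketch. This is not the authors' implementation: the sigmoid-of-cosine presence estimate, the projection-based subtraction, and every name and parameter below (`presence_score`, `erase_concept`, `threshold`, `sharpness`, the neutral embedding) are assumptions made for illustration, and the vectors are small hand-written arrays rather than real text-encoder outputs.

```python
import numpy as np


def presence_score(prompt_vec, concept_dir, threshold=0.5, sharpness=20.0):
    """Hypothetical presence estimate in [0, 1].

    Assumption: presence is a sigmoid of the cosine similarity between the
    prompt offset and the concept direction. The paper's estimator may differ.
    """
    denom = np.linalg.norm(prompt_vec) * np.linalg.norm(concept_dir) + 1e-12
    cos = float(prompt_vec @ concept_dir) / denom
    return 1.0 / (1.0 + np.exp(-sharpness * (cos - threshold)))


def erase_concept(prompt_emb, concept_emb, neutral_emb):
    """Subtract the concept component, scaled by its estimated presence.

    Assumption: the concept direction is the offset of the concept embedding
    from a neutral (e.g. empty-prompt) embedding.
    """
    concept_dir = concept_emb - neutral_emb
    offset = prompt_emb - neutral_emb
    rho = presence_score(offset, concept_dir)
    unit = concept_dir / (np.linalg.norm(concept_dir) + 1e-12)
    component = float(offset @ unit)  # how much of the prompt lies along the concept
    edited = prompt_emb - rho * component * unit
    return edited, rho


# Toy 4-dimensional "embeddings" (illustrative values only).
neutral = np.zeros(4)
concept = np.array([1.0, 0.0, 0.0, 0.0])
prompt_with_concept = np.array([1.0, 0.5, 0.0, 0.0])
prompt_without_concept = np.array([0.0, 1.0, 0.0, 0.0])

edited_with, rho_with = erase_concept(prompt_with_concept, concept, neutral)
edited_without, rho_without = erase_concept(prompt_without_concept, concept, neutral)

print("presence (concept in prompt):", round(rho_with, 4))
print("presence (concept absent):", round(rho_without, 4))
```

In this sketch, a prompt that contains the concept has its component along the concept direction almost entirely removed, while an unrelated prompt is left unchanged. That mirrors the completeness and locality goals the abstract describes, under the simplifying assumptions stated above.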

Problem

Research questions and friction points this paper is trying to address.

Eliminating harmful concepts from diffusion models without compromising image quality

Developing a training-free, zero-shot framework that precisely removes target concepts at the prompt-embedding level

Enhancing erasure completeness and locality while maintaining generation integrity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free zero-shot framework for concept erasure

Dynamic presence estimation with calibrated vector subtraction on text embeddings

Co-occurrence encoding for multi-concept erasure

🔎 Similar Papers

Hiding and Recovering Knowledge in Text-to-Image Diffusion Models via Learnable Prompts