SAGE: Exploring the Boundaries of Unsafe Concept Domain with Semantic-Augment Erasing

📅 2025-06-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Pretrained diffusion models (DMs) pose safety and copyright risks because sensitive concepts are implicitly encoded in their representations; existing word-level erasure methods generalize poorly, remaining trapped in the "word concept abyss." This paper proposes SAGE, a semantic-augment erasing framework: first, a concept-domain boundary exploration paradigm lifts erasure from the discrete token space into a continuous semantic embedding space; second, a cyclic self-check and self-erasure mechanism, combined with a global-local collaborative retention mechanism, jointly enforces global semantic alignment and preserves local predicted noise, broadly suppressing unsafe concepts while maintaining high-fidelity generation for irrelevant ones. Across multiple benchmarks, SAGE achieves state-of-the-art performance in safety compliance, generation fidelity, and cross-concept generalization. Code and pretrained weights will be open-sourced.

πŸ“ Abstract
Diffusion models (DMs) have achieved significant progress in text-to-image generation. However, the inevitable inclusion of sensitive information during pre-training poses safety risks, such as unsafe content generation and copyright infringement. Concept erasing, which finetunes weights to unlearn undesirable concepts, has emerged as a promising solution. However, existing methods treat an unsafe concept as a fixed word and repeatedly erase it, trapping DMs in the "word concept abyss", which prevents generalized concept-related erasing. To escape this abyss, we introduce semantic-augment erasing, which transforms concept-word erasure into concept-domain erasure through cyclic self-check and self-erasure. It efficiently explores and unlearns the boundary representation of the concept domain through semantic spatial relationships between the original and training DMs, without requiring additional preprocessed data. Meanwhile, to mitigate the retention degradation of irrelevant concepts while erasing unsafe concepts, we further propose a global-local collaborative retention mechanism that combines global semantic relationship alignment with local predicted noise preservation, effectively expanding the retentive receptive field for irrelevant concepts. We name our method SAGE, and extensive experiments demonstrate the comprehensive superiority of SAGE compared with other methods in the safe generation of DMs. The code and weights will be open-sourced at https://github.com/KevinLight831/SAGE.
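The erase-while-retain objective the abstract describes can be illustrated with a deliberately tiny sketch. Everything below is an illustrative assumption, not SAGE's actual training code: the elementwise "noise predictor" stands in for a diffusion U-Net, the zero anchor stands in for a neutral target concept, and the equal loss weighting is arbitrary.

```python
import random

random.seed(0)
DIM = 8

def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy stand-ins: a frozen copy of the original DM and a trainable copy
# being erased. Shapes and the linear form are assumptions for clarity.
frozen_w = rand_vec()
train_w = list(frozen_w)  # starts identical to the original

def predict(w, concept):
    # stand-in for the conditional noise prediction eps(x_t, c)
    return [wi * ci for wi, ci in zip(w, concept)]

unsafe = rand_vec()      # embedding of the unsafe concept (toy)
anchor = [0.0] * DIM     # neutral target the unsafe concept is mapped to
irrelevant = rand_vec()  # unrelated concept whose behavior must be kept

# Erasure term: the trainable model's prediction for the unsafe concept
# should match the frozen model's prediction for the neutral anchor.
erase_loss = mse(predict(train_w, unsafe), predict(frozen_w, anchor))

# Retention term (the "local predicted noise preservation" idea): on
# irrelevant concepts, the trainable model must not drift from the original.
retain_loss = mse(predict(train_w, irrelevant), predict(frozen_w, irrelevant))

total_loss = erase_loss + retain_loss  # equal weighting is an assumption
```

Since the trainable copy starts identical to the frozen model, the retention term is zero before any update and only grows as erasure pulls the weights away, which is exactly the tension the global-local retention mechanism manages.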
Problem

Research questions and friction points this paper is trying to address.

Eliminating unsafe content in diffusion models
Escaping the word concept abyss issue
Preserving irrelevant concepts during erasure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-augment erasing transforms word to domain erasure
Global-local retention mechanism preserves irrelevant concepts
Cyclic self-check explores boundary of unsafe concepts
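The boundary-exploration idea in the bullets above can be sketched as sampling a semantic neighborhood around the word embedding rather than erasing the single token itself. The cosine threshold and noise scale below are invented for illustration; in the paper, the boundary is explored via cyclic self-check between the original and training DMs, not a fixed similarity cutoff.

```python
import math
import random

random.seed(0)
DIM = 16

def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(DIM)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

concept = rand_vec()   # toy embedding of the unsafe concept word
THRESHOLD = 0.7        # assumed membership criterion for the concept domain

# Explore the concept *domain*: perturb the word embedding and keep the
# samples that still lie inside the domain (high similarity to the word),
# so erasure covers a semantic neighborhood instead of one fixed token.
samples = [[c + random.gauss(0.0, 1.0) for c in concept] for _ in range(200)]
inside = [s for s in samples if cosine(s, concept) >= THRESHOLD]
```

Erasing every member of `inside` (rather than only `concept`) is the word-to-domain shift the first bullet refers to; samples just below the threshold approximate the domain boundary the self-check step probes.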
Hongguang Zhu
Faculty of Data Science, City University of Macau
Yunchao Wei
Professor, Beijing Jiaotong University, UTS, UIUC, NUS
Computer Vision, Machine Learning
Mengyu Wang
Institute of Information Science, Beijing Jiaotong University, and Beijing Key Laboratory of Advanced Information Science and Network Technology
Siyu Jiao
Beijing Jiaotong University
Vision & Language, Segmentation
Yan Fang
Institute of Information Science, Beijing Jiaotong University, and Beijing Key Laboratory of Advanced Information Science and Network Technology
Jiannan Huang
Institute of Information Science, Beijing Jiaotong University, and Beijing Key Laboratory of Advanced Information Science and Network Technology
Yao Zhao
Institute of Information Science, Beijing Jiaotong University, and Beijing Key Laboratory of Advanced Information Science and Network Technology