Z-Erase: Enabling Concept Erasure in Single-Stream Diffusion Transformers

📅 2026-03-26
🤖 AI Summary
This work proposes Z-Erase, the first concept erasure framework tailored for single-stream diffusion Transformers. It addresses the problem that directly applying existing erasure methods to such models typically causes generation collapse. Z-Erase decouples the updates applied to the textual and visual streams and introduces a Lagrangian-guided adaptive erasure modulation strategy, enabling precise removal of target concepts while preserving generation stability and image quality. A convergence analysis proves that the approach converges to a Pareto stationary point. Extensive experiments show that Z-Erase achieves state-of-the-art erasure performance across multiple tasks without compromising the fidelity of generated images.
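The summary mentions Lagrangian-guided modulation of the erasure-preservation trade-off. The paper's actual losses and update rule are not given on this page, so the following is only a generic primal-dual (gradient descent-ascent) sketch of a Lagrangian-constrained trade-off, not Z-Erase itself. It assumes a hypothetical erasure loss that is minimized subject to a hypothetical preservation budget ||θ - θ0||² ≤ ε; the target vector, budget, and step sizes are illustrative choices made so the answer can be checked by hand.

```python
# Generic primal-dual (gradient descent-ascent) sketch of a Lagrangian-constrained
# erasure/preservation trade-off. NOT the Z-Erase algorithm: both losses, the
# target, the budget and the step sizes are hypothetical toy choices.

TARGET = (1.6, 1.2)  # hypothetical parameters at which the concept counts as "erased"
EPSILON = 1.0        # hypothetical budget: ||theta - theta0||^2 <= EPSILON, with theta0 = 0
LR_PRIMAL = 0.05     # gradient-descent step size for theta
LR_DUAL = 0.05       # gradient-ascent step size for the multiplier


def erase_loss(theta):
    """Hypothetical erasure loss: squared distance to TARGET (minimised)."""
    return sum((p - t) ** 2 for p, t in zip(theta, TARGET))


def preserve_loss(theta):
    """Hypothetical preservation loss: squared drift from theta0 = 0 (constrained)."""
    return sum(p ** 2 for p in theta)


def primal_dual(steps=2000):
    theta = [0.0, 0.0]  # start from the original (pre-erasure) weights theta0
    lam = 0.0           # Lagrange multiplier for the preservation constraint
    for _ in range(steps):
        # Gradient of L(theta, lam) = erase_loss + lam * (preserve_loss - EPSILON)
        grad = [2 * (p - t) + 2 * lam * p for p, t in zip(theta, TARGET)]
        violation = preserve_loss(theta) - EPSILON
        # Simultaneous step: descend in theta, ascend in lam, project lam onto [0, inf)
        theta = [p - LR_PRIMAL * g for p, g in zip(theta, grad)]
        lam = max(0.0, lam + LR_DUAL * violation)
    return theta, lam


theta, lam = primal_dual()
print([round(p, 4) for p in theta], round(lam, 4))
print(round(erase_loss(theta), 4), round(preserve_loss(theta), 4))
```

Starting from θ0 = 0, the unconstrained optimum (1.6, 1.2) would exceed the budget, so the multiplier grows until the iterate settles on the constraint boundary at θ ≈ (0.8, 0.6) with λ ≈ 1, the analytic KKT point of this toy problem. The multiplier behaves as an adaptive weight: it rises while preservation is violated and decays toward zero once the budget is satisfied.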

📝 Abstract
Concept erasure serves as a vital safety mechanism for removing unwanted concepts from text-to-image (T2I) models. While extensively studied in U-Net and dual-stream architectures (e.g., Flux), this task remains under-explored in the recently emerging paradigm of single-stream diffusion transformers (e.g., Z-Image). In this new paradigm, text and image tokens are processed as a single unified sequence via shared parameters. Consequently, directly applying prior erasure methods typically leads to generation collapse. To bridge this gap, we introduce Z-Erase, the first concept erasure method tailored for single-stream T2I models. To guarantee stable image generation, Z-Erase first adopts a Stream Disentangled Concept Erasure Framework that decouples updates, allowing existing erasure methods to run on single-stream models. Subsequently, within this framework, we introduce Lagrangian-Guided Adaptive Erasure Modulation, a constrained algorithm that further balances the sensitive erasure-preservation trade-off. Moreover, we provide a rigorous convergence analysis proving that Z-Erase can converge to a Pareto stationary point. Experiments demonstrate that Z-Erase successfully overcomes the generation collapse issue, achieving state-of-the-art performance across a wide range of tasks.
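The abstract claims convergence to a Pareto stationary point. As a generic illustration of that notion (not the paper's analysis), take two objectives, say a hypothetical erasure loss f_e and preservation loss f_p. A parameter vector is Pareto stationary when some convex combination α∇f_e + (1 - α)∇f_p vanishes, i.e. no direction lowers both losses at once. For two gradients, the minimum-norm point of their convex hull has a closed form (the two-objective case of the multiple-gradient descent algorithm, MGDA), which gives a simple numerical test. The function names and tolerance below are illustrative assumptions.

```python
import math


def min_norm_combination(g1, g2):
    """Return (alpha, d) where d = alpha*g1 + (1 - alpha)*g2 has minimum norm, alpha in [0, 1].

    Closed-form two-objective case of the multiple-gradient descent algorithm (MGDA).
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: every alpha gives the same d
    else:
        # Minimise ||g2 + alpha * (g1 - g2)||^2 over alpha, then clip to [0, 1]
        alpha = -sum(b * x for b, x in zip(g2, diff)) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d


def is_pareto_stationary(grad_erase, grad_preserve, tol=1e-8):
    """True if some convex combination of the two gradients (numerically) vanishes."""
    _, d = min_norm_combination(grad_erase, grad_preserve)
    return math.sqrt(sum(x * x for x in d)) <= tol


# Opposed gradients: no step lowers both losses, so the point is Pareto stationary.
print(is_pareto_stationary([-1.6, -1.2], [1.6, 1.2]))  # True
# Orthogonal gradients: stepping along -(0.5, 0.5) lowers both losses to first order.
print(is_pareto_stationary([1.0, 0.0], [0.0, 1.0]))  # False
```

A KKT point of a Lagrangian formulation, where ∇f_e + λ∇f_p = 0 with λ ≥ 0, passes this test with α = 1/(1 + λ). That is one standard way a constrained erasure-preservation objective connects to Pareto stationarity, although the paper's own proof may proceed differently.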
Problem

Research questions and friction points this paper is trying to address.

concept erasure
single-stream diffusion transformers
text-to-image generation
generation collapse
Innovation

Methods, ideas, or system contributions that make the work stand out.

concept erasure
single-stream diffusion transformer
generation collapse
adaptive erasure modulation
Pareto stationary point
Authors

Nanxiang Jiang, Beihang University
Zhaoxin Fan, Beihang University
Baisen Wang, Institute of Information Engineering, Chinese Academy of Sciences (research interests: AIGC, Music Generation)
Daiheng Gao, DINQ (research interests: AIGC)
Junhang Cheng, Beihang University
Jifeng Guo, Beihang University
Yalan Qin, Shanghai University
Yeying Jin, Tencent | National University of Singapore (research interests: Computer Vision, AIGC, GenAI, MLLM, VLM)
Hongwei Zheng, Shanghai Jiao Tong University (research interests: Computer Vision, Federated Learning)
Faguo Wu, Beihang University
Wenjun Wu, Beihang University