EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Objective Optimization

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of effectively erasing undesirable concepts from flow-matching-based text-to-image and text-to-video diffusion models built upon Transformer architectures, without compromising generation quality. The authors formulate concept erasure as a constrained multi-objective optimization problem that balances removal efficacy with generative fidelity. They propose a novel forgetting strategy integrating implicit gradient surgery, LoRA-based efficient fine-tuning, and attention regularization, complemented by an anchor-propagation mechanism to consistently propagate erasure effects across spatial and temporal dimensions. As the first unified framework supporting both image and video diffusion models, the method achieves state-of-the-art performance across multiple benchmarks, significantly outperforming existing approaches in erasure effectiveness, generation fidelity, and temporal consistency.
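The summary does not spell out how the "implicit gradient surgery" works. As a rough intuition, gradient-surgery methods in multi-task learning (e.g. PCGrad) resolve conflicting objectives by projecting one task gradient off the other before updating. A minimal sketch of that projection step, with gradients as plain vectors (the function name and all details are illustrative, not taken from the paper):

```python
def project_conflicting(g_erase, g_preserve):
    """Combine an erasure gradient and a preservation gradient.

    If the two gradients conflict (negative inner product), remove the
    erasure gradient's component along the preservation direction before
    summing, so the merged update does not increase the preservation
    loss to first order. PCGrad-style sketch; the paper's exact
    'implicit gradient surgery' may differ.
    """
    dot = sum(a * b for a, b in zip(g_erase, g_preserve))
    if dot < 0:  # the two objectives pull in opposing directions
        nrm2 = sum(b * b for b in g_preserve)
        g_erase = [a - (dot / nrm2) * b for a, b in zip(g_erase, g_preserve)]
    return [a + b for a, b in zip(g_erase, g_preserve)]
```

With non-conflicting gradients the function reduces to a plain sum; with conflicting ones, the returned update has a non-negative inner product with the preservation gradient.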

📝 Abstract
Removing undesired concepts from large-scale text-to-image (T2I) and text-to-video (T2V) diffusion models while preserving overall generative quality remains a major challenge, particularly as modern models such as Stable Diffusion v3, Flux, and OpenSora employ flow-matching and transformer-based architectures and extend to long-horizon video generation. Existing concept erasure methods, designed for earlier T2I/T2V models, often fail to generalize to these paradigms. To address this issue, we propose EraseAnything++, a unified framework for concept erasure in both image and video diffusion models with flow-matching objectives. Central to our approach is formulating concept erasure as a constrained multi-objective optimization problem that explicitly balances concept removal with preservation of generative utility. To solve the resulting conflicting objectives, we introduce an efficient utility-preserving unlearning strategy based on implicit gradient surgery. Furthermore, by integrating LoRA-based parameter tuning with attention-level regularization, our method anchors erasure on key visual representations and propagates it consistently across spatial and temporal dimensions. In the video setting, we further enhance consistency through an anchor-and-propagate mechanism that initializes erasure on reference frames and enforces it throughout subsequent transformer layers, thereby mitigating temporal drift. Extensive experiments on both image and video benchmarks demonstrate that EraseAnything++ substantially outperforms prior methods in erasure effectiveness, generative fidelity, and temporal consistency, establishing a new state of the art for concept erasure in next-generation diffusion models.
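The constrained multi-objective formulation described in the abstract can be sketched as follows (the notation is illustrative, not the paper's): minimize the erasure loss over a parameter update while keeping generative utility within a budget,

```latex
\min_{\Delta\theta}\ \mathcal{L}_{\mathrm{erase}}(\theta_0 + \Delta\theta)
\qquad \text{s.t.} \qquad
\mathcal{L}_{\mathrm{preserve}}(\theta_0 + \Delta\theta) \le \epsilon,
```

where \(\theta_0\) denotes the frozen pretrained weights, \(\Delta\theta\) a low-rank (LoRA) update, and \(\epsilon\) a tolerance on the loss of generative utility.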
Problem

Research questions and friction points this paper is trying to address.

concept erasure
diffusion models
flow-matching
text-to-video
generative quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

concept erasure
multi-objective optimization
flow-matching
implicit gradient surgery
temporal consistency
Zhaoxin Fan
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University
Nanxiang Jiang
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University
Daiheng Gao
DINQ
AIGC
Shiji Zhou
Associate Professor, Beihang University
Online Learning, Stochastic Optimization, Multi-Objective Optimization, Multi-task Learning
Wenjun Wu
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, School of Artificial Intelligence, Beihang University