FlowErase-RL: Rethinking Concept Erasure as Reward Optimization in Flow Matching Models

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the safety risks of harmful content generation in flow matching models by reframing concept erasure as a reward optimization problem. It is presented as the first application of the GRPO reinforcement learning framework to concept erasure in these models. A dynamic dual-path reward mechanism jointly optimizes suppression of target concepts and fidelity of non-target generation, and a performance-driven adaptive switching strategy balances the two paths so that training remains stable without explicit supervision. The authors report state-of-the-art results on three concept erasure tasks (nudity, object removal, and artistic style elimination), along with robustness to adversarial attacks and scalability to multi-concept settings, while preserving image quality and semantic alignment.

📝 Abstract

Recent advances in flow matching models have significantly improved text-to-image generation quality, but also introduce growing safety risks due to the generation of harmful or undesirable content. Existing concept erasure methods are either inference-time interventions with limited effectiveness or rely on supervised fine-tuning (SFT), which requires precisely aligned data and struggles with scalability and multi-concept settings. In this paper, we propose FlowErase-RL, the first GRPO-based framework for concept erasure in flow matching models. We reformulate concept erasure as a reward optimization problem and introduce a dynamic dual-path reward mechanism that jointly optimizes (i) a Concept Erasure (CE) reward to suppress target concepts and (ii) a Non-target Space (NS) reward to preserve generative fidelity. The two reward paths are adaptively balanced during training via a performance-driven switching strategy, enabling stable optimization without explicit supervision. Extensive experiments on nudity, object, and artistic style erasure demonstrate that our method achieves state-of-the-art erasure performance while maintaining strong image quality and semantic alignment. Moreover, it exhibits robust resistance to adversarial attacks and scales effectively to multi-concept scenarios. Our results establish a new paradigm for safe and controllable generation in flow matching models.
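
The abstract does not spell out how the two reward paths are switched, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It shows the one part that is standard to GRPO (standardizing rewards within a group of samples drawn for the same prompt) and pairs it with a hypothetical switching rule that optimizes whichever path is furthest below a target score. The function names, the target values, and the "largest gap wins" rule are all illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def choose_reward_path(ce_avg, ns_avg, ce_target=0.8, ns_target=0.8):
    """Hypothetical switching rule (an assumption, not from the paper):
    pick the path whose average score is furthest below its target."""
    ce_gap = ce_target - ce_avg
    ns_gap = ns_target - ns_avg
    return "CE" if ce_gap >= ns_gap else "NS"


def dual_path_advantages(ce_rewards, ns_rewards, ce_target=0.8, ns_target=0.8):
    """Select the active reward path for this group, then compute advantages
    from that path's rewards only."""
    path = choose_reward_path(
        mean(ce_rewards), mean(ns_rewards), ce_target, ns_target
    )
    active = ce_rewards if path == "CE" else ns_rewards
    return path, group_relative_advantages(active)


if __name__ == "__main__":
    # Toy scores in [0, 1] for a group of three generated images.
    ce = [0.2, 0.4, 0.6]    # how well the target concept was suppressed
    ns = [0.9, 0.8, 0.85]   # how well non-target content was preserved
    path, adv = dual_path_advantages(ce, ns)
    print(path, adv)
```

In this toy setup the erasure scores lag their target more than the preservation scores do, so the sketch routes the policy update through the CE rewards; once erasure improves, the same rule would hand control back to the NS path.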

Problem

Research questions and friction points this paper is trying to address.

concept erasure

flow matching models

reward optimization

text-to-image generation

safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow Matching

Concept Erasure

Reward Optimization