Fine-tuning Flow Matching Generative Models with Intermediate Feedback

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenges of credit assignment, critic instability, and model collapse in fine-tuning flow-matching generative models under intermediate feedback, this paper proposes AC-Flow—a novel actor-critic framework for continuous-time flow matching. Methodologically, it (1) employs reward shaping to enhance the reliability of intermediate signals; (2) introduces a dual-stability mechanism comprising advantage clipping and critic warm-up; and (3) proposes a Wasserstein-regularized critic weighting strategy to jointly optimize sample diversity and convergence. Integrated into Stable Diffusion 3, AC-Flow enables end-to-end online reinforcement learning fine-tuning. Experiments demonstrate state-of-the-art performance on text-to-image alignment, significantly improving generalization to unseen human preferences while preserving generation quality, diversity, and training stability.
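To make the first component concrete: a minimal sketch of reward shaping as running standardization of intermediate rewards, assuming the paper's "well-normalized learning signals" amount to zero-mean, unit-variance scaling before the critic sees them. The class name and interface are illustrative, not taken from the paper.

```python
class RewardNormalizer:
    """Illustrative reward-shaping sketch (not the paper's exact method):
    keep running statistics of intermediate rewards and standardize each
    new reward, so the critic receives a well-scaled learning signal."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations (Welford)
        self.eps = eps  # guards against division by zero early on

    def update(self, r):
        # Welford's online update of mean and variance
        self.count += 1
        d = r - self.mean
        self.mean += d / self.count
        self.m2 += d * (r - self.mean)

    def normalize(self, r):
        # Sample standard deviation; falls back gracefully when count < 2
        std = (self.m2 / max(self.count - 1, 1)) ** 0.5
        return (r - self.mean) / (std + self.eps)
```

In an online fine-tuning loop, `update` would be called on each intermediate reward as it arrives and `normalize` applied before the value-learning step.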

📝 Abstract
Flow-based generative models have shown remarkable success in text-to-image generation, yet fine-tuning them with intermediate feedback remains challenging, especially for continuous-time flow matching models. Most existing approaches solely learn from outcome rewards, struggling with the credit assignment problem. Alternative methods that attempt to learn a critic via direct regression on cumulative rewards often face training instabilities and model collapse in online settings. We present AC-Flow, a robust actor-critic framework that addresses these challenges through three key innovations: (1) reward shaping that provides well-normalized learning signals to enable stable intermediate value learning and gradient control, (2) a novel dual-stability mechanism that combines advantage clipping to prevent destructive policy updates with a warm-up phase that allows the critic to mature before influencing the actor, and (3) a scalable generalized critic weighting scheme that extends traditional reward-weighted methods while preserving model diversity through Wasserstein regularization. Through extensive experiments on Stable Diffusion 3, we demonstrate that AC-Flow achieves state-of-the-art performance in text-to-image alignment tasks and generalization to unseen human preference models. Our results show that even with a computationally efficient critic model, we can robustly fine-tune flow models without compromising generative quality, diversity, or stability.
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning flow matching models with intermediate feedback remains challenging
Existing methods struggle with credit assignment and training instabilities
Addressing model collapse while preserving generative quality and diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward shaping enables stable intermediate value learning
Dual-stability mechanism prevents destructive policy updates
Scalable critic weighting preserves diversity via Wasserstein regularization
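The dual-stability mechanism above can be sketched as a single gating function. This is a hedged illustration under assumptions: the warm-up threshold, the symmetric clipping range, and the exponential advantage weighting are plausible choices, not details confirmed by the paper.

```python
import math

def actor_weight(advantage, step, clip=2.0, warmup_steps=1000):
    """Illustrative dual-stability gate (names and constants are assumed):
    - During warm-up, the critic's advantage is ignored and the policy
      update falls back to a neutral weight of 1.0, letting the critic
      mature before it influences the actor.
    - Afterwards, the advantage is clipped to [-clip, clip] before it
      scales the update, preventing destructive policy steps from
      outlier advantage estimates.
    """
    if step < warmup_steps:
        return 1.0  # critic not yet trusted: neutral weighting
    a = max(-clip, min(clip, advantage))  # advantage clipping
    return math.exp(a)  # exponential advantage weighting (one common choice)
```

A large positive or negative advantage estimate thus has bounded influence on the update, and no influence at all until the warm-up phase ends.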