AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

📅 2026-05-25

🤖 AI Summary

This work addresses instability in forward-process optimization for flow-based reinforcement learning: when advantage estimates are negative, the advantage-weighted loss becomes non-convex. The authors propose AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. In contrast to Flow-GRPO, which optimizes the reverse process, the method optimizes an advantage-weighted least-squares prediction loss on the forward process. Training is stabilized by rollout policy regularization, which reduces variance and arises from fitting a local reward-improving target distribution. On image generation tasks with Stable Diffusion 3.5 Medium, the approach outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.
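
To see why negative advantages are a problem, a one-term toy version of an advantage-weighted least-squares loss is enough. The symbols here are illustrative assumptions, not the paper's notation: v is a predicted velocity, u its regression target, and A the scalar advantage of the sample.

```latex
\begin{align}
  \ell(v) &= A \,\lVert v - u \rVert^{2},
  &
  \nabla_v^{2}\,\ell(v) &= 2A\, I .
\end{align}
% A > 0: the Hessian is positive definite, so \ell is convex with minimizer v = u.
% A < 0: the Hessian is negative definite, so v = u is a maximizer and
\begin{equation}
  \ell(v) \to -\infty \quad \text{as } \lVert v - u \rVert \to \infty .
\end{equation}
```

With a negative weight, the term can be decreased without bound by moving the prediction away from its target, which is one way to read the instability described above and the need for a regularizer.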

📝 Abstract

We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes the reverse process, we optimize an advantage-weighted forward-process prediction loss. This optimization problem is unstable when advantages are negative and the loss becomes non-convex. We stabilize it by rollout policy regularization, which reduces variance and arises from fitting a local reward-improving target distribution. We evaluate AdvantageFlow on image generation tasks with Stable Diffusion 3.5 Medium. It outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.
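
The abstract does not give the loss in closed form, so the following is only a minimal NumPy sketch of what an advantage-weighted forward-process prediction loss could look like for a rectified flow. Everything named here is an assumption for illustration: the function names, the time convention (t = 0 is data, t = 1 is noise), the velocity target, and the GRPO-style group normalization of rewards. It omits the rollout policy regularization entirely and is not the authors' implementation.

```python
import numpy as np


def interpolate(x_data, noise, t):
    """Straight-line forward process between data and noise.

    Assumed convention: t = 0 gives the data sample, t = 1 gives pure noise.
    Returns the noised sample x_t and the constant velocity along the path.
    """
    t = t.reshape(-1, *([1] * (x_data.ndim - 1)))  # broadcast t over feature dims
    x_t = (1.0 - t) * x_data + t * noise
    v_target = noise - x_data  # d x_t / d t for the straight path
    return x_t, v_target


def group_advantages(rewards, eps=1e-8):
    """Mean-centered, scale-normalized rewards (assumed, GRPO-style)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def advantage_weighted_ls(v_pred, v_target, advantages):
    """Batch mean of A_i * ||v_pred_i - v_target_i||^2."""
    sq_err = ((v_pred - v_target) ** 2).reshape(len(v_pred), -1).sum(axis=1)
    return float(np.mean(advantages * sq_err))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_data = rng.normal(size=(4, 8))   # stand-in for generated samples
    noise = rng.normal(size=(4, 8))
    t = rng.uniform(size=4)

    x_t, v_target = interpolate(x_data, noise, t)
    v_pred = np.zeros_like(v_target)   # stand-in for the model's prediction at (x_t, t)
    adv = group_advantages([1.0, 0.5, 0.2, 0.9])

    print("loss:", advantage_weighted_ls(v_pred, v_target, adv))
```

Because mean-centering makes some advantages negative, the weighted sum can be driven arbitrarily low by increasing the error on those samples; the sketch makes that failure mode easy to reproduce, while leaving out the stabilizing term the paper adds.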

Problem

Research questions and friction points this paper is trying to address.

reinforcement learning

flow models

advantage-weighted loss

optimization instability

non-convexity

Innovation

Methods, ideas, or system contributions that make the work stand out.

AdvantageFlow

rectified flow

forward-process reinforcement learning