AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the instability of forward-process optimization in reinforcement learning (RL) for flow models, which arises when negative advantage estimates make the loss non-convex. The authors propose AdvantageFlow, a forward-process RL algorithm for rectified flow models. Flow-GRPO optimizes the reverse process. AdvantageFlow instead minimizes an advantage-weighted least-squares prediction loss on the forward process. It stabilizes training with a rollout-policy regularization term, which reduces variance and is derived from fitting a local, reward-improving target distribution. On image generation tasks with Stable Diffusion 3.5 Medium, the method outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.
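The non-convexity can be made concrete with a small worked form. This assumes the SD3-style rectified-flow convention, in which the velocity target is $u_i = \epsilon_i - x_{0,i}$. It also assumes one possible shape of the regularizer: a squared distance to the rollout policy's velocity prediction $\tilde v_i$. The symbols $u_i$, $\tilde v_i$ and $\lambda$ are introduced here for illustration and are not the paper's notation. For one sample with advantage $A_i$, consider the per-sample objective as a function of the predicted velocity $v$:

```latex
\ell_i(v) = A_i \,\lVert v - u_i \rVert_2^2 + \lambda \,\lVert v - \tilde v_i \rVert_2^2,
\qquad u_i = \epsilon_i - x_{0,i}

\nabla_v^2 \ell_i = 2\,(A_i + \lambda)\, I
\quad\Longrightarrow\quad
\ell_i \text{ is strictly convex in } v \iff A_i + \lambda > 0

\nabla_v \ell_i = 2A_i\,(v - u_i) + 2\lambda\,(v - \tilde v_i) = 0
\quad\Longrightarrow\quad
v_i^\star = \frac{A_i\, u_i + \lambda\, \tilde v_i}{A_i + \lambda}
```

With $\lambda = 0$ and $A_i < 0$, the objective is unbounded below. It keeps decreasing as $v$ moves away from $u_i$. Any $\lambda > -A_i$ restores a unique minimizer. For a negative advantage, that minimizer sits past $\tilde v_i$ on the side away from the target. This convexity holds with respect to the predicted velocity, not the network weights.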
📝 Abstract
We introduce AdvantageFlow, a forward-process reinforcement learning algorithm for rectified flow models. Unlike Flow-GRPO, which optimizes the reverse process, we optimize an advantage-weighted forward-process prediction loss. This optimization problem is unstable when advantages are negative, because the loss then becomes non-convex. We stabilize it with rollout-policy regularization, which reduces variance and arises from fitting a local, reward-improving target distribution. We evaluate AdvantageFlow on image generation tasks with Stable Diffusion 3.5 Medium, where it outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline based on negative-aware fine-tuning.
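A minimal NumPy sketch of the kind of objective the abstract describes may help. It is not the authors' implementation, and it rests on three assumptions. First, it uses the SD3-style rectified-flow convention $x_t = (1 - t)\,x_0 + t\,\epsilon$ with velocity target $\epsilon - x_0$. Second, it uses GRPO-style group-normalized advantages. Third, it treats rollout-policy regularization as a squared distance to the frozen rollout policy's velocity prediction. The function names (`group_advantages`, `rectified_flow_pair`, `advantage_weighted_loss`), the weight `lam` and the reward values are hypothetical. The final loop takes a single negative-advantage sample and illustrates the instability. Without the penalty, the loss keeps falling as the prediction moves away from the target. With `lam` larger than |A|, it grows again.

```python
import numpy as np


def group_advantages(rewards, eps=1e-8):
    """GRPO-style group-normalized advantages (ASSUMPTION: not confirmed for AdvantageFlow).
    Samples rewarded below the group mean receive negative advantages."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def rectified_flow_pair(x0, noise, t):
    """Forward process x_t = (1 - t) * x0 + t * noise and its velocity target noise - x0.
    ASSUMED SD3-style convention: t = 0 is data, t = 1 is pure noise."""
    t = np.asarray(t, dtype=float).reshape(-1, *([1] * (x0.ndim - 1)))  # per-sample t
    x_t = (1.0 - t) * x0 + t * noise
    return x_t, noise - x0


def advantage_weighted_loss(v_pred, v_target, v_rollout, adv, lam):
    """Hypothetical batch loss: mean_i [ adv_i * ||v_pred_i - v_target_i||^2
    + lam * ||v_pred_i - v_rollout_i||^2 ].
    The lam term is an ASSUMED form of rollout-policy regularization."""
    axes = tuple(range(1, v_pred.ndim))  # sum over all non-batch dimensions
    fit = np.sum((v_pred - v_target) ** 2, axis=axes)
    prox = np.sum((v_pred - v_rollout) ** 2, axis=axes)
    return float(np.mean(adv * fit + lam * prox))


rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))     # stand-in for clean latents from 4 rollouts
noise = rng.normal(size=(4, 8))
t = rng.uniform(size=4)
x_t, v_target = rectified_flow_pair(x0, noise, t)
adv = group_advantages([1.0, 0.2, 0.7, 0.1])  # made-up rewards for illustration
print("advantages:", np.round(adv, 3))

# Frozen prediction of the rollout (behavior) policy; here just a noisy copy of the target.
v_rollout = v_target + 0.1 * rng.normal(size=v_target.shape)

# Take the most negative-advantage sample and move its prediction away from the target.
i = int(np.argmin(adv))
d = np.ones_like(v_target[i:i + 1])
for lam in (0.0, 2.0 * abs(adv[i])):
    losses = [advantage_weighted_loss(v_target[i:i + 1] + s * d, v_target[i:i + 1],
                                      v_rollout[i:i + 1], adv[i:i + 1], lam)
              for s in (1.0, 10.0, 100.0)]
    print(f"lam={lam:.2f}", np.round(losses, 1))
```

In an actual training loop, `v_pred` would be the fine-tuned model's output at `x_t` and `t`, and gradients would flow through it. Here the predictions are perturbed by hand so that the shape of the loss is visible directly.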
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
flow models
advantage-weighted loss
optimization instability
non-convexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

AdvantageFlow
rectified flow
forward-process reinforcement learning
advantage-weighted loss
policy regularization