Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models

πŸ“… 2026-03-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenges of insufficient image quality and prompt alignment in post-training text-to-image diffusion models, as well as the high variance in reinforcement learning updates. To this end, the authors propose an online reinforcement learning method that models the entire diffusion sampling process as a single policy action. By sampling paired trajectories and leveraging finite difference principles to guide vector field updates, the approach effectively reduces the variance of policy gradient estimates. Built upon flow-matching diffusion models, the method constructs reward signals using a vision-language model and off-the-shelf image quality metrics. Experimental results demonstrate that the proposed approach converges faster than existing methods and achieves significant improvements in both image fidelity and prompt alignment.

πŸ“ Abstract
Reinforcement learning (RL) has become a standard technique for post-training diffusion-based image synthesis models, as it enables learning from reward signals to explicitly improve desirable aspects such as image quality and prompt alignment. In this paper, we propose an online RL variant that reduces the variance in the model updates by sampling paired trajectories and pulling the flow velocity in the direction of the more favorable image. Unlike existing methods that treat each sampling step as a separate policy action, we consider the entire sampling process as a single action. We experiment with both high-quality vision language models and off-the-shelf quality metrics for rewards, and evaluate the outputs using a broad set of metrics. Our method converges faster and yields higher output quality and prompt alignment than previous approaches.
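The paired-trajectory, finite-difference update described above can be sketched on a toy 1-D problem. Everything here is an illustrative assumption: a 3-parameter linear velocity field stands in for the text-conditioned flow-matching network, and a distance-to-target score stands in for the VLM / image-quality reward. The sketch only shows the core loop: sample a pair of full rollouts (one policy action each), compare their rewards, and pull the velocity field toward the more favorable sample via a rectified-flow-style regression target.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: v(x, t) = w0 + w1*x + w2*t.
w = rng.normal(size=3) * 0.1

def velocity(w, x, t):
    return w[0] + w[1] * x + w[2] * t

def sample(w, noise, steps=20):
    """Euler-integrate the velocity field from noise (t=0) to a sample (t=1).
    The entire rollout is treated as a single policy action."""
    x, dt = noise, 1.0 / steps
    for i in range(steps):
        x = x + velocity(w, x, i * dt) * dt
    return x

def reward(x, target=1.0):
    # Stand-in for the VLM / quality-metric reward used in the paper.
    return -(x - target) ** 2

lr = 0.05
for _ in range(300):
    n_a, n_b = rng.normal(), rng.normal()          # paired trajectories
    x_a, x_b = sample(w, n_a), sample(w, n_b)
    # Finite-difference signal: which of the pair scored higher?
    x_win = x_a if reward(x_a) >= reward(x_b) else x_b
    # Pull the velocity toward the favorable image: regress v(x_t, t) onto
    # the straight-line target x_win - n for both starting noises.
    for n in (n_a, n_b):
        t = rng.uniform()
        x_t = (1 - t) * n + t * x_win
        err = velocity(w, x_t, t) - (x_win - n)
        # Analytic gradient of 0.5 * err^2 w.r.t. the linear weights.
        w -= lr * err * np.array([1.0, x_t, t])

# After training, fresh noises should integrate to samples near the target.
final = np.array([sample(w, n) for n in rng.normal(size=100)])
```

Comparing the two samples of a pair, rather than scoring each rollout in isolation, is what reduces the variance of the update: shared prompt conditioning cancels out, and only the reward difference steers the field.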
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Text-to-Image Generation
Diffusion Models
Post-Training
Prompt Alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Diffusion Models
Flow Optimization
Prompt Alignment
Variance Reduction