Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation

📅 2026-01-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes a unified framework for aligning diffusion and flow-matching models with reward signals, avoiding the high computational cost and limited generalization of conventional alignment methods that rely on fully fine-tuning pretrained generative models. The approach takes a reward-weighted distribution perspective: diffusion models are aligned via score guidance and flow-matching models via velocity guidance, with both guidance terms built from a conditional expectation of the reward rather than end-to-end fine-tuning. For diffusion models, a learned guidance network enables one-step generation with performance comparable to fine-tuned baselines at a reduction of at least 60% in computational cost; for flow-matching models, a training-free variant improves generation quality without any additional training.
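For the flow-matching side, the training-free alignment described above amounts to steering the sampler with a reward-derived correction at inference time. The following is a minimal sketch, assuming the guidance enters as an additive term on the pretrained velocity field and that a reward-gradient estimator is available; base_velocity, reward_log_grad, and scale are hypothetical placeholders rather than the paper's actual interfaces.

import torch

def guided_velocity(x_t, t, base_velocity, reward_log_grad, scale=1.0):
    # Pretrained flow-matching velocity field evaluated at (x_t, t).
    v = base_velocity(x_t, t)
    # Reward-derived correction; in the paper this role is played by a term
    # involving a conditional expectation of the reward, approximated here
    # by a user-supplied gradient estimator.
    g = reward_log_grad(x_t, t)
    return v + scale * g

def euler_sample(x, base_velocity, reward_log_grad, steps=50, scale=1.0):
    # Integrate the guided ODE from t = 0 (noise) to t = 1 (data) with Euler steps.
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * guided_velocity(x, t, base_velocity, reward_log_grad, scale)
    return x

Because only the sampler is modified, no parameters of the pretrained model are updated, which is what makes the flow-matching variant training-free.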

📝 Abstract
Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generative models to maximize a given reward function, these approaches require extensive computational resources and may not generalize well across different objectives. In this work, we propose a novel alignment framework by leveraging the underlying nature of the alignment problem -- sampling from reward-weighted distributions -- and show that it applies to both diffusion models (via score guidance) and flow matching models (via velocity guidance). The score function (velocity field) required for the reward-weighted distribution can be decomposed into the pre-trained score (velocity field) plus a conditional expectation of the reward. For alignment of diffusion models, we identify a fundamental challenge: the adversarial nature of the guidance term can introduce undesirable artifacts in the generated images. We therefore propose a fine-tuning-free framework that trains a guidance network to estimate the conditional expectation of the reward, achieving performance comparable to fine-tuning-based models using one-step generation, with at least a 60% reduction in computational cost. For alignment of flow matching models, we propose a training-free framework that improves generation quality without additional computational cost.
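In standard notation, and with \lambda an assumed reward weight (the paper's exact constants and conditioning on the text prompt are omitted here), the reward-weighted target and the score decomposition described in the abstract take the form

\[
p_\lambda(x_0) \;\propto\; p(x_0)\,\exp\!\big(\lambda\, r(x_0)\big),
\qquad
\nabla_{x_t} \log p_{\lambda,t}(x_t)
\;=\;
\nabla_{x_t} \log p_t(x_t)
\;+\;
\nabla_{x_t} \log \mathbb{E}\!\left[\exp\!\big(\lambda\, r(x_0)\big) \,\middle|\, x_t\right],
\]

so the guidance term reduces to a conditional expectation of (an exponential of) the reward given the noisy sample x_t, which is what the guidance network estimates in the diffusion case; the velocity-guidance case follows the same decomposition with the score replaced by the velocity field.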
Problem

Research questions and friction points this paper is trying to address.

alignment
text-to-image generation
diffusion models
flow matching
reward-weighted distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward-weighted sampling
score guidance
velocity guidance
finetuning-free alignment
flow matching
🔎 Similar Papers
No similar papers found.
Yidong Ouyang
Department of Statistics, University of California, Los Angeles
Liyan Xie
Assistant Professor, University of Minnesota
Statistical machine learning, online change detection, diffusion models
Hongyuan Zha
The Chinese University of Hong Kong, Shenzhen
machine learning
Guang Cheng
Department of Statistics, University of California, Los Angeles