Divergence Minimization Preference Optimization for Diffusion Model Alignment

📅 2025-07-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models are often misaligned with human preferences, and existing preference optimization methods tend to converge to suboptimal mean-seeking solutions. To address this, the paper proposes DMPO (Divergence Minimization Preference Optimization), a theoretically grounded framework that formulates preference learning as a policy optimization problem via reverse KL divergence minimization, which asymptotically shares the optimization direction of the original RL objective and inherently avoids mean-seeking behavior. DMPO's effectiveness is validated through both human evaluations and automatic metrics. Extensive experiments show that diffusion models fine-tuned with DMPO consistently outperform or match existing techniques, surpassing all existing diffusion alignment baselines by at least 64.6% in PickScore across all evaluation datasets while improving both image fidelity and alignment with human preferences.

📝 Abstract
Diffusion models have achieved remarkable success in generating realistic and versatile images from text prompts. Inspired by recent advances in language models, there is increasing interest in further improving these models by aligning them with human preferences. However, investigating alignment from a divergence minimization perspective reveals that existing preference optimization methods are typically trapped in suboptimal mean-seeking optimization. In this paper, we introduce Divergence Minimization Preference Optimization (DMPO), a novel and principled method for aligning diffusion models by minimizing reverse KL divergence, which asymptotically enjoys the same optimization direction as the original RL objective. We provide rigorous analysis to justify the effectiveness of DMPO and conduct comprehensive experiments to validate its empirical strength across both human evaluations and automatic metrics. Our extensive results show that diffusion models fine-tuned with DMPO can consistently outperform or match existing techniques, specifically outperforming all existing diffusion alignment baselines by at least 64.6% in PickScore across all evaluation datasets, demonstrating the method's superiority in aligning generative behavior with desired outputs. Overall, DMPO unlocks a robust and elegant pathway for preference alignment, bridging principled theory with practical performance in diffusion models.
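To make the mean-seeking vs. mode-seeking distinction in the abstract concrete, here is a minimal numerical sketch (not from the paper; the target mixture, grid, and variances are illustrative assumptions): fitting a single unit-variance Gaussian q to a bimodal target p by grid search shows that forward KL(p||q) places q between the modes (mean-seeking), while the reverse KL(q||p) that DMPO minimizes locks onto one mode.

```python
# Illustrative sketch: mean-seeking (forward KL) vs. mode-seeking (reverse KL)
# when fitting a single Gaussian q to a bimodal target p on a 1-D grid.
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Bimodal target: two well-separated modes at -3 and +3 (assumed for illustration).
p = 0.5 * gaussian(x, -3.0, 0.7) + 0.5 * gaussian(x, 3.0, 0.7)

def kl(a, b):
    # Discretized KL(a||b) on the grid; small epsilon guards the log.
    eps = 1e-12
    return np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx

# Grid-search the mean of a unit-variance Gaussian q under both divergences.
means = np.linspace(-5.0, 5.0, 201)
fwd = [kl(p, gaussian(x, m, 1.0)) for m in means]   # forward KL(p||q): mean-seeking
rev = [kl(gaussian(x, m, 1.0), p) for m in means]   # reverse KL(q||p): mode-seeking

print("forward-KL optimum mean:", means[int(np.argmin(fwd))])  # ~0.0, between the modes
print("reverse-KL optimum mean:", means[int(np.argmin(rev))])  # ~-3.0 or +3.0, one mode
```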
Problem

Research questions and friction points this paper is trying to address.

Aligning diffusion models with human preferences effectively
Overcoming the suboptimal mean-seeking behavior of existing preference optimization methods
Minimizing reverse KL divergence to better align generative behavior with desired outputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Minimizes reverse KL divergence for alignment (a generic sketch of this objective follows the list)
Asymptotically matches the optimization direction of the original RL objective
Outperforms all existing diffusion alignment baselines by at least 64.6% in PickScore
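As a rough illustration of the reverse-KL-regularized objective these bullets refer to, the PyTorch sketch below implements the generic KL-shaped REINFORCE surrogate for max_pi E_pi[r(x)] - beta * KL(pi || pi_ref). This is a standard construction assumed for illustration, not the paper's exact DMPO loss; the function name, beta value, and tensor shapes are hypothetical.

```python
# Hedged sketch of a generic reverse-KL-regularized alignment objective
# (not the paper's exact DMPO loss).
import torch

def reverse_kl_alignment_loss(logp_policy: torch.Tensor,
                              logp_ref: torch.Tensor,
                              reward: torch.Tensor,
                              beta: float = 0.1) -> torch.Tensor:
    """Surrogate loss for  max_pi E_pi[r(x)] - beta * KL(pi || pi_ref).

    logp_policy: log pi_theta(x) for samples x ~ pi_theta (requires grad).
    logp_ref:    log pi_ref(x) for the same samples.
    reward:      r(x), e.g. a learned preference score.
    """
    # KL-shaped reward: r~(x) = r(x) - beta * (log pi_theta(x) - log pi_ref(x)).
    shaped_reward = reward - beta * (logp_policy - logp_ref)
    # REINFORCE surrogate: its gradient is -E[ r~(x) * grad log pi_theta(x) ],
    # an unbiased estimate of the (negated) gradient of
    # E_pi[r] - beta * KL(pi_theta || pi_ref).
    return -(shaped_reward.detach() * logp_policy).mean()

# Toy usage with dummy per-sample log-probabilities and rewards.
logp_policy = torch.randn(8, requires_grad=True)
logp_ref = torch.randn(8)
reward = torch.rand(8)
loss = reverse_kl_alignment_loss(logp_policy, logp_ref, reward)
loss.backward()
```

Folding the KL penalty into the reward before applying the score-function estimator keeps the gradient unbiased with a single sample per prompt, which is why this shaped-reward form is the usual starting point for reverse-KL-regularized fine-tuning.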