Towards Better Optimization For Listwise Preference in Diffusion Models

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing preference-alignment methods for diffusion models rely predominantly on pairwise comparisons, as in Direct Preference Optimization (DPO), and fail to fully exploit the fine-grained ranking information embedded in human feedback. To address this, we propose Diffusion-LPO, presented as the first listwise preference optimization framework tailored to diffusion models. It integrates the Plackett–Luce ranking model into the DPO objective to explicitly capture relative rankings among multiple generated images. Unlike reward-modeling or reinforcement-learning approaches, Diffusion-LPO requires no explicit reward estimation or policy-gradient loop, enabling efficient list-level alignment through simple gradient updates. We evaluate Diffusion-LPO on text-to-image generation, image editing, and personalized preference alignment; results show consistent improvements over pairwise DPO baselines in both perceptual quality and agreement with human preferences, supporting the claim that listwise preference modeling better aligns diffusion models with human intent.
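The objective itself is not reproduced on this page, but the summary pins down its ingredients, so a hedged sketch is possible: combining DPO's implicit reward with the Plackett–Luce ranking likelihood (notation assumed for illustration, not copied from the paper) plausibly gives

$$
r_\theta(x, c) = \beta \log \frac{\pi_\theta(x \mid c)}{\pi_{\mathrm{ref}}(x \mid c)},
\qquad
\mathcal{L}_{\mathrm{LPO}}(\theta) = -\,\mathbb{E}\left[\, \sum_{k=1}^{K} \log \frac{\exp\big(r_\theta(x_k, c)\big)}{\sum_{j=k}^{K} \exp\big(r_\theta(x_j, c)\big)} \,\right]
$$

for a caption $c$ and images ranked $x_1 \succ \cdots \succ x_K$. Each term asks the rank-$k$ image to beat every lower-ranked alternative, matching the ranking-consistency description in the abstract; for $K = 2$ the loss collapses to the familiar pairwise DPO term $-\log \sigma\big(r_\theta(x_1, c) - r_\theta(x_2, c)\big)$.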

📝 Abstract
Reinforcement learning from human feedback (RLHF) has proven effective for aligning text-to-image (T2I) diffusion models with human preferences. Although Direct Preference Optimization (DPO) is widely adopted for its computational efficiency and avoidance of explicit reward modeling, its application to diffusion models has relied primarily on pairwise preferences; the precise optimization of listwise preferences remains largely unaddressed. In practice, human feedback on image preferences often contains implicit ranking information, which conveys more precise human preferences than pairwise comparisons. In this work, we propose Diffusion-LPO, a simple and effective framework for Listwise Preference Optimization in diffusion models. Given a caption, we aggregate user feedback into a ranked list of images and derive a listwise extension of the DPO objective under the Plackett–Luce model. Diffusion-LPO enforces consistency across the entire ranking by encouraging each sample to be preferred over all of its lower-ranked alternatives. We empirically demonstrate the effectiveness of Diffusion-LPO across text-to-image generation, image editing, and personalized preference alignment, where it consistently outperforms pairwise DPO baselines on visual quality and preference alignment.
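To make the list-level objective concrete, below is a minimal PyTorch sketch of such a loss, assuming per-image log-likelihoods under the policy and reference models are available (Diffusion-DPO-style methods approximate the intractable likelihood ratio with differences of denoising errors; that substitution is abstracted away here). The function name listwise_lpo_loss and the best-first ordering convention are illustrative assumptions, not the paper's API.

```python
import torch
import torch.nn.functional as F

def listwise_lpo_loss(logps_theta: torch.Tensor,
                      logps_ref: torch.Tensor,
                      beta: float = 0.1) -> torch.Tensor:
    """Sketch of a Plackett-Luce listwise DPO-style loss.

    logps_theta, logps_ref: shape (K,) log-likelihoods of K images for
    one caption, ordered best-first, under the policy and the frozen
    reference model respectively.
    """
    # Implicit per-image rewards, as in DPO.
    rewards = beta * (logps_theta - logps_ref)  # shape (K,)

    # Negative log Plackett-Luce likelihood of the observed ranking:
    # at rank k, the k-th image must "win" a softmax against every
    # image ranked at or below it.
    loss = rewards.new_zeros(())
    for k in range(rewards.shape[0] - 1):
        loss = loss - F.log_softmax(rewards[k:], dim=0)[0]
    return loss
```

Note that the loop runs over ranks rather than over all pairs: each log-softmax term already accounts for every lower-ranked alternative, which is what gives the objective its list-level consistency.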
Problem

Research questions and friction points this paper is trying to address.

Optimizing listwise preferences in diffusion models
Addressing ranked human feedback for image generation
Improving alignment with precise human preference data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Listwise preference optimization for diffusion models
Extends DPO objective using Plackett-Luce model
Enforces ranking consistency across all image alternatives (toy example below)
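As a toy illustration of the consistency property flagged in the last bullet, here is how the hypothetical listwise_lpo_loss sketch from the abstract section would be applied to a ranked list of four images (all numbers invented):

```python
import torch

# Hypothetical policy and reference log-probs for K=4 images, best-first.
logps_theta = torch.tensor([-10.2, -11.0, -11.5, -12.3], requires_grad=True)
logps_ref = torch.tensor([-10.8, -10.9, -11.2, -11.9])

loss = listwise_lpo_loss(logps_theta, logps_ref, beta=0.1)
loss.backward()  # one backward pass covers the entire ranked list
```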
Authors

Jiamu Bai (Penn State University)
Xin Yu (Penn State University)
Meilong Xu (Stony Brook University) · Machine Learning, Computer Vision, Topological Data Analysis
Weitao Lu (TikTok Inc.)
Xin Pan (TikTok Inc.)
Kiwan Maeng (Pennsylvania State University) · Privacy-preserving ML, systems for ML, compilers, embedded systems
Daniel Kifer (Penn State University) · privacy, machine learning
Jian Wang (TikTok Inc.)
Yu Wang (TikTok Inc.)