Fine-Tuning Diffusion Generative Models via Rich Preference Optimization

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Preference-based fine-tuning of text-to-image diffusion models suffers from low-quality preferences, sparse and uninformative feedback signals, poor interpretability, reward hacking, and overfitting, largely because it relies on opaque, black-box reward models. Method: We propose a preference optimization framework grounded in multidimensional, human-interpretable feedback. Leveraging large language models, we automatically generate critical image critiques and convert them into executable editing instructions (e.g., for ControlNet or inpainting), thereby synthesizing high-information, fine-grained preference pairs aligned with human intent. Crucially, we bypass explicit reward modeling and instead integrate a DPO variant directly into the diffusion model for preference learning. Results: Evaluated on SDXL and other mainstream diffusion models, our approach significantly improves image fidelity, text-image alignment, and controllability. Preference-pair efficacy increases by 42% over Diffusion-DPO, demonstrating superior generalization and interpretability.

📝 Abstract
We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals to improve the curation of preference pairs for fine-tuning text-to-image diffusion models. Traditional methods, like Diffusion-DPO, often rely solely on reward model labeling, which can be opaque, offers limited insight into the rationale behind preferences, and is prone to issues such as reward hacking or overfitting. In contrast, our approach begins by generating detailed critiques of synthesized images to extract reliable and actionable image editing instructions. By implementing these instructions, we create refined images, resulting in synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models.
Problem

Research questions and friction points this paper is trying to address.

Improves text-to-image diffusion models via rich feedback signals
Addresses limitations of traditional reward model labeling methods
Generates synthetic preference pairs for enhanced model fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages rich feedback for preference pair curation
Generates detailed critiques for image editing instructions
Creates synthetic, informative preference pairs for tuning
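The pipeline summarized above (critique an image, turn the critique into an edit, prefer the refined result, then tune with a DPO-style objective) can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the critique and edit functions are hypothetical placeholders standing in for an LLM critic and a ControlNet/inpainting editor, and the loss uses the standard Diffusion-DPO form over denoising errors with made-up numbers.

```python
import math


def curate_preference_pair(prompt, image, critique_fn, edit_fn):
    """RPO-style curation: a critique of the generated image is converted
    into an edit; the refined image becomes the preferred sample and the
    original becomes the rejected one."""
    critique = critique_fn(image, prompt)
    refined = edit_fn(image, critique)
    return {"prompt": prompt, "preferred": refined,
            "rejected": image, "critique": critique}


def diffusion_dpo_loss(err_w, err_l, ref_err_w, ref_err_l, beta=5.0):
    """Diffusion-DPO-style objective: -log sigmoid(-beta * margin), where
    the margin compares policy vs. reference denoising error on the
    preferred (winner) and rejected (loser) images. Computed via a
    numerically stable softplus."""
    margin = (err_w - ref_err_w) - (err_l - ref_err_l)
    x = beta * margin
    # softplus(x) = log(1 + exp(x)), stable for large |x|
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Toy usage with placeholder critic/editor (purely illustrative).
pair = curate_preference_pair(
    prompt="a red bicycle",
    image="img_0",
    critique_fn=lambda img, p: f"{img}: wheels malformed for '{p}'",
    edit_fn=lambda img, c: f"{img}_refined",
)

# Policy denoises the preferred image better than the reference does,
# and no better on the rejected one -> negative margin -> lower loss
# than the neutral case (margin 0, loss = log 2).
loss_good = diffusion_dpo_loss(0.10, 0.30, 0.20, 0.30)
loss_neutral = diffusion_dpo_loss(0.20, 0.30, 0.20, 0.30)
```

The key design point the paper emphasizes survives even in this toy form: preference pairs are constructed from interpretable critiques and edits rather than from a black-box reward score, and the DPO-style loss consumes the pair directly without fitting an explicit reward model.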