DreamBoothDPO: Improving Personalized Generation using Direct Preference Optimization

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of jointly optimizing concept fidelity and prompt alignment in personalized text-to-image generation. We propose the first unsupervised, direct preference optimization (DPO) framework tailored for diffusion models. Our method requires no human annotations: it automatically constructs synthetic preference pairs (high- vs. low-quality image pairs) using external quality metrics and achieves flexible trade-off control between fidelity and alignment via multi-step reinforcement fine-tuning. Key innovations include adapting the DPO paradigm to personalized diffusion models and designing a cross-architecture-compatible training pipeline. Experiments demonstrate significantly accelerated convergence and state-of-the-art performance across multiple baselines and model architectures. Both quantitative evaluation and qualitative analysis consistently confirm simultaneous improvements in concept faithfulness and prompt adherence.

Technology Category

Application Category

📝 Abstract
Personalized diffusion models have shown remarkable success in Text-to-Image (T2I) generation by enabling the injection of user-defined concepts into diverse contexts. However, balancing concept fidelity with contextual alignment remains a challenging open problem. In this work, we propose an RL-based approach that leverages the diverse outputs of T2I models to address this issue. Our method eliminates the need for human-annotated scores by generating a synthetic paired dataset for DPO-like training using external quality metrics. These better-worse pairs are specifically constructed to improve both concept fidelity and prompt adherence. Moreover, our approach supports flexible adjustment of the trade-off between image fidelity and textual alignment. Through multi-step training, our approach outperforms a naive baseline in convergence speed and output quality. We conduct extensive qualitative and quantitative analysis, demonstrating the effectiveness of our method across various architectures and fine-tuning techniques. The source code can be found at https://github.com/ControlGenAI/DreamBoothDPO.
Problem

Research questions and friction points this paper is trying to address.

Balancing concept fidelity with contextual alignment in personalized diffusion models
Eliminating human-annotated scores by using synthetic paired datasets for DPO-like training
Adjusting trade-off between image fidelity and textual alignment flexibly
Innovation

Methods, ideas, or system contributions that make the work stand out.

RL-based approach for T2I generation
Synthetic paired dataset for DPO training
Flexible fidelity-alignment trade-off adjustment
🔎 Similar Papers
No similar papers found.