SEE-DPO: Self Entropy Enhanced Direct Preference Optimization

📅 2024-11-06
🏛️ Trans. Mach. Learn. Res.
📈 Citations: 6
Influential: 1
📄 PDF
🤖 AI Summary
To address the overfitting and reward-hacking vulnerabilities of DPO-based methods (e.g., SPO, Diffusion-DPO, D3PO) in aligning text-to-image diffusion models with human preferences, this paper proposes a stable alignment framework grounded in self-entropy regularization. The core innovation is incorporating policy self-entropy—computed in the latent space—into the DPO objective, jointly modeling preference structure and diffusion-process gradients. This integration broadens exploration and improves training robustness. Empirically, the method significantly mitigates training instability under out-of-distribution conditions, achieving state-of-the-art performance on key metrics including FID and CLIP-Score. It also improves both the diversity and fine-grained detail fidelity of generated images, demonstrating stronger generalization and alignment stability than existing approaches.

📝 Abstract
Direct Preference Optimization (DPO) has been successfully used to align large language models (LLMs) with human preferences, and more recently it has also been applied to improving the quality of text-to-image diffusion models. However, DPO-based methods such as SPO, Diffusion-DPO, and D3PO are highly susceptible to overfitting and reward hacking, especially when the generative model is optimized to fit out-of-distribution data during prolonged training. To overcome these challenges and stabilize the training of diffusion models, we introduce a self-entropy regularization mechanism into reinforcement learning from human feedback. This enhancement improves DPO training by encouraging broader exploration and greater robustness. Our regularization technique effectively mitigates reward hacking, leading to improved stability and enhanced image quality across the latent space. Extensive experiments demonstrate that integrating human feedback with self-entropy regularization can significantly boost image diversity and specificity, achieving state-of-the-art results on key image generation metrics.
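The entropy-regularized DPO objective described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the function name, the scalar entropy estimate, and the weights `beta` and `lam` are assumptions for exposition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dpo_loss_with_self_entropy(logp_w, logp_l, ref_logp_w, ref_logp_l,
                               policy_entropy, beta=0.1, lam=0.01):
    """Illustrative entropy-regularized DPO loss (hypothetical form).

    logp_w / logp_l: policy log-probs of the preferred / dispreferred samples
    ref_logp_w / ref_logp_l: log-probs under a frozen reference model
    policy_entropy: an estimate of the policy's self-entropy (e.g. over latents)
    beta: DPO inverse-temperature; lam: entropy-regularization weight
    """
    # Standard DPO: log-sigmoid of the scaled preference margin.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    dpo = -np.log(sigmoid(margin))
    # Subtracting the entropy term rewards broader exploration,
    # counteracting the collapse associated with reward hacking.
    return dpo - lam * policy_entropy
```

A larger entropy estimate lowers the loss, so gradient descent is nudged away from low-entropy (mode-collapsed) policies while still optimizing the preference margin.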
Problem

Research questions and friction points this paper is trying to address.

Addressing overfitting and reward hacking in DPO-based diffusion models
Stabilizing diffusion model training through self-entropy regularization
Improving image diversity and quality in human preference alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-entropy regularization enhances DPO training
Mitigates reward hacking for improved stability
Boosts image diversity and specificity metrics
Shivanshu Shekhar
University of Illinois Urbana-Champaign
Shreyas Singh
Indian Institute of Technology Madras
Computer Vision · Deep Learning · Computational Imaging
Tong Zhang
University of Illinois Urbana-Champaign