🤖 AI Summary
Text-to-image diffusion models still struggle to align with human preferences: existing diffusion-based direct preference optimization (DPO) methods widen the preference margin but simultaneously amplify the reconstruction errors of both the preferred and dispreferred branches, degrading generation quality. To address this, we propose a protected gradient update mechanism. Through a first-order analysis, we derive a closed-form scaling coefficient that adaptively suppresses gradients from dispreferred samples, ensuring that the reconstruction error of preferred outputs does not increase during optimization. The method is fully compatible with mainstream preference-learning frameworks and incurs negligible computational overhead. Extensive experiments on multiple standard benchmarks demonstrate consistent, significant improvements over baselines on automated preference scores, aesthetic quality metrics, and prompt-alignment fidelity.
📝 Abstract
Text-to-image diffusion models deliver high-quality images, yet aligning them with human preferences remains challenging. We revisit diffusion-based Direct Preference Optimization (DPO) for these models and identify a critical pathology: enlarging the preference margin does not necessarily improve generation quality. In particular, the standard Diffusion-DPO objective can increase the reconstruction error of both the winner and loser branches. Consequently, degradation of the less-preferred outputs can become severe enough that the preferred branch is also adversely affected even as the margin grows. To address this, we introduce Diffusion-SDPO, a safeguarded update rule that preserves the winner by adaptively scaling the loser gradient according to its alignment with the winner gradient. A first-order analysis yields a closed-form scaling coefficient that guarantees the error of the preferred output is non-increasing at each optimization step. Our method is simple, model-agnostic, and broadly compatible with existing DPO-style alignment frameworks, and it adds only marginal computational overhead. Across standard text-to-image benchmarks, Diffusion-SDPO delivers consistent gains over preference-learning baselines on automated preference, aesthetic, and prompt-alignment metrics. Code is publicly available at https://github.com/AIDC-AI/Diffusion-SDPO.
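The safeguarded scaling described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact derivation: it assumes the combined parameter update has the form `g_w + lam * g_l` (winner gradient plus a scaled loser gradient) and picks the largest `lam` in [0, 1] for which the first-order change in the winner's loss is non-increasing. The function name and the specific coefficient formula are assumptions for illustration.

```python
import numpy as np

def safeguarded_update(g_w, g_l, eps=1e-12):
    """Scale the loser gradient so the combined descent update cannot
    increase the winner's loss to first order.

    With update direction u = g_w + lam * g_l applied as theta -= lr * u,
    the first-order change in the winner's loss is -lr * <g_w, u>.
    Requiring <g_w, u> >= 0 gives the closed-form cap on lam below
    (an illustrative choice; the paper's coefficient may differ).
    """
    dot = np.dot(g_w, g_l)
    if dot >= 0:
        # Loser gradient does not conflict with the winner: keep it fully.
        lam = 1.0
    else:
        # Conflicting directions: shrink so <g_w, g_w + lam * g_l> >= 0.
        lam = min(1.0, np.dot(g_w, g_w) / max(-dot, eps))
    return g_w + lam * g_l, lam
```

For example, with `g_w = [1, 0]` and a strongly conflicting `g_l = [-3, 0]`, the coefficient shrinks to 1/3 and the winner's first-order loss change is exactly zero rather than positive; when the two gradients are aligned, the loser gradient passes through unscaled.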