🤖 AI Summary
Text-to-image diffusion models still struggle to align with human preferences: existing diffusion-based direct preference optimization (DPO) methods widen the preference margin but simultaneously amplify the reconstruction errors of both the preferred and dispreferred branches, degrading generation quality. To address this, we propose a protected gradient update mechanism. Through a first-order analysis, we derive a closed-form scaling coefficient that adaptively suppresses gradients from dispreferred samples, ensuring that the reconstruction error of preferred outputs does not increase during optimization. The method is fully compatible with mainstream preference-learning frameworks and incurs negligible computational overhead. Extensive experiments on multiple standard benchmarks demonstrate consistent, significant improvements over baselines on automated preference scores, aesthetic quality metrics, and prompt-alignment fidelity.
📝 Abstract
Text-to-image diffusion models deliver high-quality images, yet aligning them with human preferences remains challenging. We revisit diffusion-based Direct Preference Optimization (DPO) for these models and identify a critical pathology: enlarging the preference margin does not necessarily improve generation quality. In particular, the standard Diffusion-DPO objective can increase the reconstruction error of both the winner and loser branches. Consequently, degradation of the less-preferred outputs can become severe enough that the preferred branch is also adversely affected even as the margin grows. To address this, we introduce Diffusion-SDPO, a safeguarded update rule that preserves the winner by adaptively scaling the loser gradient according to its alignment with the winner gradient. A first-order analysis yields a closed-form scaling coefficient that guarantees the error of the preferred output is non-increasing at each optimization step. Our method is simple, model-agnostic, and broadly compatible with existing DPO-style alignment frameworks, and it adds only marginal computational overhead. Across standard text-to-image benchmarks, Diffusion-SDPO delivers consistent gains over preference-learning baselines on automated preference, aesthetic, and prompt-alignment metrics. Code is publicly available at https://github.com/AIDC-AI/Diffusion-SDPO.
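The safeguarded scaling described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact derivation: it assumes the combined parameter update has the form `g_w + lam * g_l` (winner gradient plus a scaled loser gradient) and picks the largest `lam` in [0, 1] for which the first-order change in the winner's loss is non-increasing. The function name and the specific coefficient formula are assumptions for illustration.

```python
import numpy as np

def safeguarded_update(g_w, g_l, eps=1e-12):
    """Scale the loser gradient so the combined descent update cannot
    increase the winner's loss to first order.

    With update direction u = g_w + lam * g_l applied as theta -= lr * u,
    the first-order change in the winner's loss is -lr * <g_w, u>.
    Requiring <g_w, u> >= 0 gives the closed-form cap on lam below
    (an illustrative choice; the paper's coefficient may differ).
    """
    dot = np.dot(g_w, g_l)
    if dot >= 0:
        # Loser gradient does not conflict with the winner: keep it fully.
        lam = 1.0
    else:
        # Conflicting directions: shrink so <g_w, g_w + lam * g_l> >= 0.
        lam = min(1.0, np.dot(g_w, g_w) / max(-dot, eps))
    return g_w + lam * g_l, lam
```

For example, with `g_w = [1, 0]` and a strongly conflicting `g_l = [-3, 0]`, the coefficient shrinks to 1/3 and the winner's first-order loss change is exactly zero rather than positive; when the two gradients are aligned, the loser gradient passes through unscaled.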