🤖 AI Summary
Existing diffusion models trained on unfiltered large-scale datasets often generate outputs misaligned with human preferences, primarily because conventional fine-tuning methods neglect modeling of unconditional and negative-conditioned outputs, thereby undermining the effectiveness of Classifier-Free Guidance (CFG).
Method: We propose Diffusion-Negative Preference Optimization (Diffusion-NPO), the first systematic framework to explicitly model negative preferences. It enables lightweight fine-tuning—without requiring new data or changes to training paradigms—to jointly optimize both unconditional and negative-conditioned output distributions.
Contribution/Results: Diffusion-NPO is compatible with SD1.5, SDXL, and video diffusion models. It consistently improves human preference scores across multiple benchmarks (+4.2%–9.7%), strengthens CFG guidance and its robustness, and delivers consistent gains across images, videos, and models that have already been preference-optimized. This work establishes an efficient, general-purpose fine-tuning paradigm for preference alignment in diffusion models.
📝 Abstract
Diffusion models have made substantial advances in image generation, yet models trained on large, unfiltered datasets often yield outputs misaligned with human preferences. Numerous methods have been proposed to fine-tune pre-trained diffusion models, achieving notable improvements in aligning generated outputs with human preferences. However, we argue that existing preference alignment methods neglect the critical role of handling unconditional/negative-conditional outputs, leading to a diminished capacity to avoid generating undesirable outcomes. This oversight limits the efficacy of classifier-free guidance (CFG), which relies on the contrast between conditional generation and unconditional/negative-conditional generation to optimize output quality. In response, we propose a straightforward yet versatile and effective approach: training a model specifically attuned to negative preferences. This method does not require new training strategies or datasets, only minor modifications to existing techniques. Our approach integrates seamlessly with models such as SD1.5, SDXL, video diffusion models, and models that have undergone preference optimization, consistently enhancing their alignment with human preferences.
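The contrast that CFG exploits can be made concrete. At each denoising step, CFG extrapolates from the unconditional (or negative-conditional) noise prediction toward the conditional one; the idea behind Diffusion-NPO is that the negative branch should come from a model fine-tuned on negative preferences rather than sharing weights with the conditional model. A minimal sketch of the guidance combination, assuming the two noise predictions are already computed (function and variable names here are illustrative, not the authors' code):

```python
import numpy as np

def cfg_combine(eps_cond, eps_neg, guidance_scale):
    """Classifier-free guidance step.

    eps_cond: noise prediction from the prompt-conditioned model.
    eps_neg:  noise prediction from the unconditional branch -- under
              Diffusion-NPO, this would come from a separately
              fine-tuned negative-preference model instead.
    guidance_scale: CFG weight; larger values push samples further
              away from the negative branch's prediction.
    """
    return eps_neg + guidance_scale * (eps_cond - eps_neg)

# Toy example with scalar "predictions":
combined = cfg_combine(np.array(2.0), np.array(1.0), guidance_scale=7.5)
```

Because the combined prediction moves *away* from `eps_neg`, a negative branch that better models undesirable outputs gives CFG a more informative direction to steer against, which is the mechanism the paper targets.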