Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

📅 2024-10-04
🏛️ arXiv.org
📈 Citations: 2
✨ Influential: 1
🤖 AI Summary
Diffusion models distilled for few-step inference suffer from poor fine-tuning compatibility, often yielding blurry outputs and degrading few-step generation capability. To address this, we propose Pairwise Sample Optimization (PSO), a lightweight fine-tuning framework that eliminates the need for re-distillation. PSO leverages self-sampling to generate reference images, constructs target-reference image pairs, and optimizes a likelihood-ratio objective via online/offline collaborative sampling, augmented with preference modeling for distribution alignment. Notably, PSO is the first method to introduce pairwise likelihood optimization into distilled diffusion model fine-tuning. It significantly improves generation quality and fidelity across style transfer, concept customization, and human preference alignment tasks, while strictly preserving single-step and few-step inference performance. Crucially, PSO incurs less than 10% of the training cost required for full re-distillation.
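The summary's first ingredient is self-sampling: each fine-tuning target image is paired with a reference image drawn from the current timestep-distilled model itself. A minimal sketch of that pair construction, where `sample_reference` is a hypothetical stand-in for the model's few-step sampler (all names here are illustrative, not from the paper):

```python
import numpy as np

def sample_reference(prompt_seed, image_shape=(8, 8)):
    # Stand-in for one few-step sample from the current distilled model;
    # a real implementation would run the 1-4 step sampler on the prompt.
    local = np.random.default_rng(prompt_seed)
    return local.normal(size=image_shape)

def build_pso_pairs(target_images, prompt_seeds):
    # Pair each fine-tuning target image with a self-sampled reference
    # image, yielding the target-reference pairs the objective is
    # optimized over.
    return [(tgt, sample_reference(s))
            for tgt, s in zip(target_images, prompt_seeds)]
```

In the offline variant the references would be sampled once up front; in the online variant `sample_reference` would be re-run from the evolving model during training.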

πŸ“ Abstract
Recent advancements in timestep-distilled diffusion models have enabled high-quality image generation that rivals non-distilled multi-step models, but with significantly fewer inference steps. While such models are attractive for applications due to their low inference cost and latency, fine-tuning them with a naive diffusion objective results in degraded and blurry outputs. An intuitive alternative is to repeat the diffusion distillation process with a fine-tuned teacher model, which produces good results but is cumbersome and computationally intensive; distillation training usually requires orders of magnitude more compute than fine-tuning for a specific image style. In this paper, we present an algorithm named pairwise sample optimization (PSO), which enables direct fine-tuning of an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current timestep-distilled model and increases the relative likelihood margin between the training images and the reference images. This enables the model to retain its few-step generation ability while allowing its output distribution to be fine-tuned. We also demonstrate that PSO is a generalized formulation that can be flexibly extended to both offline-sampled and online-sampled pairwise data, covering various popular objectives for diffusion model preference optimization. We evaluate PSO on preference optimization and other fine-tuning tasks, including style transfer and concept customization. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data, and that it is also effective for style transfer and concept customization when directly tuning timestep-distilled diffusion models.
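The abstract's core idea, increasing the relative likelihood margin between training images and self-sampled reference images, can be sketched as a DPO-style log-sigmoid objective. This is a minimal NumPy illustration assuming denoising-error surrogates for diffusion log-likelihoods (so lower error means higher likelihood); the function and argument names are hypothetical, not the paper's exact formulation:

```python
import numpy as np

def pso_pairwise_loss(err_theta_tgt, err_frozen_tgt,
                      err_theta_ref, err_frozen_ref, beta=0.1):
    # err_theta_*: denoising errors of the model being tuned;
    # err_frozen_*: errors of a frozen copy of the distilled model.
    # Log-likelihood is approximated by the negative denoising error,
    # so each (frozen - tuned) difference is an implicit log-ratio.
    margin = beta * ((err_frozen_tgt - err_theta_tgt)
                     - (err_frozen_ref - err_theta_ref))
    # -log sigmoid(margin): small when the tuned model favors the
    # training (target) image over the self-sampled reference image.
    return -np.log(1.0 / (1.0 + np.exp(-margin)))
```

Driving this loss down pushes the tuned model toward the target images only *relative to* the reference images it already generates, which is how the few-step sampling behavior is kept intact.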
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning timestep-distilled diffusion models without quality degradation
Reducing computational cost of fine-tuning for specific image styles
Enabling direct adaptation to human-preferred image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pairwise sample optimization for model fine-tuning
Retains few-step generation ability during tuning
Supports offline and online pairwise data sampling