ADT: Tuning Diffusion Models with Adversarial Supervision

๐Ÿ“… 2025-04-15
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Diffusion model training and inference suffer from distribution misalignment due to prediction bias and error accumulation along the sampling trajectory. To address this, we propose Adversarial Diffusion Tuning (ADT), an adversarial fine-tuning framework that introduces end-to-end adversarial supervision directly into the inference pathโ€”enabling, for the first time, adversarial alignment of the diffusion sampling process. Our method features: (1) a twin discriminator with frozen backbone to enhance training stability; (2) an image-to-image sampling strategy to improve generation fidelity; and (3) gradient-constrained backward flow coupled with multi-stage losses, jointly optimizing distribution alignment while preserving the original diffusion objective. Evaluated on Stable Diffusion v1.5, XL, and v3, ADT achieves an average 12.7% reduction in FID. Qualitative assessment confirms substantial improvements in visual quality and structural detail consistency.

๐Ÿ“ Abstract
Diffusion models have achieved outstanding image generation by reversing a forward noising process to approximate true data distributions. During training, these models predict diffusion scores from noised versions of true samples in a single forward pass, while inference requires iterative denoising starting from white noise. This training-inference divergence hinders alignment between the inference and training data distributions, owing to prediction biases and cumulative error along the sampling trajectory. To address this problem, we propose an intuitive but effective fine-tuning framework, called Adversarial Diffusion Tuning (ADT), which simulates the inference process during optimization and aligns the final outputs with the training data through adversarial supervision. Specifically, to achieve robust adversarial training, ADT features a siamese-network discriminator with a fixed pre-trained backbone and lightweight trainable parameters, incorporates an image-to-image sampling strategy to ease the discrimination task, and preserves the original diffusion loss to prevent discriminator hacking. In addition, we carefully constrain the backward path so that gradients can be back-propagated along the inference trajectory without incurring memory overload or gradient explosion. Finally, extensive experiments on Stable Diffusion models (v1.5, XL, and v3) demonstrate that ADT significantly improves both distribution alignment and image quality.
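The gradient-constrained backward flow described above can be sketched as follows: the fine-tuning loop simulates the full sampling trajectory but only lets gradients flow through the last few denoising steps, which keeps memory bounded and avoids gradient explosion. This is a minimal illustration, not the authors' code; the denoiser, step count, and shapes are assumptions.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a diffusion U-Net (illustrative only)."""
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, x, t):
        # Predict the (less noisy) sample for step t.
        return self.net(x)

def sample_with_truncated_grad(model, x_T, num_steps=10, grad_steps=2):
    """Run the whole sampling trajectory, detaching all but the last
    `grad_steps` steps so back-propagation stays cheap and stable."""
    x = x_T
    for t in reversed(range(num_steps)):
        if t >= grad_steps:
            with torch.no_grad():   # early steps: no graph is built
                x = model(x, t)
        else:
            x = model(x, t)         # late steps: gradients flow
    return x

model = TinyDenoiser()
x_T = torch.randn(4, 8)             # start from white noise
x_0 = sample_with_truncated_grad(model, x_T)

# In ADT an adversarial loss from the discriminator on x_0 would be
# combined with the original diffusion loss; here a dummy loss shows
# that gradients reach the denoiser through the last steps only.
loss = x_0.pow(2).mean()
loss.backward()
```

Detaching early steps is what makes end-to-end supervision of the sampled image tractable: the activation memory no longer scales with the number of inference steps.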
Problem

Research questions and friction points this paper is trying to address.

Aligns diffusion model inference with training data distributions
Reduces prediction biases and cumulative error in diffusion models
Improves image quality via adversarial supervision and fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial supervision aligns diffusion model outputs
Siamese-network discriminator enhances adversarial training robustness
Constrained backward-flow prevents memory and gradient issues
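The frozen-backbone discriminator listed above can be illustrated with a short sketch. A pre-trained feature extractor is frozen (its parameters receive no updates) while a lightweight head is trained on top; gradients still flow *through* the backbone so the generator can be supervised. The backbone, feature size, and head are hypothetical stand-ins, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FrozenBackboneDiscriminator(nn.Module):
    def __init__(self, backbone, feat_dim=16):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad_(False)         # backbone stays frozen
        self.head = nn.Linear(feat_dim, 1)  # lightweight trainable part

    def forward(self, x):
        # Parameters are frozen, but gradients still propagate to the
        # input x, so the diffusion model can receive adversarial signal.
        feats = self.backbone(x)
        return self.head(feats)

backbone = nn.Linear(16, 16)  # stand-in for a pre-trained feature extractor
disc = FrozenBackboneDiscriminator(backbone)

# Only the head is updated by the optimizer.
trainable = [n for n, p in disc.named_parameters() if p.requires_grad]

# Gradients still reach the generated image fed into the discriminator.
x = torch.randn(2, 16, requires_grad=True)
disc(x).sum().backward()
```

Freezing the backbone is what stabilizes adversarial training here: the discriminator cannot drift far from its pre-trained representation, which limits the usual GAN instability during fine-tuning.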
๐Ÿ”Ž Similar Papers
No similar papers found.