DiffIER: Optimizing Diffusion Models with Iterative Error Reduction

📅 2025-08-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion models achieve strong performance with classifier-free guidance (CFG), yet their generation quality is highly sensitive to the guidance scale, stemming from an inherent training-inference discrepancy. Method: This paper systematically identifies and quantifies this “training-inference gap” for the first time, proposing DiffIER—a plug-and-play inference-time framework that introduces iterative error minimization to dynamically correct the denoising trajectory without model retraining or architectural modification. Contribution/Results: DiffIER reformulates conditional generation as an error-suppression process. Evaluated on text-to-image synthesis, image super-resolution, and speech synthesis, it consistently improves both generation quality and stability over state-of-the-art CFG baselines while significantly alleviating guidance-scale sensitivity.

Technology Category

Application Category

📝 Abstract
Diffusion models have demonstrated remarkable capabilities in generating high-quality samples and enhancing performance across diverse domains through Classifier-Free Guidance (CFG). However, the quality of generated samples is highly sensitive to the selection of the guidance weight. In this work, we identify a critical ``training-inference gap'' and we argue that it is the presence of this gap that undermines the performance of conditional generation and renders outputs highly sensitive to the guidance weight. We quantify this gap by measuring the accumulated error during the inference stage and establish a correlation between the selection of guidance weight and minimizing this gap. Furthermore, to mitigate this gap, we propose DiffIER, an optimization-based method for high-quality generation. We demonstrate that the accumulated error can be effectively reduced by an iterative error minimization at each step during inference. By introducing this novel plug-and-play optimization framework, we enable the optimization of errors at every single inference step and enhance generation quality. Empirical results demonstrate that our proposed method outperforms baseline approaches in conditional generation tasks. Furthermore, the method achieves consistent success in text-to-image generation, image super-resolution, and text-to-speech generation, underscoring its versatility and potential for broad applications in future research.
Problem

Research questions and friction points this paper is trying to address.

Addresses training-inference gap in diffusion models
Reduces accumulated error during inference steps
Improves conditional generation quality across domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative error minimization during inference
Optimization framework plug-and-play enhancement
Reduces accumulated error each step
🔎 Similar Papers
No similar papers found.