🤖 AI Summary
Diffusion models suffer from slow inference, which hinders interactive applications. This paper introduces "Diffusion Preview," a paradigm in which a small number of sampling steps produce a faithful preview image for user assessment, and full refinement runs only after the user confirms. The core contribution is ConsistencySolver, presented as the first lightweight, trainable, high-order ODE solver based on linear multistep methods, with coefficients optimized via reinforcement learning to maximize both preview quality and consistency between the preview and the final output. Crucially, the method requires no model retraining or knowledge distillation and is compatible with any pre-trained diffusion model. Experiments show that, compared to Multistep DPM-Solver, the approach matches FID with 47% fewer sampling steps, significantly outperforms distillation-based baselines, and cuts total user-side interaction time by nearly 50% without compromising final output quality.
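The preview-and-refine workflow described above can be sketched as follows. This is a minimal illustration, not the paper's API: `sample`, `approve`, and the step counts are hypothetical placeholders standing in for a real diffusion sampling loop and a user decision.

```python
def sample(prompt, num_steps, seed=0):
    """Stand-in for a diffusion sampling loop run for num_steps solver steps.

    A real implementation would iterate a denoiser (e.g. with a fast ODE
    solver) num_steps times from noise drawn with the given seed.
    """
    return f"image({prompt}, steps={num_steps}, seed={seed})"

def preview_and_refine(prompt, approve, preview_steps=4, full_steps=50, seed=0):
    """Generate a cheap low-step preview; run full sampling only if approved.

    Reusing the same seed (i.e. the same initial noise) in both passes is
    what makes preview/final consistency a meaningful objective.
    """
    preview = sample(prompt, num_steps=preview_steps, seed=seed)
    if approve(preview):
        return sample(prompt, num_steps=full_steps, seed=seed)
    return None  # rejected: the user would retry with a new prompt or seed
```

The point of the paradigm is that the expensive `full_steps` pass is skipped entirely whenever the user rejects the preview, so wasted compute and waiting time are bounded by the cheap pass.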
📝 Abstract
The slow inference process of image diffusion models significantly degrades interactive user experiences. To address this, we introduce Diffusion Preview, a novel paradigm that employs rapid, low-step sampling to generate preliminary outputs for user evaluation, deferring full-step refinement until the preview is deemed satisfactory. Existing acceleration methods, including training-free solvers and post-training distillation, struggle to deliver high-quality previews or to ensure consistency between previews and final outputs. We propose ConsistencySolver, a lightweight, trainable high-order solver derived from general linear multistep methods and optimized via reinforcement learning to enhance both preview quality and consistency. Experimental results demonstrate that ConsistencySolver significantly improves generation quality and consistency in low-step scenarios, making it well suited to efficient preview-and-refine workflows. Notably, it achieves FID scores on par with Multistep DPM-Solver using 47% fewer steps, while outperforming distillation baselines. Furthermore, user studies indicate our approach reduces overall user interaction time by nearly 50% while maintaining generation quality. Code is available at https://github.com/G-U-N/consolver.
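As a rough illustration of the linear-multistep machinery that a solver like this builds on, the sketch below shows a generic k-step explicit update whose coefficients could in principle be learned instead of fixed. The classical Adams-Bashforth-2 values used here are a textbook default for the demo ODE dx/dt = -x; nothing in this block comes from the paper's actual implementation, which operates on the diffusion probability-flow ODE.

```python
import math

def multistep_update(x, f_hist, h, coeffs):
    """One explicit linear-multistep step:

        x_{n+1} = x_n + h * sum_k coeffs[k] * f_{n-k}

    f_hist holds past derivative evaluations, newest first. In a trainable
    solver, coeffs would be lightweight learned parameters (here they are
    the classical Adams-Bashforth values).
    """
    return x + h * sum(c * f for c, f in zip(coeffs, f_hist))

# Demo: integrate dx/dt = -x from x(0)=1 to t=1 (exact answer: exp(-1)).
f = lambda x: -x
h = 0.01
x = 1.0
f_hist = [f(x)]
x = x + h * f_hist[0]          # bootstrap the first step with Euler
for _ in range(99):
    f_hist = [f(x)] + f_hist[:1]   # keep a 2-deep history, newest first
    x = multistep_update(x, f_hist, h, coeffs=[1.5, -0.5])  # AB2
print(x)  # close to exp(-1) ≈ 0.3679
```

Because each step reuses cached derivative evaluations instead of calling the model again, a multistep scheme raises the order of accuracy at essentially no extra network cost, which is why it is an attractive basis for few-step diffusion sampling.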