🤖 AI Summary
Diffusion models suffer from slow inference due to high computational cost in iterative sampling.
Method: We propose a training-free acceleration framework that reformulates the classifier-free guidance (CFG) ordinary differential equation as a multirate system, decoupling the integration of the noise estimate from that of the guidance term, marking the first application of multirate integration to diffusion sampling. Building on an error-bound analysis showing that the guidance branch is robust to approximation and largely redundant, we design adaptive step-size selection, guidance-scale scheduling, and coarse/fine dual-grid integration.
Results: Experiments show up to a 30% reduction in the number of function evaluations (NFE) with near-lossless generation quality (ΔImageReward ≤ 0.032), outperforming state-of-the-art CFG-based training-free accelerators under identical computation budgets and paving the way for real-time high-fidelity image synthesis.
📝 Abstract
In this paper, we propose Tortoise and Hare Guidance (THG), a training-free strategy that accelerates diffusion sampling while maintaining high-fidelity generation. We demonstrate that the noise estimate and the additional guidance term exhibit markedly different sensitivity to numerical error by reformulating the classifier-free guidance (CFG) ODE as a multirate system of ODEs. Our error-bound analysis shows that the additional guidance branch is more robust to approximation, revealing substantial redundancy that conventional solvers fail to exploit. Building on this insight, THG significantly reduces the computation of the additional guidance: the noise estimate is integrated with the tortoise equation on the original, fine-grained timestep grid, while the additional guidance is integrated with the hare equation only on a coarse grid. We also introduce (i) an error-bound-aware timestep sampler that adaptively selects step sizes and (ii) a guidance-scale scheduler that stabilizes large extrapolation spans. THG reduces the number of function evaluations (NFE) by up to 30% with virtually no loss in generation fidelity ($\Delta$ImageReward $\leq$ 0.032) and outperforms state-of-the-art CFG-based training-free accelerators under identical computation budgets. Our findings highlight the potential of multirate formulations for diffusion solvers, paving the way for real-time high-quality image synthesis without any model retraining. The source code is available at https://github.com/yhlee-add/THG.
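The coarse/fine split described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes the common CFG form ε̂ = ε_c + (w − 1)(ε_c − ε_u), a plain Euler step, and a zero-order hold that simply reuses the last guidance term between coarse steps, whereas THG integrates a separate hare equation and adds a timestep sampler and guidance-scale scheduler. All names (`multirate_cfg_sample`, `coarse_every`, the toy `eps_cond` and `eps_uncond`) are hypothetical.

```python
def multirate_cfg_sample(x, eps_cond, eps_uncond, timesteps, w, coarse_every):
    """Toy Euler sampler: conditional estimate on every (fine) step,
    guidance term refreshed only every `coarse_every` steps.

    Returns the final state and the number of function evaluations (NFE).
    """
    nfe = 0
    delta = 0.0  # guidance term eps_cond - eps_uncond, held between refreshes
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        e_c = eps_cond(x, t)  # fine grid: always evaluated
        nfe += 1
        if i % coarse_every == 0:  # coarse grid: refresh the guidance term
            delta = e_c - eps_uncond(x, t)
            nfe += 1
        eps_hat = e_c + (w - 1.0) * delta  # assumed CFG combination
        x = x + (t_next - t) * eps_hat  # Euler step on dx/dt = eps_hat
    return x, nfe


# Hypothetical stand-ins for the two network branches (scalar state for brevity).
def eps_cond(x, t):
    return x + 0.1 * t


def eps_uncond(x, t):
    return x


timesteps = [1.0 - i / 10 for i in range(11)]  # 10 fine steps from t=1 to t=0
x_full, nfe_full = multirate_cfg_sample(1.0, eps_cond, eps_uncond, timesteps, 7.5, 1)
x_fast, nfe_fast = multirate_cfg_sample(1.0, eps_cond, eps_uncond, timesteps, 7.5, 3)
print(nfe_full, nfe_fast)  # 20 14
```

In this toy setting, refreshing the guidance term on every third step takes the cost from 20 to 14 evaluations. The figure only shows how the NFE accounting works when the unconditional branch is skipped on some steps; it is unrelated to how the paper arrives at its reported savings.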