🤖 AI Summary
Diffusion models commonly suffer from insufficient fidelity when sampling with a moderate number of function evaluations (NFEs, 20–50), while existing acceleration methods either target extremely low NFEs (<10) or rely on model-specific architectural assumptions, compromising generality and quality. To address this, we propose STORK, a training-free, architecture-agnostic ODE solver that, for the first time, integrates stiffness-aware ODE solving principles with an adaptive Taylor expansion to construct a stabilized orthogonal Runge–Kutta scheme. STORK is agnostic to model parametrization and supports both the noise-prediction and flow-matching paradigms without assuming a semi-linear structure. Evaluated on state-of-the-art models, including Stable Diffusion 3.5, SANA, and FLUX, STORK achieves substantial FID reductions and improved image fidelity across the 20–50 NFE regime. The implementation is publicly available.
📝 Abstract
Diffusion models (DMs) have demonstrated remarkable performance in high-fidelity image and video generation. Because high-quality generation with DMs typically requires a large number of function evaluations (NFEs), resulting in slow sampling, extensive research has successfully reduced the NFE to a small range (<10) while maintaining acceptable image quality. However, many practical applications, such as those involving Stable Diffusion 3.5, FLUX, and SANA, commonly operate in the mid-NFE regime (20–50 NFEs) to achieve superior results; despite this practical relevance, effective sampling within the mid-NFE regime remains underexplored. In this work, we propose a novel, training-free, and structure-independent DM ODE solver called the Stabilized Taylor Orthogonal Runge–Kutta (STORK) method, based on a class of stiff ODE solvers with a Taylor expansion adaptation. Unlike prior work such as DPM-Solver, which depends on the semi-linear structure of the DM ODE, STORK is applicable to any DM sampler, including noise-prediction and flow-matching models. Within the 20–50 NFE range, STORK achieves improved generation quality, as measured by FID scores, across unconditional pixel-level generation and conditional latent-space generation tasks using models like Stable Diffusion 3.5 and SANA. Code is available at https://github.com/ZT220501/STORK.
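As background, the abstract frames DM sampling as numerically integrating an ODE, where each evaluation of the drift (the trained network) costs one NFE. The toy sketch below is not the STORK method (whose scheme is not specified here); it integrates a simple linear ODE dx/dt = -x with a classical fourth-order Runge–Kutta step, illustrating the general point that higher-order solvers reach a target accuracy in far fewer steps than first-order Euler, which is why solver design matters in the mid-NFE regime.

```python
import math

def drift(t, x):
    # Toy stand-in for the DM ODE drift; a real sampler would call the
    # trained noise-prediction or flow-matching network here (1 NFE).
    return -x

def rk4_step(f, t, x, h):
    # Classical explicit 4th-order Runge-Kutta step: 4 NFEs per step.
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def solve_rk4(x0, t0, t1, n_steps):
    # Integrate dx/dt = drift(t, x) from t0 to t1 with fixed step size.
    h = (t1 - t0) / n_steps
    t, x = t0, x0
    for _ in range(n_steps):
        x = rk4_step(drift, t, x, h)
        t += h
    return x

# 10 RK4 steps (40 NFEs) already track the exact solution e^{-1}
# to roughly single-precision accuracy on this toy problem.
approx = solve_rk4(1.0, 0.0, 1.0, 10)
print(abs(approx - math.exp(-1.0)))
```

On this linear test problem the RK4 global error scales as O(h^4), so halving the step size cuts the error by about 16x; diffusion ODEs are harder (stiff, with an expensive network as the drift), which motivates the stiffness-aware design the abstract describes.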