One Step Diffusion via Shortcut Models

📅 2024-10-16

🏛️ International Conference on Learning Representations

📈 Citations: 28

✨ Influential: 4

🤖 AI Summary

Diffusion and flow-matching models generate slowly and at high computational cost because sampling requires iterative denoising over many network passes; existing acceleration methods often rely on multi-stage training, auxiliary networks, or fragile scheduling. To address this, we propose *shortcut models*, an end-to-end generative approach that uses a single network and a single training phase. The network is conditioned on both the current noise level and the desired step size, so it can skip ahead in the generation process and sample with a flexible number of steps, including a single step. The method needs no separate distillation phase and no additional networks. Experiments show higher sample quality than consistency models, reflow, and other baselines across a wide range of step budgets, while reducing training complexity relative to distillation and allowing the number of sampling steps to be chosen at inference time.
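
The step-size conditioning can be written compactly. The equations below follow the notation of the full paper as we understand it (x_0 is noise, x_1 is data, t runs from 0 to 1, d is the step size, and s_θ is the network); treat them as a summary of the idea rather than text from this page:

```latex
x_{t+d} \approx x_t + d\, s_\theta(x_t, t, d), \qquad x_t = (1-t)\,x_0 + t\,x_1

s_\theta(x_t, t, 0) \approx \mathbb{E}\left[x_1 - x_0 \mid x_t\right]

s_\theta(x_t, t, 2d) \approx \tfrac{1}{2}\, s_\theta(x_t, t, d) + \tfrac{1}{2}\, s_\theta(x'_{t+d}, t+d, d),
\qquad x'_{t+d} = x_t + d\, s_\theta(x_t, t, d)
```

The d = 0 case is ordinary flow matching; the 2d rule lets a large step be learned from two chained smaller steps, which is what makes one-step sampling possible without a separate distillation phase.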

📝 Abstract

Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process. Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous approaches, such as consistency models and reflow. Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
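
Because the step size is an input to the network, the step budget becomes a loop bound chosen at inference time. The sketch below is a minimal illustration, assuming the network is exposed as a callable `s(x, t, d)` and that t runs from noise (t = 0) to data (t = 1). The names `shortcut_sample` and `make_point_mass_shortcut` are hypothetical, and the toy "model" is an analytic stand-in for a trained network, exact only when all data sits at a single point.

```python
import numpy as np
from typing import Callable

# Assumed interface for illustration: s(x, t, d) -> shortcut direction.
ShortcutFn = Callable[[np.ndarray, float, float], np.ndarray]


def shortcut_sample(model: ShortcutFn, x0: np.ndarray, num_steps: int) -> np.ndarray:
    """Map noise x0 (t = 0) toward data (t = 1) in `num_steps` equal steps.

    The same network is queried at every step, conditioned on both the
    current time t and the step size d, so the budget is picked per call.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    d = 1.0 / num_steps
    x = np.asarray(x0, dtype=float)
    for i in range(num_steps):
        t = i * d  # computed from the index to avoid accumulating float error
        x = x + d * model(x, t, d)
    return x


def make_point_mass_shortcut(mu: np.ndarray) -> ShortcutFn:
    """Hypothetical toy model: the exact shortcut when all data sits at `mu`.

    With x_t = (1 - t) * x0 + t * mu every trajectory is a straight line,
    so the ideal shortcut is (mu - x) / (1 - t) for every step size d.
    """
    def shortcut(x: np.ndarray, t: float, d: float) -> np.ndarray:
        return (mu - x) / (1.0 - t)
    return shortcut


mu = np.array([2.0, -1.0])
x0 = np.random.default_rng(0).standard_normal(2)
toy = make_point_mass_shortcut(mu)
for steps in (1, 4, 128):
    print(steps, shortcut_sample(toy, x0, steps))  # each lands on mu, up to rounding
```

A trained network would only approximate this; the point is that one, four, or 128 steps all call the same function with a different `d`.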

Problem

Research questions and friction points this paper is trying to address.

Slow sampling in diffusion models due to iterative denoising

Complex training regimes needed for existing speed-up methods

Lack of single-network solutions for flexible step budgets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Single network and training phase

Condition on noise level and step size

High-quality samples in few steps
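
The training side of these ideas can be sketched as a two-part loss on one network: a flow-matching term at d = 0 and a self-consistency term asking a 2d step to match two chained d steps. This is a minimal per-sample NumPy sketch, assuming the formulation described in the full paper (x0 noise, x1 data, x_t = (1 - t) * x0 + t * x1, and t + 2d <= 1). The function names, the equal weighting of the two terms, and the single-sample form are simplifications for illustration; the paper's batched recipe for sampling t and d is not reproduced here.

```python
import numpy as np
from typing import Callable

ShortcutFn = Callable[[np.ndarray, float, float], np.ndarray]


def self_consistency_target(model: ShortcutFn, x_t: np.ndarray, t: float, d: float) -> np.ndarray:
    """Target for s(x_t, t, 2d): the average of two chained d-sized shortcuts."""
    s_first = model(x_t, t, d)
    x_mid = x_t + d * s_first          # take one step of size d
    s_second = model(x_mid, t + d, d)  # then a second step from where it landed
    # In an autodiff framework this target would be wrapped in a stop-gradient;
    # NumPy computes no gradients, so nothing extra is needed here.
    return 0.5 * (s_first + s_second)


def shortcut_loss(model: ShortcutFn, x0: np.ndarray, x1: np.ndarray, t: float, d: float) -> float:
    """Flow matching at d = 0 plus self-consistency at step size 2d (one sample)."""
    x_t = (1.0 - t) * x0 + t * x1
    # d = 0: regress onto the straight-line velocity x1 - x0.
    fm = np.mean((model(x_t, t, 0.0) - (x1 - x0)) ** 2)
    # d > 0: the 2d shortcut should agree with two chained d shortcuts.
    target = self_consistency_target(model, x_t, t, d)
    sc = np.mean((model(x_t, t, 2.0 * d) - target) ** 2)
    return float(fm + sc)


mu = np.array([2.0, -1.0])
x0 = np.array([0.5, 0.5])


def exact_shortcut(x: np.ndarray, t: float, d: float) -> np.ndarray:
    # Hypothetical exact shortcut when all data sits at mu.
    return (mu - x) / (1.0 - t)


def zero_shortcut(x: np.ndarray, t: float, d: float) -> np.ndarray:
    # Hypothetical untrained model that always predicts zero.
    return np.zeros_like(x)


print(shortcut_loss(exact_shortcut, x0, mu, t=0.25, d=0.125))  # approximately 0
print(shortcut_loss(zero_shortcut, x0, mu, t=0.25, d=0.125))   # 2.25
```

Both terms train the same network, which is the sense in which no second model or separate distillation phase is needed.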
