Glance: Accelerating Diffusion Models with 1 Sample

📅 2025-12-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models suffer from high computational cost and prolonged inference latency, while existing acceleration methods typically require retraining student models—compromising generalization and incurring substantial overhead. To address this, we propose a phase-aware LoRA-based acceleration framework that operates without retraining the backbone model. Our approach introduces a dual-expert architecture comprising Slow-LoRA (optimized for semantic generation) and Fast-LoRA (specialized for detail refinement), coupled with phase-aware knowledge distillation to enable adaptive denoising step compression. Adapter training requires only one sample and completes within one hour on a single NVIDIA V100 GPU. Extensive evaluation across multiple benchmarks demonstrates up to 5× inference speedup, with preserved FID scores and visual fidelity. The method significantly enhances deployment efficiency and cross-dataset generalization, offering a lightweight, plug-and-play solution for accelerating diffusion model inference.

📝 Abstract
Diffusion models have achieved remarkable success in image generation, yet their deployment remains constrained by the heavy computational cost and the need for numerous inference steps. Previous efforts on fewer-step distillation attempt to skip redundant steps by training compact student models, yet they often suffer from heavy retraining costs and degraded generalization. In this work, we take a different perspective: we accelerate smartly, not evenly, applying smaller speedups to early semantic stages and larger ones to later redundant phases. We instantiate this phase-aware strategy with two experts that specialize in slow and fast denoising phases. Surprisingly, instead of investing massive effort in retraining student models, we find that simply equipping the base model with lightweight LoRA adapters achieves both efficient acceleration and strong generalization. We refer to these two adapters as Slow-LoRA and Fast-LoRA. Through extensive experiments, our method achieves up to 5× acceleration over the base model while maintaining comparable visual quality across diverse benchmarks. Remarkably, the LoRA experts are trained with only 1 sample on a single V100 within one hour, yet the resulting models generalize strongly on unseen prompts.
Problem

Research questions and friction points this paper is trying to address.

Accelerate diffusion models by reducing computational costs
Maintain visual quality while speeding up inference steps
Train lightweight adapters for efficient generalization with minimal data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Phase-aware acceleration with specialized LoRA adapters
Lightweight LoRA adapters trained with minimal samples
Separate Slow-LoRA and Fast-LoRA for different denoising phases
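The phase-aware split above can be sketched as a step-compression schedule that routes early, semantics-heavy denoising steps to a Slow-LoRA expert (compressed gently) and later refinement steps to a Fast-LoRA expert (compressed aggressively). The function names, phase boundary, and keep ratios below are illustrative assumptions, not the paper's actual implementation:

```python
def select_expert(step: int, total_steps: int, boundary: float = 0.5) -> str:
    """Pick the LoRA expert for a denoising step.

    Early steps (high noise) shape global semantics -> Slow-LoRA;
    late steps refine details -> Fast-LoRA. The 0.5 boundary is an
    assumed placeholder, not the paper's reported split.
    """
    return "slow" if step < boundary * total_steps else "fast"


def compressed_schedule(total_steps: int,
                        slow_keep: float = 0.5,
                        fast_keep: float = 0.1):
    """Phase-aware step compression: keep more early steps than late ones.

    Returns a list of (timestep, expert) pairs; the keep ratios are
    hypothetical and would be tuned per model in practice.
    """
    boundary = total_steps // 2
    # Subsample each phase at a stride inversely proportional to its keep ratio.
    slow_steps = list(range(0, boundary, max(1, int(1 / slow_keep))))
    fast_steps = list(range(boundary, total_steps, max(1, int(1 / fast_keep))))
    return [(t, select_expert(t, total_steps)) for t in slow_steps + fast_steps]
```

With 50 base steps and the defaults above, the schedule keeps 13 slow-phase and 3 fast-phase steps, i.e. roughly a 3× reduction, while concentrating the savings in the redundant late phase as the paper's strategy describes.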
Authors
Zhuobai Dong (WHU)
Rui Zhao (NUS)
Songjie Wu (CSU)
Junchao Yi (UESTC)
Linjie Li (Microsoft), Vision and Language
Zhengyuan Yang (Principal Researcher, Microsoft), Computer Vision, Multimedia, Multimodal, Post-Training, Agentic RL
Lijuan Wang (Microsoft)
Alex Jinpeng Wang (CSU)