SparseDM: Toward Sparse Efficient Diffusion Models

📅 2024-04-16
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
To address the high computational latency and memory overhead of diffusion models on resource-constrained devices, this paper proposes a structured sparsification method based on an improved Straight-Through Estimator (STE). Specifically, learnable layer-wise sparse masks are introduced into convolutional and linear layers of pre-trained UNet or Transformer backbones, followed by joint fine-tuning to enable efficient sparse inference. This work is the first to jointly optimize structured sparse mask design and end-to-end diffusion model fine-tuning. Under identical FID scores, the method achieves 50% reduction in MACs, 1.2× GPU inference speedup, and improves FID by approximately 1.0 over state-of-the-art sparse methods at equivalent computational cost. The approach ensures generation quality stability while enhancing hardware compatibility, establishing a novel paradigm for edge deployment of diffusion models.

📝 Abstract
Diffusion models are a powerful family of generative models widely used for image and video generation. However, time-consuming deployment, long inference times, and large memory requirements hinder their application on resource-constrained devices. In this paper, we propose a method based on an improved Straight-Through Estimator to improve the deployment efficiency of diffusion models. Specifically, we add sparse masks to the Convolution and Linear layers of a pre-trained diffusion model, transfer-learn the sparse model during the fine-tuning stage, and turn on the sparse masks during inference. Experimental results on Transformer- and UNet-based diffusion models demonstrate that our method reduces MACs by 50% while maintaining FID. Sparse models are accelerated by approximately 1.2× on the GPU. At other MAC budgets, our FID is also about 1.0 lower than that of competing methods.
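The core mechanism described above — a sparse mask applied to a layer's weights in the forward pass while gradients flow to the dense weights via a straight-through estimator — can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the paper's implementation: the `STEMask` and `SparseLinear` names and the magnitude-based 50% mask are assumptions for illustration (the paper uses learnable, structured layer-wise masks).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STEMask(torch.autograd.Function):
    """Apply a binary mask in the forward pass; in the backward pass,
    pass gradients straight through to the dense weights (STE)."""
    @staticmethod
    def forward(ctx, weight, mask):
        return weight * mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: gradient reaches the dense weights unchanged,
        # so pruned weights can still be updated during fine-tuning.
        return grad_output, None

class SparseLinear(nn.Module):
    """Linear layer whose weights are masked to ~50% sparsity (illustrative)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.register_buffer("mask", torch.ones_like(self.linear.weight))

    def update_mask(self, sparsity=0.5):
        # Simple magnitude criterion: zero out the smallest-magnitude half.
        w_abs = self.linear.weight.detach().abs()
        k = int(w_abs.numel() * sparsity)
        threshold = w_abs.flatten().kthvalue(k).values
        self.mask = (w_abs > threshold).float()

    def forward(self, x):
        w = STEMask.apply(self.linear.weight, self.mask)
        return F.linear(x, w, self.linear.bias)
```

During fine-tuning, `update_mask` would be called periodically and the masked forward used throughout; at inference, the mask stays fixed, so the zeroed weights can be skipped by sparse kernels.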
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs of diffusion models
Accelerating inference time for resource-constrained devices
Maintaining performance while decreasing memory requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse masks added to layers
Transfer learning during fine-tuning
Reduced MACs while maintaining FID
Kafeng Wang
Tsinghua University
Machine Learning, Deep Learning
Jianfei Chen
Associate Professor, Tsinghua University
Machine Learning
He Li
Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University.
Zhenpeng Mi
Honor Device Co., Ltd.
Jun-Jie Zhu
Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University.