Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability

📅 2025-05-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In diffusion models, U-Net parameters exhibit strong temporal (timestep) and sample dependence—early layers govern structural modeling, while later layers refine texture details; yet conventional full parameter sharing induces redundancy and interference. Method: We first observe that large parameter counts are not inherently critical, and propose MaskUNet: a dynamic sparse masking mechanism that adaptively zeros out non-critical parameters during denoising. It introduces minimal auxiliary parameters and supports both training-based and training-free fine-tuning strategies. Results: On COCO zero-shot generation, MaskUNet achieves state-of-the-art FID (12.3% improvement over baseline), significantly enhancing texture fidelity and detail quality, with robust generalization to downstream tasks. Our core contribution is the identification of timestep-sensitive parameter roles in diffusion processes and the establishment of the first lightweight, sample- and timestep-jointly adaptive U-Net sparsification paradigm.

Technology Category

Application Category

📝 Abstract
The diffusion models, in early stages focus on constructing basic image structures, while the refined details, including local features and textures, are generated in later stages. Thus the same network layers are forced to learn both structural and textural information simultaneously, significantly differing from the traditional deep learning architectures (e.g., ResNet or GANs) which captures or generates the image semantic information at different layers. This difference inspires us to explore the time-wise diffusion models. We initially investigate the key contributions of the U-Net parameters to the denoising process and identify that properly zeroing out certain parameters (including large parameters) contributes to denoising, substantially improving the generation quality on the fly. Capitalizing on this discovery, we propose a simple yet effective method-termed ``MaskUNet''- that enhances generation quality with negligible parameter numbers. Our method fully leverages timestep- and sample-dependent effective U-Net parameters. To optimize MaskUNet, we offer two fine-tuning strategies: a training-based approach and a training-free approach, including tailored networks and optimization functions. In zero-shot inference on the COCO dataset, MaskUNet achieves the best FID score and further demonstrates its effectiveness in downstream task evaluations. Project page: https://gudaochangsheng.github.io/MaskUnet-Page/
Problem

Research questions and friction points this paper is trying to address.

Enhances diffusion models by masking U-Net parameters
Improves image generation quality with minimal parameters
Offers training-based and training-free optimization strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses MaskUNet to zero out ineffective parameters
Leverages timestep- and sample-dependent U-Net parameters
Offers training-based and training-free fine-tuning strategies
🔎 Similar Papers
No similar papers found.
L
Lei Wang
PCA Lab, VCIP, College of Computer Science, Nankai University
Senmao Li
Senmao Li
Ph.D Student, Nankai University
GANsImage-to-image translationDiffusion Models
F
Fei Yang
PCA Lab, VCIP, College of Computer Science, Nankai University
J
Jianye Wang
PCA Lab, VCIP, College of Computer Science, Nankai University
Z
Ziheng Zhang
PCA Lab, VCIP, College of Computer Science, Nankai University
Y
Yuhan Liu
PCA Lab, VCIP, College of Computer Science, Nankai University
Yaxing Wang
Yaxing Wang
Associate professor, Nankai University
Deep learningGANsImage-to-image translationTransfer learning
J
Jian Yang
PCA Lab, VCIP, College of Computer Science, Nankai University